WO2022252596A1 - Method for constructing AI integrated model, and inference method and apparatus for AI integrated model - Google Patents

Method for constructing AI integrated model, and inference method and apparatus for AI integrated model

Info

Publication number
WO2022252596A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
training data
graph
base
network model
Prior art date
Application number
PCT/CN2021/142269
Other languages
English (en)
French (fr)
Inventor
田奇
常建龙
张恒亨
姜娜娜
魏龙辉
张晓鹏
谢凌曦
Original Assignee
Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110977566.XA (external priority, published as CN115964632A)
Application filed by Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Priority to EP21943948.6A (published as EP4339832A1)
Publication of WO2022252596A1
Priority to US18/524,875 (published as US20240119266A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Definitions

  • This application relates to the technical field of artificial intelligence (AI), and in particular to a method for constructing an AI integrated model, an inference method for an AI integrated model, a management system for an AI integrated model, an inference device, a computing device cluster, a computer-readable storage medium, and a computer program product.
  • AI: artificial intelligence
  • The scale of AI models is also increasing. For example, the structures of many AI models are gradually becoming deeper and wider, and the number of parameters of AI models is gradually growing. At present, some AI models can mine massive data, relying on their huge scale and large amounts of computing resources, to complete corresponding AI tasks.
  • An AI model obtained through integration may be called an AI integrated model, and each of the multiple AI models used to form the AI integrated model may be called a base model.
  • The outputs of the multiple base models in the AI integrated model can be fused to obtain a fused inference result.
  • The fusion methods of AI integrated models can differ. For example, for classification tasks, the outputs of the multiple base models are usually voted on to obtain the inference result of the AI integrated model; for regression tasks, the outputs of the multiple base models are usually averaged, and the average is taken as the inference result of the AI integrated model.
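  • The two classic fusion methods above can be sketched as follows. This is an illustrative toy, not the patent's implementation; the base-model outputs are made-up values.

```python
# Voting (classification) and averaging (regression) fusion of base-model outputs.
from collections import Counter

def vote_fusion(class_outputs):
    """Classification: majority vote over the base models' predicted labels."""
    return Counter(class_outputs).most_common(1)[0][0]

def average_fusion(regression_outputs):
    """Regression: arithmetic mean of the base models' predicted values."""
    return sum(regression_outputs) / len(regression_outputs)

# Three base models predict a class label for the same input:
label = vote_fusion(["cat", "dog", "cat"])
# Three base models predict a numeric value for the same input:
value = average_fusion([1.0, 2.0, 3.0])
```

Note that neither method looks at how the base models' outputs relate to one another, which is exactly the limitation the graph-based fusion in this application addresses.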
  • The above-mentioned way of obtaining the final inference result with the AI integrated model does not consider the differences and correlations among the base models in the AI integrated model; it directly averages the outputs of the base models or applies voting to them.
  • Consequently, the AI integrated model cannot reflect the ability of its internal base models to cooperate with one another, so the accuracy of the AI task execution results obtained based on the AI integrated model needs to be improved.
  • This application provides a method for building an AI integrated model.
  • This method builds the graph network model and multiple base models into an AI integrated model.
  • When the graph network model in the AI integrated model fuses the outputs of the multiple base models, it fully considers the differences and correlations between the base models. The features obtained from the graph network model are therefore used for the processing of AI tasks, improving the accuracy of the obtained AI task execution results.
  • the present application provides a method for constructing an AI integrated model.
  • the method can be executed by the management platform of the AI integrated model.
  • the management platform may be a software system for building an AI integrated model, and the computing device or computing device cluster executes the method for building the AI integrated model by running the program code of the software system.
  • The management platform can also be a hardware system for building AI integrated models. The following description takes the management platform being a software system as an example.
  • The management platform can obtain a training data set, an initial graph network model, and multiple base models; then use the training data in the training data set and the multiple base models to iteratively train the initial graph network model to obtain a graph network model; and finally construct the graph network model and the multiple base models into an AI integrated model, where the input of the graph network model is a graph structure composed of the outputs of the multiple base models.
  • The management platform constructs a graph structure according to the outputs of the multiple base models, and then processes the graph structure through a graph network model to fuse the outputs of the multiple base models. Since the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully considers the differences and correlations between the base models when fusing their outputs. When the features obtained from the graph network model are used for subsequent AI task processing, more accurate AI task execution results can therefore be obtained than with the features produced by any single base model. That is to say, the technical solution of the present application improves the accuracy of the AI task execution results.
  • The management platform fuses the outputs of the multiple base models through the graph network model and can train the AI integrated model with an end-to-end parallel training method. On the one hand, this reduces the difficulty of model training and improves training efficiency; on the other hand, it guarantees the generalization performance of the trained AI integrated model.
  • In some implementations, when the management platform uses the training data in the training data set and the multiple base models to iteratively train the initial graph network model, each iteration includes: inputting first training data into each base model separately, obtaining the output of each base model after it performs inference on the first training data, constructing the outputs of the multiple base models for the first training data into a graph structure, and then using the graph structure to train the initial graph network model.
  • Training the initial graph network model with the graph structure enables the trained graph network model to fully consider the differences and correlations between the base models when fusing their outputs. The features obtained from the graph network model are then used for AI task processing, improving the accuracy of the AI task execution results.
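  • The per-iteration data flow described above (sample in, base-model outputs out, graph built, graph network trained) can be sketched as follows. All names here are hypothetical stand-ins: the base models are toy lambdas and the graph builder and training step are placeholders, not the patent's implementation.

```python
# One training iteration: sample -> base-model outputs -> graph -> train step.

def train_one_iteration(base_models, sample, build_graph, train_step):
    outputs = [model(sample) for model in base_models]  # each base model infers
    graph = build_graph(outputs)                        # outputs become a graph
    return train_step(graph)                            # train the graph network

# Toy stand-ins for illustration only:
base_models = [lambda x: [x, x + 1], lambda x: [x, x + 2]]
build_graph = lambda outs: {"nodes": outs, "edges": [(0, 1)]}
losses = []
train_step = lambda g: losses.append(len(g["nodes"])) or g

graph = train_one_iteration(base_models, 1.0, build_graph, train_step)
```

In a real system `train_step` would compute a loss on the graph network model's output and back-propagate; the sketch only shows the control flow of one iteration.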
  • the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
  • the decision tree model, random forest model, etc. can be used to process structured data
  • the neural network model can be used to process unstructured data such as images, text, voice, video and other types of data.
  • Different base models can be used to construct different AI integrated models, such as an AI integrated model for processing structured data and an AI integrated model for processing unstructured data, meeting the needs of different businesses.
  • the management platform can train a super network, and obtain multiple base models from the super network.
  • the base model obtained by the management platform from the super network is a neural network model.
  • the neural network model is generated by the management platform based on the user's choice through neural network search.
  • Compared with a base model obtained from the management platform's built-in models or from models uploaded by the user in advance, a base model obtained by training the super network in real time matches the AI task more closely, which can improve the precision of the AI task execution results obtained based on the AI integrated model.
  • the management platform can combine the base models to construct an AI integrated model of a specified size, so as to meet the individual needs of users.
  • the management platform also supports the addition or deletion of the base model, which reduces the cost of iterative update of the AI integrated model.
  • Both the base models and the AI integrated model can be used to extract features. Therefore, the management platform can first obtain inference results based on the base models without waiting for the AI integrated model to be built, thereby shortening inference time and improving inference efficiency, and the utilization of intermediate results (such as the inference results of the base models) is also improved.
  • In some implementations, when the management platform trains the super network and obtains multiple base models from it, the management platform can use the training data in the training data set to train the super network to obtain the i-th base model, where i is a positive integer. The management platform can then update the weights of the training data in the training data set according to the performance of the i-th base model, and train the super network with the reweighted training data to obtain the (i+1)-th base model.
  • the weight of the training data may represent the probability that the training data is used to train the super network.
  • By updating the weights of the training data, the management platform updates the probability that each piece of training data in the training data set is used to train the super network, so that targeted training can be carried out on certain training data, yielding a new base model whose performance complements that of the original base models. This further improves the accuracy of the AI task execution results obtained by the AI integrated model constructed from the multiple base models.
  • In some implementations, when the performance of the i-th base model on training data of a second category is higher than on training data of a first category, the management platform can increase the weight of the first-category training data in the training data set and/or reduce the weight of the second-category training data. In this way, the management platform can focus on training the super network with the misclassified training data to obtain a new base model.
  • the multiple base models obtained in this way can complement each other, improving the accuracy of the execution result of the AI task obtained based on the AI integrated model.
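  • The reweighting described above is in the spirit of boosting: samples the current base model handles poorly get more weight, so the next base model focuses on them. The multiply-and-renormalize rule below is an illustrative choice, not the patent's exact formula.

```python
# Boosting-style reweighting of training data after evaluating one base model.

def update_weights(weights, correct, boost=2.0):
    """Increase the weight of misclassified samples, then renormalize so the
    weights remain a sampling distribution over the training set."""
    new = [w * (1.0 if ok else boost) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

# Four samples, uniform weights; the i-th base model got samples 0 and 2 wrong.
weights = [0.25, 0.25, 0.25, 0.25]
weights = update_weights(weights, correct=[False, True, False, True])
```

After the update the misclassified samples carry twice the weight of the correctly classified ones, so they are more likely to be drawn when training the (i+1)-th base model.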
  • the management platform when the management platform uses the training data after updating the weights to train the super network, it may use the training data after updating the weights to fine-tune the super network. Since the management platform can continue to train the trained super network without starting training from scratch, the training efficiency is improved and the training progress is accelerated.
  • In some implementations, the management platform may determine the similarity between the outputs of every two base models after they perform inference on the first training data, then take the output of each base model as a node of the graph structure, determine the edges between the nodes according to the similarity, and obtain the graph structure from the nodes and the edges.
  • The graph structure constructed in this way retains, through the edges between nodes, information such as the similarity between the outputs of different base models. The AI integrated model can therefore process the graph structure through the graph network model, fusing information such as the similarity between the outputs of different base models; using the fused features for AI task processing can improve the accuracy of the AI task execution results.
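  • One plausible concrete reading of this construction: each base model's output vector becomes a node, and an edge (weighted by the similarity) links two nodes whose cosine similarity exceeds a threshold. Cosine similarity and the threshold value are illustrative assumptions; the patent only specifies that edges are determined from similarity.

```python
# Build a graph structure from base-model output vectors.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(outputs, threshold=0.9):
    """Nodes are base-model outputs; an edge links every pair of outputs
    whose cosine similarity reaches the threshold, keeping the similarity
    as the edge weight."""
    nodes = list(outputs)
    edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            sim = cosine_similarity(nodes[i], nodes[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return nodes, edges

# Outputs of three base models for the same input (made-up feature vectors):
outputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
nodes, edges = build_graph(outputs)
```

Here the first two base models produce near-identical features, so they are joined by an edge, while the third model's dissimilar output stays unconnected; this is the similarity information the graph network model later exploits.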
  • the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generation network model, or a graph spatio-temporal network model.
  • Graph network models such as the graph convolutional network model have strong expressive power, especially for non-Euclidean (non-Euclidean structured) data, and can effectively aggregate the features output by different base models. Using the features obtained from such graph network models for AI task processing improves the accuracy of the AI task execution results.
  • the graph network model is a graph convolutional network model obtained by simplifying the Chebyshev network.
  • The Chebyshev network approximates the convolution kernel with a high-order polynomial expansion of the Laplacian matrix, which greatly reduces the number of parameters and makes the graph convolutional network model localized.
  • the present application provides a reasoning method of an AI integrated model.
  • the method can be executed by a reasoning device, and the AI integrated model includes a graph network model and multiple base models.
  • the reasoning device may obtain input data, and then input the input data into each base model in the AI integrated model, and obtain an output after each base model performs reasoning on the input data.
  • each base model is a trained AI model.
  • the inference device can construct the output of multiple base models into a graph structure, and then input the graph structure into the graph network model, and obtain the inference result of the AI integrated model based on the graph network model.
  • The reasoning device constructs the outputs of the multiple base models into a graph structure and processes the graph structure through the graph network model in the AI integrated model, so that the outputs of the multiple base models are fused with the differences and correlations between the base models taken into account, improving the accuracy of the AI task execution results obtained based on the AI integrated model.
  • In some implementations, the reasoning device may determine the similarity between the outputs of every two base models in the multiple base models, then take the output of each base model as a node of the graph structure, determine the edges between the nodes according to the similarity, and obtain the graph structure from the nodes and the edges.
  • The reasoning device can store information such as the similarity and difference between the outputs of the multiple base models in the edge information of the graph structure, and fuse the outputs of the multiple base models based on this information, improving the accuracy of the AI task execution results obtained based on the AI integrated model.
  • the inference result of the AI integrated model is a feature of the input data.
  • The feature of the input data may be a fused feature obtained by the graph network model in the AI integrated model fusing the features extracted by the multiple base models.
  • the reasoning device may input the reasoning result of the AI integrated model to the decision-making layer, and use the output of the decision-making layer as the execution result of the AI task.
  • the decision-making layer can be a classifier or a regressor, etc.
  • Because the features extracted by the reasoning device through the AI integrated model are fused based on the similarities and differences of the multiple base models, making further decisions based on these features to obtain AI task execution results can improve the accuracy of those results.
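  • A minimal illustration of a decision-making layer acting on the fused feature: a linear classifier that scores each class and picks the best one. The feature values and class weights below are made-up, and a real decision layer could equally be a regressor or a trained softmax classifier.

```python
# Decision layer: score classes against the fused feature, return the argmax.

def linear_classifier(feature, class_weights):
    """Score each class as a dot product with the fused feature and
    return the index of the highest-scoring class."""
    scores = [sum(w * f for w, f in zip(weights, feature))
              for weights in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)

fused_feature = [0.2, 0.9]                  # output of the AI integrated model
class_weights = [[1.0, 0.0], [0.0, 1.0]]    # two classes, toy weights
predicted = linear_classifier(fused_feature, class_weights)
```

The split mirrors the application's design: the AI integrated model only extracts the fused feature, and the decision layer turns that feature into the AI task's execution result.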
  • In some implementations, the reasoning device may input the reasoning result of the AI integrated model into a task model, use the task model to perform further feature extraction on the reasoning result, make a decision based on the further-extracted features, and take the decision result as the execution result of the AI task, where the task model is an AI model trained for the AI task.
  • the reasoning device uses the AI integrated model to preprocess the input data, so that the downstream task model can perform feature extraction and decision-making based on the preprocessed data, so as to complete the corresponding AI task.
  • the task model performs feature extraction and decision-making on the preprocessed data instead of directly performing feature extraction and decision-making on the original input data, so it has a high response speed and response efficiency.
  • the present application provides a management system for an AI integrated model.
  • the system includes:
  • An interaction unit configured to obtain a training data set, an initial graph network model and a plurality of base models, wherein each base model is a trained AI model;
  • a training unit configured to use the training data in the training data set and the plurality of base models to iteratively train the initial graph network model to obtain a graph network model;
  • a construction unit configured to construct the graph network model and the multiple base models into the AI integrated model, wherein the input of the graph network model is a graph structure composed of outputs of the multiple base models.
  • each iteration includes:
  • the initial graph network model is trained using the graph structure.
  • the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
  • the interaction unit is specifically configured to:
  • train the super network through the training unit, and obtain a plurality of base models from the super network.
  • the training unit is specifically used for:
  • the super network is trained using the training data in the training data set after the weights are updated, to obtain the (i+1)-th base model.
  • the training unit is specifically used for:
  • the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generation network model, or a graph spatio-temporal network model.
  • the graph convolutional network model includes a graph convolutional network model obtained by simplifying a Chebyshev network.
  • the present application provides an inference device for an AI integrated model.
  • the AI integrated model includes a graph network model and multiple base models, and the device includes:
  • a communication module for obtaining input data
  • the first reasoning module is configured to input the input data into each base model in the AI integrated model, and obtain an output of each base model after it performs reasoning on the input data, wherein each base model is a trained AI model;
  • a building block for building the outputs of the plurality of base models into a graph structure
  • a second reasoning module configured to input the graph structure into the graph network model, and obtain a reasoning result of the AI integrated model based on the graph network model.
  • the building blocks are specifically used for:
  • the output of each base model in the plurality of base models is taken as a node of the graph structure, the edges between the nodes are determined according to the similarity, and the graph structure is obtained from the nodes and the edges.
  • the inference result of the AI integrated model is a feature of the input data.
  • the device further includes:
  • the execution module is configured to input the reasoning result of the AI integrated model to the decision-making layer, and use the output of the decision-making layer as the execution result of the AI task.
  • the device further includes:
  • the execution module is configured to input the reasoning result of the AI integrated model into the task model, use the task model to perform further feature extraction on the reasoning result, make a decision based on the further-extracted features, and take the decision result as the execution result of the AI task, wherein the task model is an AI model trained for the AI task.
  • the present application provides a computing device cluster, where the computing device cluster includes at least one computing device.
  • At least one computing device includes at least one processor and at least one memory.
  • the processor and the memory communicate with each other.
  • the at least one processor is configured to execute instructions stored in the at least one memory, so that the cluster of computing devices executes the method described in any implementation manner of the first aspect or the second aspect.
  • The present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions instruct a computing device or a computing device cluster to execute the method described in any implementation manner of the above first aspect or second aspect.
  • The present application provides a computer program product containing instructions which, when run on a computing device or a computing device cluster, cause the computing device or computing device cluster to execute the method described in any implementation manner of the above first aspect or second aspect.
  • Fig. 1 is a system architecture diagram of a management platform of an AI integrated model provided by an embodiment of the present application
  • FIG. 2A is a schematic diagram of deployment of a management platform provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of deployment of a management platform provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an interactive interface provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for constructing an AI integrated model provided in an embodiment of the present application
  • FIG. 5 is a schematic diagram of a graph convolutional network model provided by an embodiment of the present application.
  • FIG. 6A is a schematic flow diagram of obtaining a base model provided by an embodiment of the present application.
  • FIG. 6B is a schematic flow diagram of a neural network search provided in the embodiment of the present application.
  • FIG. 7 is a schematic flow diagram of obtaining multiple base models provided by the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an inference device provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of deployment of an inference device provided by an embodiment of the present application.
  • FIG. 10 is a flow chart of a reasoning method for an AI integrated model provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of the present application are used for description purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features.
  • An AI model refers to an algorithm model developed and trained with AI technologies such as machine learning to accomplish a specific AI task.
  • AI models can include support vector machine (SVM) models, random forest (RF) models and decision tree (DT) models.
  • AI models can also include deep learning (DL) models, such as neural network models.
  • multiple independent AI models can also be combined to form a large-scale AI model (also called an AI large model).
  • the manner of forming a large-scale AI model by using multiple AI models may include an integrated manner, and a large-scale AI model obtained through the integrated manner is also referred to as an AI integrated model.
  • the AI model used for feature extraction in the AI integrated model is also called a base model, or a base learner.
  • The base model can be a decision tree model, a random forest model, a neural network model, etc. It should be understood that the base models included in the AI integrated model in this application operate relatively independently.
  • The inference results (that is, the outputs) of the multiple base models are combined in a certain way, and the combined output is used as the output of the AI integrated model; that is, the integration in this application is actually an ensemble of the base models' inference results.
  • a graph network model is an AI model for processing graph structures, such as a graph neural network model.
  • The graph structure is a data structure comprising multiple nodes (also called vertices), with edges between at least two of the nodes.
  • nodes can be represented by circles, and edges can be represented by lines between circles.
  • the graph structure can be applied in different scenarios to express data with relationships.
  • a graph structure can be used to represent user relationships in a social network. Specifically, nodes in the graph structure represent users, and edges in the graph structure represent relationships between users, such as colleagues, friends, relatives, and so on.
  • For another example, a graph structure can be used to represent an airline network: nodes in the graph structure represent cities, and edges in the graph structure represent the air routes between cities.
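  • The social-network example above can be encoded as a simple adjacency structure, with users as nodes and labeled relationships as edges. The names and labels below are made-up illustrative data.

```python
# A graph structure as nodes plus labeled edges, with a neighbor query.

social_graph = {
    "nodes": ["alice", "bob", "carol"],
    "edges": {
        ("alice", "bob"): "colleague",
        ("bob", "carol"): "friend",
    },
}

def neighbors(graph, node):
    """Users directly connected to the given user (edges are undirected)."""
    return sorted({b for (a, b) in graph["edges"] if a == node} |
                  {a for (a, b) in graph["edges"] if b == node})

print(neighbors(social_graph, "bob"))  # ['alice', 'carol']
```

The same structure carries over to this application's use: base-model outputs play the role of nodes, and similarity values play the role of the relationship labels on the edges.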
  • The decision-making layer is an algorithmic structure for making decisions based on input features.
  • the decision-making layer is usually used in conjunction with the AI model or AI integrated model for feature extraction to complete specific AI tasks.
  • the base model or graph network model can extract features, and then the extracted features can be input to the decision-making layer for decision-making.
  • the decision-making layer may include different types, for example, the decision-making layer may be a classifier or a regressor.
  • the AI model or AI integrated model may not include a decision layer, that is, it is only used for feature extraction.
  • the features obtained through the AI model or AI integrated model can be input to the decision-making layer to achieve specific AI tasks.
  • the decision-making layer can also be used as a part of the AI model or AI integrated model, that is, the AI model or AI integrated model is used for both feature extraction and decision-making.
  • In this case, the AI model or AI integrated model can directly obtain the results for the AI task.
  • Unless otherwise stated, the base models and the graph network model in the AI integrated model in the remainder of this application are used only for feature extraction and do not include the functions of the decision-making layer; the features obtained through the AI integrated model can then be input into the decision-making layer according to the goals of the AI task.
  • AI tasks refer to tasks completed by using the functions of AI models or AI integrated models.
  • AI tasks can include image processing (such as image segmentation, image classification, image recognition, image annotation, etc.), natural language processing (language translation, intelligent question answering) or speech processing (voice wake-up, speech recognition, speech synthesis) and other tasks.
  • Different AI tasks have different levels of difficulty.
  • some AI tasks can be completed by a simple trained AI model and decision layer.
  • the reasoning accuracy of a single AI model is not high, and using multiple AI models as the base model to build an AI integrated model is a strategy to improve accuracy.
  • the outputs of multiple base models can be fused through voting or weighted average to obtain the inference result of the AI integrated model.
  • The inference result of the AI integrated model obtained in this way does not consider the differences and correlations of the base models themselves, so the accuracy of the AI task execution results obtained based on the AI integrated model is still not high.
  • The multiple base models in the AI integrated model are usually trained in parallel, with no strong dependency between them. It is therefore difficult to fully exploit the advantages of each base model, which may lead to poor inference performance of the AI integrated model on some input data and affects the accuracy of the AI task execution results obtained based on the AI integrated model.
  • the embodiment of the present application provides a method for constructing an AI integrated model.
  • the method can be executed by the management platform of the AI integrated model.
  • the management platform can obtain a training data set, an initial graph network model, and multiple base models, use the training data in the training data set and the multiple base models to iteratively train the initial graph network model to obtain the graph network model, and then construct the graph network model and the multiple base models into an AI integrated model, where the input of the graph network model is a graph structure composed of the outputs of the multiple base models.
  • the management platform constructs a graph structure from the outputs of the multiple base models and then processes the graph structure through the graph network model to fuse those outputs. Since the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully considers the differences and correlations between the base models when fusing their outputs. As a result, when the features obtained from the graph network model are used for the processing of AI tasks, the accuracy of the execution results of AI tasks obtained based on the AI integrated model is improved.
  • when the management platform obtains multiple base models, it can obtain each base model by training a hypernetwork, and update the weights of the training data used to train the hypernetwork based on the performance of the current base model, for example, by increasing the weight of the training data misclassified by the base model. The next base model is then obtained with the re-weighted training data, so that the multiple base models complement each other, thereby improving the accuracy of the execution result of the AI task obtained based on the AI integrated model.
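  • The weight-update idea can be sketched as follows (an AdaBoost-style heuristic; the multiplication factor and the initial sample weights are illustrative assumptions, not values from this application):

```python
import numpy as np

def update_sample_weights(weights, correct, factor=2.0):
    """Increase the weight of training samples the current base model
    misclassified, then renormalize. The factor of 2.0 is an illustrative
    assumption; the next base model is trained on the re-weighted data."""
    weights = np.asarray(weights, dtype=float).copy()
    weights[~np.asarray(correct)] *= factor   # boost misclassified samples
    return weights / weights.sum()

w = update_sample_weights([0.25, 0.25, 0.25, 0.25],
                          correct=[True, True, False, True])
# The misclassified third sample now carries the largest weight (0.4).
```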
  • the management platform 100 includes an interaction unit 102 , a training unit 104 and a construction unit 106 . Further, the management platform 100 may also include a storage unit 108 . Each unit is introduced separately below.
  • the interaction unit 102 is used to obtain a training data set, an initial graph network model and multiple base models. Wherein, each base model is a trained AI model.
  • the interaction unit 102 can obtain the training data set, the initial graph network model, and the multiple base models in various ways. For example, the interaction unit 102 can obtain the training data set, initial graph network model, and multiple base models used to build the AI integrated model from those built into the management platform 100, according to the user's selection. For another example, the interaction unit 102 may also receive a training data set, an initial graph network model, and multiple base models uploaded by a user.
  • the training unit 104 is configured to use the training data in the training data set and multiple base models to iteratively train the initial graph network model to obtain the graph network model.
  • each iteration includes: inputting the first training data in the training data set to each base model respectively to obtain the output of each base model after performing inference on the first training data, then constructing a graph structure from those outputs, and then using the graph structure to train the initial graph network model.
  • the first training data may be several training data in the training data set.
  • the training data in the training data set can be divided into several batches according to the batch size, and the number of training data included in each batch is equal to the batch size.
  • the first training data may be a batch of training data among several batches of training data.
  • the training unit 104 is also used to train a super network, and obtain multiple base models from the super network.
  • the training unit 104 may update the weights of the training data used to train the hypernetwork based on the performance of the current base model, for example, increase the weight of the training data misclassified by the base model. Then the training unit 104 trains the supernetwork by using the training data after updating the weights to obtain the next base model.
  • Such multiple base models can complement each other, improving the accuracy of the execution results of AI tasks obtained based on the AI integrated model.
  • the construction unit 106 is used to construct the graph network model and multiple base models as an AI integrated model.
  • the input of the graph network model is a graph structure composed of the outputs of multiple base models.
  • the construction unit 106 is used to take the graph structure built from the outputs of the multiple base models as the input of the graph network model, so that in the inference stage the multiple base models and the graph network model can jointly process the input data to obtain the inference result of the AI integrated model. Since the construction unit 106 connects the multiple base models and the graph network model by output and input, in the reasoning stage the AI integrated model can automatically reason about the input data as a whole.
  • the storage unit 108 is used for storing the training data set, the initial graph network model and/or the base model built in the management platform 100 . Further, the storage unit 108 may also store the training data set, the initial graph network model and/or the base model uploaded by the user. In some embodiments, the storage unit 108 may also store the base model obtained by training the hypernetwork by the training unit 104 . Wherein, the storage unit 108 may also store the training parameters and the like set by the user through the interaction unit 102 . This embodiment does not limit it.
  • FIG. 1 illustrates the architecture of the management platform 100 in detail.
  • the deployment manner of the management platform 100 will be introduced in detail below.
  • the aforementioned AI integrated model management platform 100 may also be referred to as an AI integrated model management system.
  • the AI integrated model management system may be a software system deployed on a hardware device or a hardware device cluster, and the AI integrated model management system may also be a hardware system composed of one or more hardware devices.
  • the description of the management platform 100 is an exemplary description of the management system of the AI integrated model.
  • the management platform 100 can be deployed in a cloud environment.
  • when the management platform 100 is a software system, it is specifically deployed on one or more computing devices in the cloud environment (for example, a central server); when the management platform 100 is a hardware system, it may include one or more computing devices in the cloud environment.
  • the cloud environment indicates a central computing device cluster owned by a cloud service provider for providing computing, storage, and communication resources.
  • the user can trigger the operation of starting the management platform 100 through a client (such as a browser or a dedicated client), and then, the user interacts with the management platform 100 through the client to build an AI integrated model.
  • the interaction unit 102 of the management platform 100 can provide interaction logic, and the client can present an interaction interface to the user based on the interaction logic.
  • the interactive interface may be, for example, a graphical user interface (graphical user interface, GUI) or a command user interface (command user interface, CUI).
  • the interactive interface 300 supports the user to configure the training data set, base model and initial graph network model.
  • the interactive interface 300 carries a training data set configuration component 302 , a base model configuration component 304 and an initial graph network model configuration component 306 .
  • the training data set configuration component 302 includes a drop-down control; when the drop-down control is triggered, a drop-down box can be displayed, and the user can select a training data set built into the management platform 100 from the drop-down box, for example, any one of training data set 1 to training data set k, where k is a positive integer.
  • the user can also select a custom training data set. Specifically, when the user selects a custom training data set from the drop-down box, the interactive interface 300 can provide an interface for the user to input the address where the custom training data set is located, so that the client can obtain the custom training data set according to the address.
  • the base model configuration component 304 includes a drop-down control.
  • when the drop-down control is triggered, a drop-down box can be displayed, and the drop-down box can include base models built into the management platform 100, such as a random forest model, a decision tree model, a neural network model, and so on.
  • the random forest model and the decision tree model may be trained AI models.
  • the management platform 100 may have at least one built-in instance of the random forest model, and/or at least one built-in instance of the decision tree model.
  • when the drop-down control of the base model configuration component 304 is triggered, at least one instance of the various models built into the management platform 100 may be displayed through the drop-down box.
  • the user can also configure the quantity of the instance through the quantity configuration control in the base model configuration component 304 .
  • the user can also configure multiple model instances as the base model through the drop-down control, and configure the number of instances for each model instance.
  • the drop-down control can also support users to upload a custom model as the base model.
  • the drop-down box displayed by the drop-down control includes a user-defined model, and the user can select the user-defined model, thereby triggering a process of uploading the user-defined model as the base model.
  • the user can also pre-upload a custom model. In this way, when configuring the base model, the user can select a base model from the custom models uploaded by the user for building an AI integration model.
  • the above-mentioned base model selected by the user may be built into the management platform, or may be pre-uploaded by the user. In some other embodiments, the base model selected by the user may also be generated by the management platform according to the user's selection. For example, when the user selects a neural network model, the interactive interface 300 may also provide an interface for the user to configure the relevant parameters for obtaining the neural network model. For instance, when the neural network model is obtained through hypernetwork sampling, the interactive interface 300 can provide parameter interfaces such as the search space, the performance index, and the reference value of the performance index, so that the user can configure the corresponding parameters through the above-mentioned interfaces. In this way, the management platform 100 can obtain multiple base models through neural architecture search based on the above parameters.
  • the initial graph network model configuration component 306 includes a drop-down control.
  • a drop-down box can be displayed.
  • the user can select an initial graph network model built into the management platform 100 or uploaded by the user, for example, any one of a graph convolutional networks (GCN) model, a graph attention networks (GAT) model, a graph autoencoders (GAE) model, a graph generative networks (GGN) model, or a graph spatial-temporal networks (GSTN) model.
  • the interactive interface 300 also carries an OK control 308 and a Cancel control 309 .
  • when the cancel control 309 is triggered, the user's selection is canceled.
  • when the OK control 308 is triggered, the client may submit the above parameters configured by the user to the management platform 100 .
  • the management platform 100 can obtain the training data set, the initial graph network model, and the multiple base models according to the above configuration, iteratively train the initial graph network model based on the training data set and the multiple base models to obtain the graph network model, and then build the graph network model and the multiple base models into an AI integrated model.
  • multiple users can trigger and start the operation of the management platform 100 through their respective clients, so as to create instances of the management platform 100 corresponding to the multiple users in the cloud environment.
  • Each user can interact with a corresponding instance of the management platform 100 through a respective client, so as to realize the construction of a respective AI integrated model.
  • multiple users can configure corresponding training data sets, initial graph network models and multiple base models based on their respective AI tasks.
  • the training dataset, initial graph network model, and multiple base models can be different for different user configurations.
  • the AI integrated models constructed by different users may be different. That is to say, the management platform 100 provides a one-stop AI integrated model construction method that can construct a corresponding AI integrated model for different AI tasks of different users, or for different AI tasks of the same user. It therefore has high versatility and usability and can meet business needs.
  • the management platform 100 may also be deployed in an edge environment, specifically on one or more computing devices (edge computing devices) in the edge environment, or the management platform 100 may include one or more computing devices in the edge environment; an edge computing device can be a server, a computing box, and so on.
  • the edge environment indicates an edge computing device cluster that is geographically closer to the terminal device (that is, the end-side device) and used to provide computing, storage, and communication resources.
  • Terminal devices include, but are not limited to, user terminals such as desktops, laptops, and smart phones.
  • the management platform 100 may be deployed in different environments in a distributed manner.
  • the interaction unit 102 may be deployed in an edge environment
  • the training unit 104 and the construction unit 106 may be deployed in a cloud environment.
  • the user can trigger the operation of starting the management platform 100 through the client to create an instance of the management platform 100 .
  • each instance of the management platform 100 includes an interaction unit 102 , a training unit 104 and a construction unit 106 .
  • the above units are distributed and deployed in cloud environment and edge environment.
  • FIG. 2B is only an implementation manner in which various parts of the management platform 100 are distributed and deployed in different environments.
  • different parts of the management platform 100 may also be deployed separately across the cloud environment, the edge environment, and the end device, that is, across all three environments or any two of them.
  • the method includes:
  • S402 The management platform 100 acquires a training data set.
  • the management platform 100 may have at least one built-in training data set.
  • the built-in training data set can be an open source data set obtained from the open source community, such as ImageNet, OpenImage, etc.
  • the built-in training data set may also include a data set customized by the operator of the management platform 100, a private data set rented or purchased by the operator of the management platform 100, and the like.
  • the user can select a training data set from at least one training data set built in the management platform 100, so that the management platform 100 can obtain the corresponding training data set based on the user's selection operation for model training.
  • the user may not select the training data set built in the management platform 100 .
  • users can upload training datasets by themselves.
  • the user can input the address or path where the training data set is located through the interactive interface 300, and the management platform 100 obtains the corresponding training data set according to the address or path for model training.
  • S404 The management platform 100 acquires an initial graph network model.
  • the management platform 100 may have at least one built-in initial graph network model.
  • the management platform 100 may have built-in one or more of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generation network model, or a graph spatio-temporal network model.
  • the user can select an initial graph network model from at least one initial graph network model built in the management platform 100 to be used for building an AI integrated model.
  • the user may not select the initial graph network model built into the management platform 100 .
  • users can upload initial graph network models by themselves.
  • the user can input the address or path where the initial graph network model is located through the interactive interface 300, and the management platform 100 obtains the corresponding initial graph network model according to the address or path for building an AI integrated model.
  • S406 The management platform 100 acquires multiple base models.
  • the management platform 100 can obtain multiple base models according to the user's selection.
  • the base model is a trained AI model.
  • the AI model can be a random forest model, a decision tree model or a neural network model.
  • the multiple base models selected by the user may be built in the management platform 100, or may be pre-uploaded by the user.
  • the user can also upload the base model in real time, so that the management platform 100 can obtain the above base model.
  • the management platform 100 may provide at least one instance of the above models for users to choose.
  • the instance provided by the management platform 100 may be built in the management platform 100, or may be pre-uploaded by the user, and the user may select at least one instance from it as a base model for building an AI integrated model.
  • the user can also configure the number of instances as N (N is a positive integer), so that the management platform 100 can obtain N instances of the model for building the AI integrated model.
  • the user can select multiple model instances as the base model for building the AI integrated model, and the user can configure the number of instances for each instance, so that the management platform 100 can obtain the corresponding number of instances respectively, and use for building AI integration models.
  • the management platform 100 may also generate a base model according to the user's selection. For example, a user can choose to generate a neural network model as a base model. Specifically, the management platform 100 can train a super network and obtain multiple base models from the super network. How the management platform 100 trains the hypernetwork and obtains multiple base models from it is described in detail below and is not expanded here.
  • S402, S404, and S406 may be executed in parallel, or may be executed sequentially in a set order.
  • the management platform 100 may also execute S404, S406 first, and then execute S402.
  • the embodiment of the present application does not limit the execution order of S402 to S406.
  • S408 The management platform 100 uses the training data in the training data set and the multiple base models to iteratively train the initial graph network model to obtain the graph network model.
  • each iteration includes: the management platform 100 respectively inputs a part of the training data (which may be referred to as the first training data) in the training data set to each base model and obtains the output of each base model after performing inference on the first training data; then the management platform 100 constructs a graph structure from the outputs of the multiple base models; finally, the management platform 100 uses the graph structure to train the initial graph network model.
  • the first training data is several data in the training data set.
  • the training data in the training data set can be divided into multiple batches according to the batch size, and the first training data can be one batch of training data.
  • for example, if the training data set includes 10,000 pieces of training data and the batch size is 100, the training data set can be divided into 100 batches, and the first training data can be one of the 100 batches.
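  • The batching described here is a plain split of the data set; the following sketch mirrors the 10,000-sample set and batch size of 100 from the example:

```python
def split_batches(dataset, batch_size):
    """Split a dataset into consecutive batches of batch_size items each."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

batches = split_batches(list(range(10000)), 100)
assert len(batches) == 100           # 10,000 items / batch size 100 = 100 batches
first_training_data = batches[0]     # one batch can serve as the "first training data"
assert len(first_training_data) == 100
```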
  • Each base model can perform feature extraction on the first training data to obtain features. This feature can actually be represented by a vector or matrix.
  • the output of each base model after performing inference on the first training data may include the above-mentioned features.
  • a graph structure is a data structure that includes multiple nodes. Further, the graph structure further includes edges between at least two nodes among the plurality of nodes.
  • the management platform 100 can determine the similarity between the outputs of the multiple base models after they infer the first training data, for example, based on the distance between the features output by the base models. The management platform 100 then uses the output of each base model as a node in the graph structure, determines the edges between the nodes according to the similarity, and obtains the graph structure from the nodes and the edges.
  • the management platform 100 uses the graph structure to train the initial graph network model. Specifically, the graph structure can be input into the initial graph network model, which aggregates node information based on edge information so as to extract features from the graph structure. It should be noted that such a feature is one that fuses the outputs of the multiple base models. The management platform 100 can then input the features output by the initial graph network model to the decision-making layer to make a decision and obtain a decision result.
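  • A sketch of this graph-construction step: each base model's output feature vector becomes a node, and an edge is added between two nodes whose features are sufficiently similar. Cosine similarity, the 0.5 threshold, and the feature values are all illustrative choices, not prescribed by this application:

```python
import numpy as np

def build_graph(features, threshold=0.5):
    """Treat each base model's output feature vector as a node; connect two
    nodes with an edge when the cosine similarity of their features exceeds
    a threshold. Returns the adjacency matrix of the graph structure."""
    feats = np.asarray(features, dtype=float)
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T                        # pairwise cosine similarity
    adjacency = (sim > threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)           # no self-loops at this stage
    return adjacency

# Outputs of four hypothetical base models for the same input.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
A = build_graph(X)   # nodes 1-2 and nodes 3-4 end up connected
```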
  • the decision-making layer may be a classifier or a regressor, etc., and correspondingly, the decision result may be a classification result or a regression result.
  • the management platform 100 can calculate the function value of the loss function, that is, the loss value, according to the decision result and the label of the training data. The management platform 100 can then use the gradient descent method to update the parameters of the initial graph network model based on the gradient of the loss value, thereby iteratively training the initial graph network model.
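  • One illustrative training step matching this description: the fused features pass through a decision layer, a loss value is computed against the labels, and the parameters are updated along the negative gradient. The linear decision layer, squared-error loss, shapes, and values are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 3))     # fused features from the graph network (4 samples)
W = rng.normal(size=(3, 2))     # decision-layer parameters to be learned
y = np.array([[1.0, 0.0]] * 4)  # labels of the training data (one-hot)

def step(W, lr=0.05):
    pred = Z @ W                          # decision layer (linear, for brevity)
    loss = ((pred - y) ** 2).mean()       # loss value: prediction vs. label
    grad = 2 * Z.T @ (pred - y) / y.size  # gradient of the loss w.r.t. W
    return W - lr * grad, loss            # gradient-descent parameter update

W1, loss0 = step(W)
_, loss1 = step(W1)
assert loss1 < loss0   # the loss decreases after one update
```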
  • the embodiment of the present application also takes a graph convolutional network model as the initial graph network model for illustration.
  • for example, the management platform 100 obtains multiple base models such as base model 1, base model 2, base model 3, and base model 4, and the management platform 100 can build a graph structure based on the outputs of base model 1 to base model 4.
  • X1, X2, X3, and X4 represent the outputs of base model 1 to base model 4 respectively. The management platform 100 uses X1, X2, X3, and X4 as nodes, determines the edges X1X2, X1X3, X1X4, X2X3, X2X4, and X3X4 based on the similarity of X1, X2, X3, and X4, and obtains the graph structure from the above nodes and edges.
  • the management platform 100 then inputs the graph structure into a graph convolutional network model, which includes a graph convolutional layer.
  • the graph convolutional layer can convolve the input of the graph convolutional network model to obtain the convolution result.
  • the graph convolutional network model can be represented by a map f(.). This mapping f(.) enables the graph convolutional network model to aggregate node information based on edge information.
  • Taking X4 as an example, when the graph convolution layer of the graph convolutional network model convolves X4, the nodes X1, X2, and X3 associated with X4 also participate in the convolution operation, and the convolution result Z4 is obtained.
  • similarly, the graph convolution layer can perform convolution operations on X1, X2, and X3 to obtain the convolution results Z1, Z2, and Z3.
  • the above convolution result is used to characterize the feature extracted by the graph convolutional network model, and the feature may be a feature fused with outputs of multiple base models.
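  • The neighbor aggregation described above can be sketched as follows. Mean aggregation over each node and its neighbors stands in for the convolution, and the feature values are invented; the graph is complete (all six edges), matching the X1..X4 example:

```python
import numpy as np

# Node features X1..X4 (outputs of base model 1 to base model 4) and a
# complete graph, as in the example: every node neighbors every other node.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
A = np.ones((4, 4)) - np.eye(4)     # edges X1X2, X1X3, X1X4, X2X3, X2X4, X3X4

def aggregate(X, A):
    """Each Zi combines node i's own feature with its neighbors' features,
    which is the neighbor aggregation a graph convolution performs (mean
    aggregation is an illustrative choice)."""
    A_hat = A + np.eye(len(A))                 # include the node itself
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ X) / deg

Z = aggregate(X, A)   # Z4 depends on X1, X2, and X3 as well as X4 itself
```

  Because this example graph is complete, every Zi here equals the mean of all four node features; with a sparser graph, each Zi would reflect only its own neighborhood.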
  • the management platform 100 can also use the graph convolutional network model obtained by simplifying the Chebyshev network (ChebNet) as the initial graph convolutional network model.
  • ChebNet approximates the convolution kernel g_θ with a high-order (e.g., polynomial) expansion of the Laplacian matrix, which greatly reduces the number of parameters and makes the graph convolutional network model local.
  • the convolution kernel g_θ is parameterized into the form of formula (1): g_θ(Λ) ≈ Σ_{k=0}^{K} θ_k Λ^k  (1)
  • ⁇ k is a learnable parameter in the graph convolutional network model, which represents the weight of the kth item in the polynomial.
  • K is the highest order of the polynomial, and ⁇ is the eigenvalue matrix, usually a symmetric matrix.
  • x is the input
  • g ⁇ is the convolution kernel
  • ⁇ 0 and ⁇ 1 are the weights of polynomials.
  • L is a normalized Laplacian matrix
  • I n is an n-order identity matrix.
  • A is the adjacency matrix and D is the degree matrix.
  • the matrix A+I_n, obtained by adding the identity matrix to the adjacency matrix A, is a matrix with added self-loops.
  • Z = D̃^{-1/2} Ã D̃^{-1/2} X W  (5), where Z is used to represent the convolution result of the multi-dimensional convolution, X represents the matrix form of the input, that is, the input matrix, W represents the parameter matrix, and D̃ denotes the degree matrix of Ã = A+I_n.
  • the parameter matrix includes feature transformation parameters, such as a learnable parameter ⁇ in the graph convolutional network model, which is specifically a parameter for enhancing features.
  • the management platform 100 can use the initial graph convolutional network model to fuse the outputs of the base models according to formula (5) to obtain the fused feature, which can specifically be the convolution result Z shown in formula (5), and then input the feature to a decision layer such as a classifier to obtain a classification result.
  • the management platform 100 can calculate the loss value according to the classification result and the label of the training data, and then update the parameter matrix W of the graph convolutional network model according to the gradient of the loss value, thereby realizing iterative training of the graph convolutional network model.
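  • A sketch of one graph-convolution layer in the simplified first-order form described above: add self-loops to the adjacency matrix A, normalize symmetrically by the degree matrix, then apply the learnable parameter matrix W. The two-node graph and identity W are chosen so the result is easy to check by hand:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One simplified graph-convolution layer: Z = D~^{-1/2} A~ D~^{-1/2} X W."""
    A_hat = A + np.eye(len(A))                        # A + I_n: add self-loops
    deg = A_hat.sum(axis=1)                           # diagonal of degree matrix D~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W    # convolution result Z

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # two connected nodes
X = np.array([[1.0, 0.0], [0.0, 1.0]])  # input matrix (node features)
W = np.eye(2)                           # parameter matrix (identity for clarity)
Z = gcn_layer(A, X, W)                  # Z == [[0.5, 0.5], [0.5, 0.5]]
```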
  • when a preset condition is met, the management platform 100 may stop the training and determine the trained initial graph network model as the graph network model.
  • the preset condition can be set according to empirical values.
  • the preset condition may be that the loss value tends to converge, the loss value is smaller than the preset value, or the performance reaches the preset performance.
  • the performance may be an index such as precision; on that basis, reaching the preset performance may mean, for example, the precision reaching 95%.
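  • The stopping logic just described might be sketched as follows (the min_delta convergence threshold is an illustrative assumption; the 0.95 precision target is the value from the example):

```python
def should_stop(loss_history, precision, min_delta=1e-4, target_precision=0.95):
    """Stop when the loss has converged (change below min_delta between the
    last two iterations) or the measured precision reaches the preset value."""
    converged = (len(loss_history) >= 2 and
                 abs(loss_history[-1] - loss_history[-2]) < min_delta)
    return converged or precision >= target_precision

assert should_stop([0.5, 0.5], precision=0.1)       # loss converged
assert should_stop([1.0, 0.5], precision=0.96)      # precision target reached
assert not should_stop([1.0, 0.5], precision=0.9)   # keep training
```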
  • S410 The management platform 100 builds the graph network model and multiple base models into an AI integrated model.
  • specifically, the management platform 100 can form the outputs of the multiple base models into a graph structure and then use the graph structure as the input of the graph network model, thereby integrating the multiple base models with the graph network model to obtain an AI integrated model.
  • the base model is used to extract features
  • the graph network model is used to fuse the features extracted by multiple base models to obtain the fused features.
  • the AI integrated model can also be integrated with a decision layer, such as a classifier or a regressor; the fused features are input to the decision layer to obtain classification results or regression results, so that specific AI tasks can be completed.
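  • Putting the pieces together, a toy end-to-end sketch of an assembled AI integrated model: base models extract features, the features form a (here complete) graph, one normalized graph convolution fuses them, and a decision layer classifies. The stand-in linear base models, shapes, pooling, and identity parameter matrices are all assumptions for illustration:

```python
import numpy as np

def base_model(seed):
    """A stand-in feature extractor (a random linear map, for illustration)."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(4, 3))
    return lambda x: x @ M

def ai_integrated_model(x, base_models, fuse_W, decision_W):
    feats = np.stack([m(x) for m in base_models])     # one graph node per base model
    A = np.ones((len(feats), len(feats)))             # complete graph incl. self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    fused = (D_inv_sqrt @ A @ D_inv_sqrt @ feats @ fuse_W).mean(axis=0)
    logits = fused @ decision_W                       # decision layer
    return int(np.argmax(logits))                     # classification result

models = [base_model(s) for s in range(3)]
pred = ai_integrated_model(np.ones(4), models,
                           fuse_W=np.eye(3), decision_W=np.eye(3))
```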
  • the embodiment of the present application provides a method for building an AI integrated model.
  • the management platform 100 constructs the graph network model and multiple base models as an AI integrated model.
  • the AI integrated model can construct a graph structure from the outputs of the multiple base models and then process the graph structure through the graph network model to fuse those outputs. Since the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully considers the differences and correlations between the base models when fusing their outputs. The features obtained from the AI integrated model constructed from the graph network model and the multiple base models are used for the execution of AI tasks, which can improve the accuracy of the execution results of AI tasks.
  • the management platform 100 can also obtain multiple base models by searching according to a neural architecture search (NAS) algorithm. Considering that the NAS algorithm takes a long time, the management platform 100 can also use an optimized NAS algorithm to search and obtain multiple base models.
  • the optimized NAS algorithm may include any one of the efficient neural architecture search (ENAS) algorithm, the differentiable architecture search (DARTS) algorithm, the proxyless architecture search (ProxylessNAS) algorithm, and other such algorithms.
  • the base model obtained through the NAS algorithm or the optimized NAS algorithm is a neural network model.
  • S602 The management platform 100 determines the super network according to the search space.
  • the principle of DARTS is to determine a supernet according to the search space.
  • the hypernetwork can be represented as a directed acyclic graph; each node in the directed acyclic graph can represent a feature map (or feature vector), and the edges between nodes represent the possible operations connecting the nodes.
  • the possible operations can be, for example, 3×3 convolution, 5×5 convolution, and so on.
  • the selection of operations between nodes is discrete, that is, the search space (representing the set of searchable operations) is discrete.
  • the edges between nodes in the supernetwork are expanded so that there are more possible operations for connecting nodes, thereby relaxing the search space.
  • the management platform 100 may expand edges in the search space according to possible operations between nodes configured by the user, so as to relax the search space. Then, the management platform 100 can map the relaxed search space to a continuous space, so as to obtain a super network.
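The relaxation described above, in which each edge becomes a softmax-weighted mixture of the candidate operations, can be sketched as follows. This is an illustrative sketch only: the three one-dimensional stand-in operations and their names are assumptions, not the convolution and pooling operations of the actual search space.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical 1-D stand-ins for the candidate operations on one edge.
candidate_ops = [
    lambda x: x,                 # identity
    lambda x: 0.5 * x,           # stand-in for a pooling-like op
    lambda x: np.maximum(x, 0),  # stand-in for a nonlinear op
]

def mixed_op(x, alpha):
    """Continuous relaxation of one edge: the discrete choice of operation
    is replaced by a softmax-weighted sum over all candidate operations."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, candidate_ops))

x = np.array([1.0, -2.0, 3.0])
alpha = np.zeros(3)       # structural parameters, initially uniform
y = mixed_op(x, alpha)    # every candidate operation contributes equally
```

Because the mixture weights depend differentiably on alpha, the selection among operations becomes continuous, which is what allows the gradient-based optimization described next.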
  • the management platform 100 trains the hypernetwork and obtains a base model.
  • the hypernetwork is provided with an objective function.
  • the objective function can be mapped to a differentiable function, so that the management platform 100 can perform model optimization in the continuous space through a gradient descent (GD) method.
  • GD gradient descent
  • the principle of DARTS is to obtain cells (cells) through searching, such as the norm-cell and the reduce-cell, and then connect multiple cells to obtain a neural network model.
  • a norm-cell is a cell in which the size of the output feature map is consistent with the size of the input feature map
  • a reduce-cell is a cell in which the size of the output feature map is reduced by half compared to the size of the input feature map.
  • FIG. 6B shows a cell, which can be represented as a directed acyclic graph. Node 1, node 2, node 3, and node 4 in the directed acyclic graph respectively represent feature maps, and the edges between nodes represent possible operations for connecting the nodes. Initially, the edges between nodes are unknown.
  • the management platform 100 can, in response to the user's configuration operation, expand the edges between nodes into multiple edges (multiple edges shown with different line types in FIG. 6B). Correspondingly, the possible operations for connecting nodes are expanded to 8 possible operations, such as 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, 3x3 average pooling, identity, and direct connection.
  • the management platform 100 can then perform sampling on the super-network to obtain sub-networks.
  • sampling refers to selecting one or more operations from possible operations for connecting nodes.
  • the parameters of the super-network can be updated based on the gradient, so as to train the super-network.
  • the management platform 100 can perform model optimization by continuously performing the above sampling and updating steps. Referring to (d) in FIG. 6B, (d) shows the optimal sub-network obtained by sampling. The optimal sub-network can be used as the base model.
  • the learnable parameters in the hypernetwork include the operation parameters ω and the structural parameters α.
  • the operation parameter ω represents an operation connecting nodes, such as 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, 3x3 average pooling, identity, or direct connection.
  • the structural parameter α is used to characterize the weight of each operation connecting nodes. Based on this, the sampling process can be expressed as a two-level optimization problem with the structural parameter α as the upper-level variable and the operation parameter ω of the hypernetwork as the lower-level variable. For details, see formula (6):

    min_α L_val(ω*(α), α)
    s.t. ω*(α) = argmin_ω L_train(ω, α)    (6)
  • L train represents the loss on the training data set, that is, the training loss
  • L val represents the loss on the verification data set, that is, the verification loss
  • arg represents the variable argument, usually used in combination with the maximum value and minimum value, and is used to represent the variable that makes the expression maximum or minimum.
  • ω*(α) denotes the ω that minimizes L_train(ω, α).
  • st is the abbreviation of subject to, which is used to express the conditions that need to be met or obeyed.
  • a possible implementation is to alternately optimize the above operation parameter ω and structural parameter α.
  • the management platform 100 can alternately perform the following steps: (a) update the structural parameter α using the gradient descent method according to the validation loss (for example, the gradient of the validation loss ∇_α L_val(ω, α)); (b) update the operation parameter ω using the gradient descent method according to the training loss (for example, the gradient of the training loss ∇_ω L_train(ω, α)).
  • ξ represents the learning rate, and ∇ indicates the gradient.
  • the management platform 100 can also perform optimization using a gradient approximation to reduce complexity. Specifically, the management platform 100 can approximate ω*(α) by a single gradient step on the training loss, and determine the gradient of L_val(ω − ξ∇_ω L_train(ω, α), α) as an approximation of the gradient of L_val(ω*(α), α). For details, see formula (7):

    ∇_α L_val(ω*(α), α) ≈ ∇_α L_val(ω − ξ∇_ω L_train(ω, α), α)    (7)
  • This method takes minimizing the loss on the validation data set (that is, the validation loss) as the optimization goal, and finds the distribution that produces the optimal sub-network through the gradient descent method instead of directly searching for the optimal sub-network, which improves the efficiency of sampling sub-networks.
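The alternating updates and the one-step lookahead approximation can be illustrated on a toy bilevel problem. The quadratic losses, step sizes, and starting values below are assumptions chosen so that the bilevel optimum is known in closed form (α = ω = 3); they are not the patent's actual losses.

```python
# Toy bilevel problem, assumed for illustration:
#   L_train(w, a) = (w - a)^2  -> inner optimum w*(a) = a
#   L_val(w)      = (w - 3)^2  -> bilevel optimum a = 3
xi, eta = 0.1, 0.1   # virtual gradient step xi and learning rate eta (assumed)
w, a = 0.0, 0.0      # operation parameter w, structural parameter a

for _ in range(1000):
    # (b) update w by descending the training loss: dL_train/dw = 2(w - a)
    w -= eta * 2 * (w - a)
    # (a) update a by descending the validation loss at the one-step
    # lookahead weights w' = w - xi * dL_train/dw (the approximation above)
    w_lookahead = w - xi * 2 * (w - a)
    dw_da = 2 * xi                        # d(w_lookahead)/da for this toy loss
    grad_a = 2 * (w_lookahead - 3) * dw_da
    a -= eta * grad_a

# Both parameters approach the bilevel optimum a = w = 3.
```

The structural parameter never sees the validation loss directly at the current weights; it is evaluated at the one-step-lookahead weights, which is what makes the architecture gradient cheap to compute.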
  • the sub-network sampled by the management platform 100 may be used as a base model.
  • the management platform 100 can perform sampling in the same way to obtain multiple base models. Further, considering that a base model may have a poor inference effect on some training data, the management platform 100 may also determine the performance of each base model (for example, the i-th base model, where i is a positive integer), such as the performance of the base model on different categories of training data. The performance may be measured by indicators such as accuracy or inference time, which is not limited in this embodiment. The process of obtaining multiple base models is described in detail below.
  • Step 1 The management platform 100 determines the supernet according to the search space.
  • Step 2 The management platform 100 trains the hypernetwork and obtains the base model.
  • the implementation of the management platform 100 determining the supernetwork, training the supernetwork, and obtaining the base model can refer to the related content descriptions in FIG. 6A and FIG. 6B .
  • the first basic model obtained by the management platform 100 is ⁇ 0 .
  • Step 3 The management platform 100 determines the performance of the base model.
  • the performance of the base model can be measured by the accuracy of the execution result of the AI task obtained through the base model.
  • the management platform 100 can input the training data used to evaluate the accuracy into the base model, perform classification according to the features extracted by the base model, and then determine misclassified training data and correctly classified training data based on the classification results and the labels of the training data.
  • the management platform 100 can obtain the accuracy of the base model according to the number of misclassified training data and the number of correctly classified training data in each category of training data.
  • the management platform 100 may first train the base model for K rounds, where K is a positive integer, and then determine the performance of the base model. Further, the management platform 100 can also judge whether the performance of the base model has reached the preset performance; if so, it can stop sampling and directly complete the corresponding AI task based on the base model; if not, it can perform steps 4 to 5 to continue sampling to obtain the next base model.
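The per-category accuracy evaluation of step 3, computed from the counts of correctly classified and misclassified training data in each category, can be sketched as follows. The function name and the toy predictions are illustrative assumptions, not part of the disclosure.

```python
from collections import defaultdict

def per_class_accuracy(predictions, labels):
    """Accuracy of a base model on each category of training data,
    computed from the counts of correctly classified and misclassified
    samples in that category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {c: correct[c] / total[c] for c in total}

# Toy classification results for two categories
preds  = ["cat", "dog", "cat", "dog", "cat"]
labels = ["cat", "dog", "dog", "dog", "cat"]
acc = per_class_accuracy(preds, labels)
# the base model is perfect on "cat" but weaker on "dog"; this per-category
# signal is what step 4 uses to reweight the training data
```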
  • Step 4 The management platform 100 can update the weights of the training data according to the performance of the base model.
  • the management platform 100 can increase the weight of the first category of training data in the training data set, and/or reduce the weight of the second category of training data in the data set. In this way, the training data of the first category has a higher probability of being used to train the supernetwork, and the training data of the second category has a lower probability of being used to train the supernetwork.
  • there are many implementations for the management platform 100 to update the weights of the training data, and two of the implementations are described below as examples.
  • the management platform 100 may update the weights of the training data according to a linear function.
  • the linear function is specifically a function that characterizes the linear relationship between the weight of the training data and the performance of the base model.
  • the management platform 100 can also normalize the weights. For example, the management platform 100 may set the sum of the weights of different categories of training data to 1.
  • the management platform 100 may use the Adaboost method to update the weights of the training data. See formula (8) for details:

    α_i = (1/2) ln((1 − E_i) / E_i)
    W_{i+1}(j) = (W_i(j) / Z_i) × exp(−α_i), if h_i(x_j) = y_j
    W_{i+1}(j) = (W_i(j) / Z_i) × exp(α_i), if h_i(x_j) ≠ y_j    (8)
  • E i represents the error rate of the base model ⁇ i
  • α_i represents the coefficient of the base model ⁇ i
  • W i (j) is the weight of the training data x j used to train the current base model (for example, the base model ⁇ i )
  • W i+1 (j) is the weight of the training data x j used to train the next base model (for example, base model ⁇ i+1 ).
  • Z i is a normalization coefficient such that W i (j) can represent a distribution.
  • h i ( ) is the inference result of the base model ⁇ i
  • y j is the label in the sample data.
  • the training platform 102 can obtain the error rate E i of the base model ⁇ i , for example, can determine the error rate of the base model ⁇ i based on the accuracy of the base model ⁇ i . Then the training platform 102 calculates the coefficient ⁇ i of the base model according to the error rate E i of the base model ⁇ i . Next, the training platform 102 adjusts the weight according to whether the prediction result h i (x j ) of the base model ⁇ i on the sample data x j is equal to the label y j in the sample data.
  • when h_i(x_j) = y_j, the training platform 102 can multiply W_i(j) by exp(−α_i)/Z_i to obtain the updated weight W_{i+1}(j); when h_i(x_j) ≠ y_j, the training platform 102 can multiply W_i(j) by exp(α_i)/Z_i to obtain the updated weight W_{i+1}(j), where α_i is the coefficient of the base model.
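The reweighting step above can be sketched as follows, assuming uniform initial weights and the standard Adaboost coefficient α_i = ½·ln((1 − E_i)/E_i); the concrete sample values are illustrative.

```python
import math

def adaboost_update(weights, preds, labels):
    """One reweighting step: compute the error rate E_i of the current base
    model under the distribution W_i, derive its coefficient, raise the
    weights of misclassified samples, lower the rest, and renormalize."""
    E = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
    alpha = 0.5 * math.log((1 - E) / E)   # coefficient of the base model
    unnormalized = [w * math.exp(alpha if p != y else -alpha)
                    for w, p, y in zip(weights, preds, labels)]
    Z = sum(unnormalized)   # normalization so W_{i+1} is a distribution
    return [w / Z for w in unnormalized], alpha

W = [0.25, 0.25, 0.25, 0.25]                 # uniform initial weights (assumed)
preds, labels = [1, 1, 0, 1], [1, 1, 1, 1]   # the third sample is misclassified
W_next, alpha = adaboost_update(W, preds, labels)
# the misclassified sample now carries half of the total weight, so it is far
# more likely to be selected when training the supernetwork for the next base model
```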
  • Step 5 The management platform 100 uses the training data after updating the weights to train the super network, and samples from the super network to obtain the next base model.
  • the training data with high weight has a higher probability to be selected for training the super network to obtain the base model
  • the training data with low weight has a lower probability to be selected for training the super network.
  • the hypernetwork can be trained mainly based on training data with high weights, and the base model sampled during this training process has better performance on this type of training data. Therefore, the multiple base models obtained by the management platform 100 from the training process of the supernetwork can achieve complementary performance, and the accuracy of the execution result of an AI task obtained based on the AI integrated model that integrates the above multiple base models can be significantly improved.
  • when the management platform 100 uses the weight-updated training data to train the supernetwork to obtain the next base model, it can train the original hypernetwork based on the weight-updated training data, or it can fine-tune the trained hypernetwork based on the weight-updated training data. Fine-tuning refers to making small adjustments to a pre-trained model. Specifically, in this embodiment, the management platform 100 can retrain the trained hypernetwork based on the weight-updated training data without training the hypernetwork from scratch, thereby fine-tuning the hypernetwork and reducing the training complexity.
  • the management platform 100 can train the initial graph network model based on the training data set and the obtained multiple base models to obtain a graph network model. The management platform 100 then determines whether the performance of the graph network model reaches the preset performance. If so, the training can be stopped and an AI integrated model is constructed from the graph network model and the multiple base models; if not, a new base model can continue to be sampled, and when the performance of the new base model does not reach the preset performance, a graph network model is obtained by training based on the training data set and the multiple base models including the new base model.
  • FIG. 1 to FIG. 7 introduce the construction method of the AI integrated model in detail. The AI integrated model constructed by the above method can be used to perform reasoning on input data for the realization of an AI task. Next, the reasoning method of the AI integrated model is introduced.
  • the reasoning method of the AI integrated model can be executed by the reasoning device.
  • the reasoning device may be a software device.
  • the software device may be deployed in a computing device or a computing device cluster, and the computing device cluster executes the reasoning method of the AI integrated model provided by the embodiment of the present application by running the software device.
  • the reasoning device may also be a hardware device. When the hardware device is running, it executes the reasoning method of the AI integrated model provided by the embodiment of the present application.
  • the reasoning device is used as a software device for illustration below.
  • the communication module 802 is used to obtain input data;
  • the first reasoning module 804 is used to input the input data into each base model respectively, and obtain the output after each base model performs reasoning on the input data;
  • the construction module 806 is used to construct the outputs of the multiple base models into a graph structure, and the second reasoning module 808 is used to input the graph structure into the graph network model and obtain the reasoning result of the AI integrated model based on the graph network model.
  • the reasoning apparatus 800 may be deployed in a cloud environment.
  • the reasoning device 800 can provide users with reasoning cloud services for use by users.
  • the user can trigger the operation of starting the reasoning device 800 through a client (eg, a browser or a dedicated client), so as to create an instance of the reasoning device 800 in the cloud environment.
  • the user interacts with the instance of the inference device 800 through the client to execute the inference method of the AI integrated model.
  • the reasoning device 800 may also be deployed in an edge environment, or in user terminals such as desktops, laptops, and smart phones.
  • the reasoning apparatus 800 may also be deployed in different environments in a distributed manner.
  • each module of the reasoning apparatus 800 may be distributed and deployed in any two environments of cloud environment, edge environment and end device, or deployed in the above three environments.
  • the method includes:
  • the reasoning device 800 acquires input data.
  • the reasoning device 800 includes an AI integrated model.
  • Different training data can be used to construct different AI integrated models, and different AI integrated models can be used to complete different AI tasks.
  • training data consisting of images marked with categories can be used to construct an AI integrated model for classifying images
  • training data marked with translated sentences can be used to construct an AI integrated model for translating text.
  • the reasoning device 800 may receive input data uploaded by a user, or obtain input data from a data source. According to different AI tasks, the input data received by the reasoning device 800 may be of different types. Taking the AI task as an image classification task as an example, the input data received by the reasoning device 800 may be an image to be classified, the goal of the AI task is to classify the image, and the execution result of the AI task may be the category of the image.
  • the reasoning device 800 respectively inputs input data into each base model in the AI integrated model, and obtains an output of each base model after reasoning the input data.
  • each base model is a trained AI model.
  • the base model can be a trained random forest model or a decision tree model, etc., or a neural network model sampled from a super network.
  • the inference device 800 inputs the input data into each base model respectively, and each base model can perform feature extraction on the input data, and obtain an output after each base model performs inference on the input data.
  • the reasoning device 800 inputs the image to be classified to each base model in the AI integrated model, and obtains the output of each base model after reasoning the image to be classified.
  • the output of each base model after performing inference on the image to be classified is the feature extracted by each base model from the image to be classified.
  • the reasoning device 800 constructs the outputs of the multiple base models into a graph structure.
  • the reasoning device 800 may determine the similarity between the outputs of every two base models among the multiple base models.
  • the output of multiple base models can be represented by features, therefore, the similarity between the outputs of every two base models can be represented by the distance between features.
  • the inference device 800 can use the output of each of the multiple base models as a node in the graph structure, determine the edges between the nodes according to the similarity between the outputs of each two base models, and then construct into a graph structure.
  • the reasoning device 800 may set a similarity threshold. In some possible implementations, when the similarity between two features is greater than the similarity threshold, it can be determined that there is an edge between the nodes corresponding to the two features; when the similarity between two features is less than or equal to the similarity threshold, it can be determined that there is no edge between the nodes corresponding to the two features. In some other possible implementations, the reasoning device 800 may also set an edge between every two nodes, and then assign weights to the corresponding edges according to the distance between the features.
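One possible way to build the graph structure from the base-model outputs, using cosine similarity and a fixed threshold (both the similarity measure and the threshold value 0.5 are assumptions for illustration), is:

```python
import numpy as np

def build_graph(features, threshold=0.5):
    """Each base-model output is a node; an edge is added between two
    nodes when the similarity of their features exceeds the threshold."""
    n = len(features)
    adj = np.eye(n)   # self-loops so every node keeps its own feature
    for i in range(n):
        for j in range(i + 1, n):
            a, b = features[i], features[j]
            sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            if sim > threshold:
                adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy outputs of three base models for the same input
feats = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
A = build_graph(feats)   # models 0 and 1 agree, model 2 stays isolated
```

The weighted-edge variant mentioned above would instead fill the adjacency matrix with the similarity values themselves rather than thresholding them to 0 or 1.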
  • the reasoning device 800 inputs the graph structure into the graph network model, and obtains an inference result of the AI integrated model based on the graph network model.
  • the reasoning device 800 inputs the constructed graph structure into the graph network model, and the graph network model can process the graph structure, for example, perform convolution processing on the graph structure through the graph convolutional network model, so as to obtain the inference result of the AI integrated model.
  • the reasoning result of the AI integrated model can be the feature of the input data, which is specifically the fused feature obtained by fusing the features extracted by multiple base models by the graph network model.
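The fusion performed by a graph convolutional network model can be sketched with a single normalized graph-convolution layer. The layer form (normalized adjacency × features × weights), the random untrained weights, and the mean readout are assumptions for illustration; the actual graph network model would be trained as described earlier.

```python
import numpy as np

def gcn_fuse(adj, X, W):
    """One graph-convolution layer: symmetrically normalize the adjacency
    matrix, aggregate each node's neighbors, apply a linear map and ReLU,
    then average the node features into one fused representation."""
    d = adj.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ adj @ D_inv_sqrt
    H = np.maximum(A_hat @ X @ W, 0.0)   # node features after one layer
    return H.mean(axis=0)                # readout: fused feature vector

# Adjacency over three base-model outputs (nodes), 2-dim features each
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
W = np.random.default_rng(0).standard_normal((2, 2))  # untrained weights
fused = gcn_fuse(A, X, W)   # the fused feature passed to the decision layer
```

Because each node is aggregated together with its neighbors, every base model's feature is blended with the features of the base models it is most similar to, which is the fusion behavior described above.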
  • the reasoning device 800 constructs a graph structure based on the features extracted from the image to be classified by each base model, and then inputs the graph structure into the graph network model to obtain the reasoning result of the AI integrated model.
  • the reasoning result may be a fused feature obtained by fusing features extracted from multiple base models by the graph network model in the AI integrated model.
  • the reasoning device 800 inputs the reasoning result of the AI integrated model to the decision-making layer, and uses the output of the decision-making layer as the execution result of the AI task.
  • the decision-making layer can be of different types.
  • the decision layer can be a classifier, and for regression tasks, the decision layer can be a regressor.
  • the reasoning device 800 can input the reasoning result (for example, the fused feature) of the AI integrated model to the decision-making layer for decision-making, and use the output of the decision-making layer as the execution result of the AI task.
  • the reasoning device 800 may input the fused features to the classifier for classification to obtain the category of the image. Among them, the category of the image is the execution result of the classification task.
  • the AI integrated model can also be used to preprocess the input data, and the reasoning result of the AI integrated model is the result of the preprocessing.
  • the reasoning device 800 can input the reasoning result of the AI integrated model into the downstream task model.
  • the task model is an AI model that has been trained for a specific AI task.
  • the reasoning device 800 may use the task model to perform further feature extraction on the reasoning result, and make a decision based on the feature after further feature extraction, and use the result obtained by the decision as the execution result of the AI task.
  • the reasoning device 800 may also present the execution result of the AI task to the user, so that the user can take corresponding measures or perform corresponding actions according to the execution result. This embodiment of the present application does not limit it.
  • the embodiment of the present application provides an inference method for an AI integrated model.
  • the inference device 800 inputs input data into multiple base models, constructs the outputs of the multiple base models into a graph structure, and then processes the graph structure through a graph network model to fuse the outputs of the multiple base models. Since the graph network model considers the neighbor nodes of each node in the graph structure when processing the graph structure, the graph network model fully considers the differences and correlations between the base models when fusing their outputs, so the accuracy of the execution results of AI tasks obtained according to the AI integrated model constructed from the graph network model and the multiple base models can be significantly improved.
  • the management platform 100 (that is, the management system) includes:
  • An interaction unit 102 configured to obtain a training data set, an initial graph network model, and a plurality of base models, wherein each base model is a trained AI model;
  • a training unit 104 configured to use the training data in the training data set and the plurality of base models to iteratively train the initial graph network model to obtain a graph network model;
  • a construction unit 106 configured to construct the graph network model and the multiple base models as the AI integrated model, wherein the input of the graph network model is a graph structure composed of outputs of the multiple base models .
  • the training unit 104 uses the training data in the training data set and the multiple base models to iteratively train the initial graph network model, and each iteration includes:
  • the initial graph network model is trained using the graph structure.
  • the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
  • the interaction unit 102 is specifically configured to:
  • the super network is trained by the training unit, and a plurality of base models are obtained from the super network.
  • the training unit 104 is specifically configured to:
  • the hypernetwork is trained by using the training data in the training data set after the weights are updated to obtain the i+1th base model.
  • the training unit 104 is specifically configured to:
  • the training unit 104 is specifically configured to:
  • the training unit 104 is specifically configured to:
  • the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generation network model, or a graph spatio-temporal network model.
  • the graph convolutional network model includes a graph convolutional network model obtained by simplifying a Chebyshev network.
  • the management platform 100 may correspond to the implementation of the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of the modules/units of the management platform 100 are respectively intended to implement the corresponding processes of the methods in the embodiment shown in FIG. 4; for the sake of brevity, details are not repeated here.
  • the reasoning device 800 includes:
  • the first reasoning module 804 is configured to input the input data into each base model in the AI integrated model, and obtain the output of each base model after reasoning on the input data, wherein each base model is a trained AI model
  • a construction module 806, configured to construct the output of the plurality of base models into a graph structure
  • the second reasoning module 808 is configured to input the graph structure into the graph network model, and obtain a reasoning result of the AI integrated model based on the graph network model.
  • the building module 806 is specifically used to:
  • the inference result of the AI integrated model is a feature of the input data.
  • the device 800 further includes:
  • the execution module is configured to input the reasoning result of the AI integrated model to the decision-making layer, and use the output of the decision-making layer as the execution result of the AI task.
  • the device 800 further includes:
  • the execution module is used to input the reasoning result of the AI integrated model into the task model, use the task model to perform further feature extraction on the reasoning result, make a decision based on the features after further feature extraction, and use the result obtained by the decision as the execution result of the AI task, wherein the task model is an AI model trained for the AI task.
  • the reasoning device 800 may correspond to the implementation of the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of the modules/units of the reasoning device 800 are respectively intended to implement the corresponding processes of the methods in the embodiment shown in FIG. 10; for the sake of brevity, details are not repeated here.
  • the embodiment of the present application also provides a computing device cluster.
  • the computing device cluster may be a computing device cluster formed by at least one computing device in a cloud environment, an edge environment, or a terminal device.
  • the computing device cluster is specifically used to implement the functions of the management platform 100 in the embodiment shown in FIG. 1 .
  • FIG. 11 provides a schematic structural diagram of a computing device cluster.
  • the computing device cluster 10 includes multiple computing devices 1100 , and the computing device 1100 includes a bus 1101 , a processor 1102 , a communication interface 1103 and a memory 1104 .
  • the processor 1102 , the memory 1104 and the communication interface 1103 communicate through the bus 1101 .
  • the bus 1101 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, etc.
  • PCI peripheral component interconnect
  • EISA extended industry standard architecture
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 11 , but it does not mean that there is only one bus or one type of bus.
  • the processor 1102 may be a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a microprocessor (micro processor, MP) or a digital signal processor (digital signal processor, DSP) etc. Any one or more of them.
  • CPU central processing unit
  • GPU graphics processing unit
  • MP microprocessor
  • DSP digital signal processor
  • the communication interface 1103 is used for communicating with the outside.
  • the communication interface 1103 can be used to acquire a training data set, an initial graph network model and multiple base models, or the communication interface 1103 can be used to output an AI integrated model constructed based on multiple base models. and many more.
  • the memory 1104 may include a volatile memory (volatile memory), such as a random access memory (random access memory, RAM).
  • volatile memory such as a random access memory (random access memory, RAM).
  • Memory 1104 can also include non-volatile memory (non-volatile memory), such as read-only memory (read-only memory, ROM), flash memory, hard disk drive (hard disk drive, HDD) or solid state drive (solid state drive, SSD).
  • RAM random access memory
  • non-volatile memory such as read-only memory (read-only memory, ROM), flash memory, hard disk drive (hard disk drive, HDD) or solid state drive (solid state drive, SSD)
  • Executable codes are stored in the memory 1104, and the processor 1102 executes the executable codes to execute the foregoing method for building an AI integrated model.
  • FIG. 12 provides a schematic structural diagram of a computing device cluster.
  • the computing device cluster 20 includes multiple computing devices 1200 , and the computing devices 1200 include a bus 1201 , a processor 1202 , a communication interface 1203 and a memory 1204 .
  • the processor 1202 , the memory 1204 and the communication interface 1203 communicate through the bus 1201 .
  • Executable codes are stored in at least one memory 1204 in the computing device cluster 20, and at least one processor 1202 executes the executable codes to execute the reasoning method of the aforementioned AI integrated model.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can access, or a data storage device such as a data center that includes one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state hard disk), etc.
  • the computer-readable storage medium includes instructions, and the instructions instruct the computing device to execute the aforementioned method for building an AI integrated model applied to the management platform 100 , or instruct the computing device to execute the aforementioned inference method applied to the reasoning device 800 .
  • The embodiment of the present application also provides a computer program product.
  • The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions according to the embodiments of the present application are produced in whole or in part.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, or data center to another website, computer, or data center over a wired connection (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless connection (such as infrared, radio, or microwave).
  • The computer program product may be a software installation package. If any of the foregoing methods for building an AI ensemble model or the inference method of an AI ensemble model is required, the computer program product may be downloaded and executed on a computing device.


Abstract

A method for building an artificial intelligence (AI) ensemble model, comprising: obtaining a training data set, an initial graph network model, and multiple base models; iteratively training the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model; and building the graph network model and the multiple base models into an AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models. Because the graph network model considers the neighbor nodes of each node in the graph structure when processing it, the graph network model fully accounts for the differences and correlations among the base models when fusing their outputs. As a result, when the features obtained from the multiple base models and the graph network model in the AI ensemble model are used for processing an AI task, the accuracy of the execution result of the AI task can be improved.

Description

Method for building an AI ensemble model, inference method for an AI ensemble model, and apparatus
This application claims priority to the Chinese patent application No. 202110602479.6, entitled "A graph-network-based method for building a large artificial intelligence model", filed with the China National Intellectual Property Administration on May 31, 2021, and to the Chinese patent application No. 202110977566.X, entitled "Method for building an AI ensemble model, inference method for an AI ensemble model, and apparatus", filed with the China National Intellectual Property Administration on August 24, 2021, both of which are incorporated herein by reference in their entirety.
TECHNICAL FIELD
This application relates to the field of artificial intelligence (AI) technologies, and in particular to a method for building an AI ensemble model, an inference method for an AI ensemble model, a management system for an AI ensemble model, an inference apparatus, a computing device cluster, a computer-readable storage medium, and a computer program product.
BACKGROUND
With the continuous development of AI technologies, especially deep learning, AI models keep growing in scale: the structures of many AI models become deeper and wider, and their parameter counts keep increasing. Relying on their large scale and massive computing resources, some current AI models can mine massive amounts of data to complete corresponding AI tasks.
A large-scale AI model can be obtained by ensembling. An AI model obtained by ensembling may be called an AI ensemble model, and the multiple AI models used to form the AI ensemble model may be called base models. In the inference stage, the outputs of the multiple base models in the AI ensemble model can be fused to obtain a fused inference result. The fusion manner of the AI ensemble model may differ across AI tasks. For example, for a classification task, the outputs of the multiple base models are usually voted on to obtain the inference result of the AI ensemble model; for a regression task, the outputs of the multiple base models are usually averaged, and the average is used as the inference result of the AI ensemble model.
However, the above method of obtaining a final inference result with an AI ensemble model does not consider the differences and correlations of the base models themselves. Directly averaging the base models' outputs, or fusing them by voting, prevents the AI ensemble model from reflecting the cooperative capability of its internal base models, so the accuracy of the AI task execution results obtained from such an AI ensemble model needs to be improved.
SUMMARY
This application provides a method for building an AI ensemble model. In the method, a graph network model and multiple base models are built into an AI ensemble model. When the graph network model in the AI ensemble model fuses the outputs of the multiple base models, it fully considers the differences and correlations among the base models, so using the features obtained from the graph network model for processing an AI task improves the accuracy of the AI task's execution result.
According to a first aspect, this application provides a method for building an AI ensemble model. The method may be performed by a management platform of the AI ensemble model. The management platform may be a software system for building AI ensemble models; a computing device or a computing device cluster runs the program code of the software system to perform the method for building an AI ensemble model. The management platform may also be a hardware system for building AI ensemble models. The following uses a software-system management platform as an example.
Specifically, the management platform may obtain a training data set, an initial graph network model, and multiple base models; iteratively train the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model; and then build the graph network model and the multiple base models into an AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models.
In this method, the management platform constructs a graph structure from the outputs of the multiple base models and processes the graph structure with the graph network model to fuse those outputs. Because the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully accounts for the differences and correlations among the base models when fusing their outputs. Using the features obtained from the graph network model for subsequent AI task processing therefore yields more accurate execution results than using the features of any single base model; that is, the technical solution of this application improves the accuracy of AI task execution results.
Moreover, because the management platform fuses the outputs of the multiple base models through the graph network model, the AI ensemble model can be trained in an end-to-end, parallel manner, which on the one hand reduces the difficulty of model training and improves training efficiency, and on the other hand guarantees the generalization performance of the trained AI ensemble model.
In some possible implementations, when the management platform iteratively trains the initial graph network model using the training data in the training data set and the multiple base models, each iteration includes: inputting first training data in the training data set into each base model separately to obtain each base model's output after inference on the first training data; constructing the outputs of the multiple base models after inference on the first training data into a graph structure; and training the initial graph network model with the graph structure.
Training the initial graph network model with the graph structure enables the trained graph network model to fully consider the differences and correlations among the base models when fusing their outputs, so using the features obtained from the graph network model for AI task processing improves the accuracy of the AI task's execution result.
In some possible implementations, the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model. Decision tree models and random forest models can process structured data, while neural network models can process unstructured data such as images, text, speech, and video. Different base models can be used to build different AI ensemble models, for example an AI ensemble model that processes structured data and one that processes unstructured data, which meets the needs of different services.
In some possible implementations, the management platform may train a supernet and obtain multiple base models from the supernet. The base models obtained from the supernet are neural network models, generated by the management platform through neural network search based on the user's selection.
Compared with obtaining base models from models built into the management platform or models uploaded in advance by the user, base models obtained in real time by training a supernet match the AI task more closely, which improves the accuracy of the AI task execution results obtained from the AI ensemble model.
In some possible implementations, the management platform may combine base models to build an AI ensemble model of a specified size, thereby meeting users' personalized needs. While building the AI ensemble model, the management platform also supports adding or removing base models, which reduces the cost of iteratively updating the AI ensemble model.
Further, both the base models and the AI ensemble model can be used to extract features. The management platform can therefore first obtain inference results based on the base models, without waiting for the AI ensemble model to be built, which shortens inference time, improves inference efficiency, and increases the utilization of intermediate results (such as the base models' inference results).
In some possible implementations, when the management platform trains the supernet and obtains multiple base models from it, the management platform may train the supernet with the training data in the training data set to obtain an i-th base model, where i is a positive integer; the management platform may then update the weights of the training data in the training data set according to the performance of the i-th base model, and train the supernet with the reweighted training data to obtain an (i+1)-th base model.
The weight of a training sample can represent the probability that it is used to train the supernet. By updating the weights, the management platform changes the probability that particular training data is used to train the supernet, enabling targeted training on certain data to obtain a new base model whose performance complements that of the previous base models. This further improves the accuracy of the AI task execution results obtained from the AI ensemble model built on the multiple base models.
In some possible implementations, when the performance of the i-th base model on training data of a second category is higher than its performance on training data of a first category, the management platform may increase the weights of the first-category training data in the training data set and/or decrease the weights of the second-category training data in the training data set. In this way, the management platform can focus the supernet training on the misclassified training data to obtain a new base model. The multiple base models obtained this way complement one another, improving the accuracy of the AI task execution results obtained from the AI ensemble model.
In some possible implementations, when training the supernet with the reweighted training data, the management platform may fine-tune the supernet with the reweighted training data. Because the management platform can continue training the already-trained supernet instead of training from scratch, training efficiency is improved and training progresses faster.
In some possible implementations, the management platform may determine the similarity between the outputs of every two of the multiple base models after inference on the first training data; the management platform then takes each base model's output after inference on the first training data as a node of the graph structure, determines the edges between the nodes according to the similarities, and obtains the graph structure from the nodes and the edges.
A graph structure built in this way preserves, through the edges between nodes, information such as the similarity between the outputs of different base models. The AI ensemble model can therefore process the graph structure with the graph network model, fusing the outputs of the different base models according to such similarity information; using the fused features for AI task processing can improve the accuracy of the AI task's execution result.
In some possible implementations, the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generative network model, or a graph spatial-temporal network model. Graph network models such as graph convolutional networks have strong expressive power, especially for non-Euclidean data, and can effectively aggregate the features output by different base models; using the features obtained from such graph network models for AI task processing improves the accuracy of the AI task's execution result.
In some possible implementations, the graph network model is a graph convolutional network model obtained by simplifying a Chebyshev network. The Chebyshev network approximates the convolution kernel with a high-order approximation of the Laplacian matrix (for example, a polynomial expansion), which greatly reduces the parameter count and gives the graph convolutional network model locality.
According to a second aspect, this application provides an inference method for an AI ensemble model. The method may be performed by an inference apparatus, and the AI ensemble model includes a graph network model and multiple base models. The inference apparatus may obtain input data and input it into each base model in the AI ensemble model separately to obtain each base model's output after inference on the input data, where each base model is a trained AI model. The inference apparatus may then construct the outputs of the multiple base models into a graph structure, input the graph structure into the graph network model, and obtain the inference result of the AI ensemble model based on the graph network model.
In this method, the inference apparatus constructs the outputs of the multiple base models into a graph structure and processes it with the graph network model in the AI ensemble model, thereby fusing the outputs of the multiple base models according to their differences and correlations, which improves the accuracy of the AI task execution results obtained from the AI ensemble model.
In some possible implementations, the inference apparatus may determine the similarity between the outputs of every two of the multiple base models, take each base model's output as a node of the graph structure, determine the edges between the nodes according to the similarities, and obtain the graph structure from the nodes and the edges. In this way, the inference apparatus can preserve, in the edge information of the graph structure, the similarity and difference information among the outputs of the multiple base models, and fuse the outputs based on that information, improving the accuracy of the AI task execution results obtained from the AI ensemble model.
In some possible implementations, the inference result of the AI ensemble model is a feature of the input data. The feature may be the fused feature obtained by the graph network model in the AI ensemble model fusing the features extracted by the multiple base models.
In some possible implementations, the inference apparatus may input the inference result of the AI ensemble model into a decision layer and use the decision layer's output as the execution result of the AI task, where the decision layer may be a classifier, a regressor, or the like.
Because the features extracted by the inference apparatus through the AI ensemble model are fused based on the similarities and differences among the multiple base models, making further decisions on these features to obtain the AI task execution result can improve its accuracy.
In some possible implementations, the inference apparatus may input the inference result of the AI ensemble model into a task model, use the task model to perform further feature extraction on the inference result and make a decision based on the further extracted features, and use the decision result as the execution result of the AI task, where the task model is an AI model trained for the AI task.
In this method, the inference apparatus uses the AI ensemble model to preprocess the input data so that the downstream task model can perform feature extraction and decision-making on the preprocessed data to complete the corresponding AI task. Because the task model performs feature extraction and decision-making on the preprocessed data rather than directly on the raw input data, it has a higher response speed and efficiency.
According to a third aspect, this application provides a management system for an AI ensemble model. The system includes:
an interaction unit, configured to obtain a training data set, an initial graph network model, and multiple base models, where each base model is a trained AI model;
a training unit, configured to iteratively train the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model; and
a building unit, configured to build the graph network model and the multiple base models into the AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models.
In some possible implementations, when the training unit iteratively trains the initial graph network model using the training data in the training data set and the multiple base models, each iteration includes:
inputting first training data in the training data set into each base model separately to obtain each base model's output after inference on the first training data;
constructing the outputs of the multiple base models after inference on the first training data into a graph structure; and
training the initial graph network model with the graph structure.
In some possible implementations, the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
In some possible implementations, the interaction unit is specifically configured to:
train a supernet through the training unit and obtain multiple base models from the supernet.
In some possible implementations, the training unit is specifically configured to:
train the supernet with the training data in the training data set to obtain an i-th base model, where i is a positive integer;
update the weights of the training data in the training data set according to the performance of the i-th base model; and
train the supernet with the reweighted training data in the training data set to obtain an (i+1)-th base model.
In some possible implementations, the training unit is specifically configured to:
when the performance of the i-th base model on training data of a second category is higher than its performance on training data of a first category, increase the weights of the first-category training data in the training data set and/or decrease the weights of the second-category training data in the training data set.
In some possible implementations, the training unit is specifically configured to:
fine-tune the supernet with the reweighted training data.
In some possible implementations, the training unit is specifically configured to:
determine the similarity between the outputs of every two of the multiple base models after inference on the first training data; and
take each base model's output after inference on the first training data as a node of the graph structure, determine the edges between the nodes according to the similarities, and obtain the graph structure from the nodes and the edges.
In some possible implementations, the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generative network model, or a graph spatial-temporal network model.
In some possible implementations, the graph convolutional network model includes a graph convolutional network model obtained by simplifying a Chebyshev network.
According to a fourth aspect, this application provides an inference apparatus for an AI ensemble model. The AI ensemble model includes a graph network model and multiple base models, and the apparatus includes:
a communication module, configured to obtain input data;
a first inference module, configured to input the input data into each base model in the AI ensemble model separately to obtain each base model's output after inference on the input data, where each base model is a trained AI model;
a building module, configured to construct the outputs of the multiple base models into a graph structure; and
a second inference module, configured to input the graph structure into the graph network model and obtain the inference result of the AI ensemble model based on the graph network model.
In some possible implementations, the building module is specifically configured to:
determine the similarity between the outputs of every two of the multiple base models; and
take each base model's output as a node of the graph structure, determine the edges between the nodes according to the similarities, and obtain the graph structure from the nodes and the edges.
In some possible implementations, the inference result of the AI ensemble model is a feature of the input data.
In some possible implementations, the apparatus further includes:
an execution module, configured to input the inference result of the AI ensemble model into a decision layer and use the decision layer's output as the execution result of an AI task.
In some possible implementations, the apparatus further includes:
an execution module, configured to input the inference result of the AI ensemble model into a task model, use the task model to perform further feature extraction on the inference result and make a decision based on the further extracted features, and use the decision result as the execution result of an AI task, where the task model is an AI model trained for the AI task.
According to a fifth aspect, this application provides a computing device cluster including at least one computing device. The at least one computing device includes at least one processor and at least one memory, and the processor and the memory communicate with each other. The at least one processor is configured to execute instructions stored in the at least one memory so that the computing device cluster performs the method according to any implementation of the first aspect or the second aspect.
According to a sixth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computing device or a computing device cluster to perform the method according to any implementation of the first aspect or the second aspect.
According to a seventh aspect, this application provides a computer program product containing instructions that, when run on a computing device or a computing device cluster, cause the computing device or the computing device cluster to perform the method according to any implementation of the first aspect or the second aspect.
On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.
BRIEF DESCRIPTION OF DRAWINGS
To describe the technical methods of the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in the embodiments.
FIG. 1 is a system architecture diagram of a management platform for an AI ensemble model according to an embodiment of this application;
FIG. 2A is a schematic deployment diagram of a management platform according to an embodiment of this application;
FIG. 2B is a schematic deployment diagram of a management platform according to an embodiment of this application;
FIG. 3 is a schematic diagram of an interaction interface according to an embodiment of this application;
FIG. 4 is a flowchart of a method for building an AI ensemble model according to an embodiment of this application;
FIG. 5 is a schematic diagram of the principle of a graph convolutional network model according to an embodiment of this application;
FIG. 6A is a schematic flowchart of obtaining a base model according to an embodiment of this application;
FIG. 6B is a schematic flowchart of neural network search according to an embodiment of this application;
FIG. 7 is a schematic flowchart of obtaining multiple base models according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an inference apparatus according to an embodiment of this application;
FIG. 9 is a schematic deployment diagram of an inference apparatus according to an embodiment of this application;
FIG. 10 is a flowchart of an inference method for an AI ensemble model according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a computing device cluster according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a computing device cluster according to an embodiment of this application.
DETAILED DESCRIPTION
The terms "first" and "second" in the embodiments of this application are used only for description and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature.
First, some technical terms involved in the embodiments of this application are introduced.
An AI model is an algorithm model developed and trained through AI technologies such as machine learning to accomplish a specific AI task. For example, AI models include support vector machine (SVM) models, random forest (RF) models, and decision tree (DT) models; AI models also include deep learning (DL) models, such as neural network models.
To improve the performance of AI models, multiple independent AI models can also be combined into a large-scale AI model (also called a large AI model). One way of forming a large-scale AI model from multiple AI models is ensembling, and a large-scale AI model obtained by ensembling is also called an AI ensemble model. The AI models used for feature extraction in an AI ensemble model are also called base models, or base learners. In practice, a base model may be a decision tree model, a random forest model, a neural network model, or the like. It should be understood that the base models included in the AI ensemble model in this application run relatively independently; during inference, the inference results (i.e., outputs) of the multiple base models are combined in a certain manner, and the combined output serves as the output of the AI ensemble model. That is, the ensembling in this application actually refers to ensembling the base models' inference results.
A graph network model is an AI model for processing graph structures, such as a graph neural network model. A graph structure is a data structure including multiple nodes (also called vertices). At least two of the multiple nodes are connected by an edge. In practice, a node can be represented by a circle, and an edge by a line between circles. Graph structures can be applied in different scenarios to express data with association relationships. For example, a graph structure can represent user relationships in a social network: the nodes represent users, and the edges represent relationships between users, such as colleagues, friends, or relatives. As another example, a graph structure can represent flight routes: the nodes represent cities, and the edges represent routes between cities.
A decision layer is an algorithm structure that makes decisions based on input features. A decision layer is usually used jointly with an AI model or AI ensemble model for feature extraction to accomplish a specific AI task. For example, a base model or graph network model can extract features, which can then be input into the decision layer for decision-making. Decision layers come in different types; for example, a decision layer may be a classifier or a regressor. It should be understood that in some cases an AI model or AI ensemble model may not include a decision layer, i.e., it is used only for feature extraction. During inference, the features obtained through the AI model or AI ensemble model can then be input into a decision layer to accomplish a specific AI task. In other cases, the decision layer may also be part of the AI model or AI ensemble model, i.e., the model is used for both feature extraction and decision-making; in that case, the AI model or AI ensemble model can directly obtain the result for the AI task in the inference stage. Unless otherwise specified, in the remainder of this application the base models and graph network model in the AI ensemble model are used only for feature extraction and do not include the decision layer's function; the features obtained through the AI ensemble model can then be input into a decision layer according to the goal of the AI task.
An AI task is a task completed using the functions of an AI model or AI ensemble model. For example, AI tasks include image processing (such as image segmentation, image classification, image recognition, and image annotation), natural language processing (language translation, intelligent question answering), and speech processing (voice wake-up, speech recognition, speech synthesis). AI tasks vary in difficulty: some can be completed by a simple trained AI model plus a decision layer, while others require a large-scale trained AI model plus a decision layer.
In some scenarios, the inference accuracy of a single AI model is not high, and building an AI ensemble model with multiple AI models as base models is a strategy for improving accuracy. In related technologies, the outputs of multiple base models can be fused by voting or weighted averaging to obtain the inference result of the AI ensemble model. However, the inference result obtained this way does not consider the differences and correlations of the base models themselves, so the accuracy of the AI task execution results obtained from such an AI ensemble model is still not high. In addition, the multiple base models in an AI ensemble model are usually trained in parallel without strong dependencies among them, which makes it difficult to fully exploit each base model's strengths and can lead to poor inference on some input data, further affecting the accuracy of the AI task execution results obtained from the AI ensemble model.
In view of this, an embodiment of this application provides a method for building an AI ensemble model. The method may be performed by a management platform of the AI ensemble model. The management platform may obtain a training data set, an initial graph network model, and multiple base models; iteratively train the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model; and then build the graph network model and the multiple base models into an AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models.
In this method, the management platform constructs a graph structure from the outputs of the multiple base models and processes the graph structure with the graph network model to fuse those outputs. Because the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully accounts for the differences and correlations among the base models when fusing their outputs, so using the features obtained from the graph network model for AI task processing improves the accuracy of the AI task execution results obtained from the AI ensemble model.
Moreover, in some embodiments, when obtaining the multiple base models, the management platform may obtain base models by training a supernet, and update the weights of the training data used to train the supernet according to the performance of the current base model, for example by increasing the weights of training data the base model misclassifies. The next base model is then obtained with the reweighted training data, so the multiple base models can complement one another, which improves the accuracy of the AI task execution results obtained from the AI ensemble model.
To make the technical solution of this application clearer and easier to understand, the management platform for the AI ensemble model is introduced below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the management platform for the AI ensemble model shown in FIG. 1, the management platform 100 includes an interaction unit 102, a training unit 104, and a building unit 106. Further, the management platform 100 may also include a storage unit 108. Each unit is introduced below.
The interaction unit 102 is configured to obtain a training data set, an initial graph network model, and multiple base models, where each base model is a trained AI model. The interaction unit 102 can obtain these in multiple ways. For example, based on the user's selection, the interaction unit 102 may obtain the training data set, initial graph network model, and multiple base models for building the AI ensemble model from the training data sets, initial graph network models, and base models built into the management platform 100. As another example, the interaction unit 102 may also receive a training data set, an initial graph network model, and multiple base models uploaded by the user.
The training unit 104 is configured to iteratively train the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model. When the training unit 104 iteratively trains the initial graph network model, each iteration includes: inputting first training data in the training data set into each base model separately to obtain each base model's output after inference on the first training data; constructing the outputs of the multiple base models after inference on the first training data into a graph structure; and training the initial graph network model with the graph structure.
The first training data may be some of the training data in the training data set. For example, the training data in the training data set may be divided into several batches according to a batch size, each batch containing a number of training samples equal to the batch size. Accordingly, the first training data may be one of these batches.
In some possible implementations, the training unit 104 is further configured to train a supernet and obtain multiple base models from it. The training unit 104 may update the weights of the training data used to train the supernet according to the performance of the current base model, for example by increasing the weights of training data the base model misclassifies. The training unit 104 then trains the supernet with the reweighted training data to obtain the next base model. The multiple base models thus complement one another, improving the accuracy of the AI task execution results obtained from the AI ensemble model.
The building unit 106 is configured to build the graph network model and the multiple base models into the AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models. Specifically, the building unit 106 takes the graph structure obtained from the outputs of the multiple base models as the input of the graph network model, so that in the inference stage the multiple base models and the graph network model can jointly process the input data to obtain the inference result of the AI ensemble model. Because the building unit 106 interconnects the multiple base models and the graph network model based on outputs and inputs, the AI ensemble model can, in the inference stage, automatically perform inference on the input data as a whole.
The storage unit 108 is configured to store the training data sets, initial graph network models, and/or base models built into the management platform 100. Further, the storage unit 108 may also store training data sets, initial graph network models, and/or base models uploaded by the user. In some embodiments, the storage unit 108 may also store the base models obtained by the training unit 104 through training the supernet. The storage unit 108 may also store training parameters set by the user through the interaction unit 102, which is not limited in this embodiment.
FIG. 1 describes the architecture of the management platform 100 in detail. The deployment of the management platform 100 is introduced in detail below. It should be understood that the management platform 100 for the AI ensemble model may also be called a management system for the AI ensemble model. The management system for the AI ensemble model may be a software system deployed on a hardware device or hardware device cluster, or a hardware system composed of one or more hardware devices. In this application, descriptions of the management platform 100 are all exemplary descriptions of the management system for the AI ensemble model.
In some possible implementations, as shown in FIG. 2A, the management platform 100 may be deployed in a cloud environment: when the management platform 100 is a software system, it is specifically deployed on one or more computing devices (for example, central servers) in the cloud environment; when the management platform 100 is a hardware system, it may include one or more computing devices in the cloud environment. The cloud environment refers to a central computing device cluster owned by a cloud service provider for providing computing, storage, and communication resources.
In specific implementations, the user may trigger an operation to start the management platform 100 through a client (for example, a browser or a dedicated client) and then interact with the management platform 100 through the client to build the AI ensemble model.
Specifically, the interaction unit 102 of the management platform 100 can provide interaction logic, based on which the client presents an interaction interface to the user. The interaction interface may be, for example, a graphical user interface (GUI) or a command user interface (CUI).
For ease of understanding, the following uses a GUI as an example. Referring to the schematic diagram of the interaction interface 300 shown in FIG. 3, the interaction interface 300 supports the user in configuring a training data set, base models, and an initial graph network model. Specifically, the interaction interface 300 carries a training data set configuration component 302, a base model configuration component 304, and an initial graph network model configuration component 306.
The training data set configuration component 302 includes a drop-down control. When triggered, the control displays a drop-down box from which the user can select a training data set built into the management platform 100, for example any one of training data set 1 to training data set k, where k is a positive integer. In some embodiments, the user may also select a custom training data set. Specifically, when the user selects a custom training data set from the drop-down box, the interaction interface 300 can provide an interface for the user to enter the address of the custom training data set, so that the client can obtain the custom training data set from that address.
Similarly, the base model configuration component 304 includes a drop-down control. When triggered, it displays a drop-down box that may include base models built into the management platform 100, for example random forest models, decision tree models, or neural network models, where the random forest and decision tree models may be trained AI models. It should be noted that the management platform 100 may have at least one built-in instance of a random forest model and/or at least one built-in instance of a decision tree model. When the drop-down control of the base model configuration component 304 is triggered, the drop-down box can display at least one instance of each of the various models built into the management platform 100. When the user selects an instance of a random forest model or a decision tree model, the user can also configure the number of that instance through the quantity configuration control in the base model configuration component 304. The user may also configure instances of multiple kinds of models as base models through the drop-down control and configure the number of instances for each kind separately.
Further, the drop-down control can also support the user in uploading a custom model as a base model. Specifically, the drop-down box displayed by the drop-down control includes user-defined models; the user can select a custom model, triggering the process of uploading the custom model as a base model. Of course, the user may also upload custom models in advance, in which case the user can, when configuring base models, select base models from the previously uploaded custom models for building the AI ensemble model.
The base models selected by the user may be built into the management platform or uploaded in advance by the user. In other embodiments, the selected base models may also be ones that the management platform is about to generate according to the user's selection. For example, when the user selects a neural network model, the interaction interface 300 can also provide interfaces for the user to configure the relevant parameters for obtaining the neural network model. For example, when the neural network model is obtained by sampling from a supernet, the interaction interface 300 can provide parameter interfaces for the search space, performance metrics, reference values of the performance metrics, and so on, so that the user can configure the corresponding parameters through these interfaces. The management platform 100 can then obtain multiple base models through neural network search based on these parameters.
The initial graph network model configuration component 306 includes a drop-down control. When triggered, it displays a drop-down box from which the user can select an initial graph network model built into the management platform 100 or uploaded by the user, for example any one of a graph convolutional network (GCN) model, a graph attention network model, a graph autoencoder (GAE) model, a graph generative network (GGN) model, or a graph spatial-temporal network (GSTN) model.
The interaction interface 300 also carries a confirm control 308 and a cancel control 309. When the cancel control 309 is triggered, the user's selections are canceled. When the confirm control 308 is triggered, the client submits the parameters configured by the user to the management platform 100. The management platform 100 can obtain the training data set, initial graph network model, and multiple base models according to the configuration, iteratively train the initial graph network model based on the training data set and the multiple base models to obtain the graph network model, and then build the graph network model and the multiple base models into the AI ensemble model.
It should be noted that multiple users can trigger operations to start the management platform 100 through their respective clients to create, in the cloud environment, instances of the management platform 100 corresponding to each user. Each user can interact with the corresponding instance of the management platform 100 through their own client to build their own AI ensemble model.
Multiple users can configure the corresponding training data sets, initial graph network models, and multiple base models based on their respective AI tasks. Different users may configure different training data sets, initial graph network models, and base models, and accordingly the AI ensemble models they build may differ. That is, the management platform 100 provides a one-stop method for building AI ensemble models; a corresponding AI ensemble model can be built for different AI tasks of different users, or for different AI tasks of the same user, giving the platform high generality and usability and enabling it to meet service requirements.
The management platform 100 may also be deployed in an edge environment, specifically on one or more computing devices (edge computing devices) in the edge environment, or the management platform 100 may include one or more computing devices in the edge environment; the edge computing devices may be servers, computing boxes, and so on. The edge environment refers to an edge computing device cluster geographically close to terminal devices (i.e., end-side devices) for providing computing, storage, and communication resources. In some implementations, the management platform 100 may also be deployed on terminal devices, which include but are not limited to user terminals such as desktops, laptops, and smartphones.
In other possible implementations, as shown in FIG. 2B, the management platform 100 may be deployed in a distributed manner across different environments. For example, the interaction unit 102 may be deployed in the edge environment, and the training unit 104 and building unit 106 in the cloud environment. The user can trigger an operation to start the management platform 100 through a client to create an instance of the management platform 100, where each instance includes the interaction unit 102, training unit 104, and building unit 106, deployed in a distributed manner across the cloud and edge environments.
FIG. 2B is only one implementation of distributing the parts of the management platform 100 across different environments. In other possible implementations of the embodiments of this application, parts of the management platform 100 may also be deployed separately across the three environments of the cloud environment, edge environment, and end device, or across any two of them.
Next, from the perspective of the management platform 100, the method for building an AI ensemble model according to an embodiment of this application is described in detail with reference to the accompanying drawings.
Referring to the flowchart of the method for building an AI ensemble model shown in FIG. 4, the method includes:
S402: The management platform 100 obtains a training data set.
Specifically, the management platform 100 may have at least one built-in training data set. A built-in training data set may be an open-source data set obtained from an open-source community, such as ImageNet or OpenImage. In some embodiments, built-in training data sets may also include data sets customized by the operator of the management platform 100, private data sets leased or purchased by the operator, and so on. The user can select one training data set from the at least one built-in training data set, and the management platform 100 obtains the corresponding training data set for model training based on the user's selection operation.
In some possible implementations, the user may also choose not to select a built-in training data set. For example, the user may upload a training data set. Specifically, the user can enter the address or path of the training data set through the interaction interface 300, and the management platform 100 obtains the corresponding training data set from that address or path for model training.
S404: The management platform 100 obtains an initial graph network model.
Specifically, the management platform 100 may have at least one built-in initial graph network model. For example, the management platform 100 may have built-in one or more of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generative network model, or a graph spatial-temporal network model. The user can select one initial graph network model from the at least one built-in model for building the AI ensemble model.
In some possible implementations, the user may also choose not to select a built-in initial graph network model. For example, the user may upload an initial graph network model. Specifically, the user can enter the address or path of the initial graph network model through the interaction interface 300, and the management platform 100 obtains the corresponding initial graph network model from that address or path for building the AI ensemble model.
S406: The management platform 100 obtains multiple base models.
Specifically, the management platform 100 may obtain multiple base models according to the user's selection. A base model is a trained AI model, which may be a random forest model, a decision tree model, or a neural network model. The multiple base models selected by the user may be built into the management platform 100 or uploaded in advance by the user. Of course, the user may also upload base models in real time so that the management platform 100 obtains them.
For different types of models such as random forest models, decision tree models, and neural network models, the management platform 100 can provide at least one instance of each for the user to select. The instances provided by the management platform 100 may be built into the management platform 100 or uploaded in advance by the user, and the user can select at least one instance as a base model for building the AI ensemble model. In addition, the user can configure the number of a given instance as N (N being an integer) so that the management platform 100 obtains N instances of that model for building the AI ensemble model. Further, the user can select instances of multiple models as base models and configure the number of instances for each, so that the management platform 100 obtains the corresponding numbers of instances for building the AI ensemble model.
In some possible implementations, the management platform 100 may also generate base models according to the user's selection. For example, the user may choose to generate neural network models as base models. Specifically, the management platform 100 can train a supernet and obtain multiple base models from it; the specific implementation of training the supernet and obtaining multiple base models from it is described in detail below and is not elaborated here.
It should be noted that S402, S404, and S406 may be performed in parallel or sequentially in a set order; for example, the management platform 100 may also perform S404 and S406 first and then S402. The embodiments of this application do not limit the execution order of S402 to S406.
S408: The management platform 100 iteratively trains the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model.
Specifically, each iteration includes: the management platform 100 inputs a portion of the training data in the training data set (which may be called first training data) into each base model separately to obtain each base model's output after inference on the first training data; the management platform 100 then constructs the outputs of the multiple base models after inference on the first training data into a graph structure; and the management platform 100 trains the initial graph network model with the graph structure.
The first training data is some of the data in the training data set. The training data in the training data set can be divided into multiple batches according to the batch size, and the first training data may be one of these batches. For example, if the training data set includes 10,000 training samples and the batch size is 100, the training data set can be divided into 100 batches, and the first training data may be one of these 100 batches. Each base model can perform feature extraction on the first training data to obtain a feature, which can in practice be represented by a vector or matrix. Each base model's output after inference on the first training data may include this feature.
A graph structure is a data structure including multiple nodes, and further includes edges between at least two of the multiple nodes. In some embodiments, the management platform 100 may determine the similarities between the outputs of the multiple base models after inference on the first training data, for example based on the distances between the features output by the multiple base models. The management platform 100 then takes each base model's output after inference on the first training data as a node of the graph structure, determines the edges between the nodes according to the similarities, and obtains the graph structure from the nodes and the edges.
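The graph construction described above, in which each base model's output feature becomes a node and edges connect sufficiently similar outputs, can be sketched as follows. This is a minimal illustration: the function name, the cosine-similarity measure, and the threshold value are assumptions for demonstration, not specifics of this application.

```python
import numpy as np

def build_graph(outputs, threshold=0.5):
    """Build an adjacency matrix from base-model output features.

    outputs: list of 1-D feature vectors, one per base model (the nodes).
    An edge connects two nodes whose cosine similarity exceeds `threshold`
    (the threshold value is an illustrative assumption).
    """
    X = np.stack(outputs)                    # node feature matrix, one row per base model
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarity
    A = (sim > threshold).astype(float)      # thresholded edges
    np.fill_diagonal(A, 0.0)                 # no self-loops at this stage
    return X, A

# Four base-model outputs: the first three are similar, the last one differs.
feats = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
         np.array([0.8, 0.2]), np.array([0.0, 1.0])]
X, A = build_graph(feats)
```

The adjacency matrix A returned here, together with the node features X, is exactly the pair of objects a graph network model consumes in the next step.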
The management platform 100 trains the initial graph network model with the graph structure; specifically, the graph structure can be input into the initial graph network model, which can aggregate node information based on edge information, thereby extracting a feature from the graph structure. It should be noted that this feature fuses the outputs of the multiple base models. The management platform 100 can then input the feature output by the initial graph network model into a decision layer for decision-making to obtain a decision result. The decision layer may be a classifier, a regressor, or the like, and accordingly the decision result may be a classification or regression result. The management platform 100 can compute the value of the loss function, i.e., the loss value, from the decision result and the labels of the training data, and then update the parameters of the initial graph network model by gradient descent based on the gradient of the loss value, thereby iteratively training the initial graph network model.
For ease of understanding, an embodiment of this application also uses a graph convolutional network model as the initial graph network model for illustration.
Referring to the schematic diagram of the graph convolutional network model shown in FIG. 5, in this example the management platform 100 obtains multiple base models: base model 1, base model 2, base model 3, and base model 4, and the management platform 100 can construct a graph structure from the outputs of base models 1 to 4. For ease of description, let X1, X2, X3, and X4 denote the outputs of base models 1 to 4. The management platform 100 takes X1, X2, X3, and X4 as nodes and determines the edges between nodes based on the similarities among X1, X2, X3, and X4; for example, the edges X1X2, X1X3, X1X4, X2X3, X2X4, and X3X4 may be determined based on similarity, and the graph structure is obtained from these nodes and edges.
The management platform 100 then inputs the graph structure into the graph convolutional network model, which includes a graph convolution layer. The graph convolution layer can convolve the input of the graph convolutional network model to obtain a convolution result. The graph convolutional network model can be characterized by a mapping f(·), which enables it to aggregate node information according to edge information. Taking X4 as an example, when the graph convolution layer convolves X4, the nodes X1, X2, and X3 associated with X4 also participate in the convolution operation, yielding the convolution result Z4. Similarly, the graph convolution layer can convolve X1, X2, and X3 to obtain the convolution results Z1, Z2, and Z3. These convolution results characterize the features extracted by the graph convolutional network model, which may fuse the outputs of the multiple base models.
In some possible implementations, considering that spectrum-based graph convolution suffers from a large number of graph convolution kernel parameters, the management platform 100 may also use a graph convolutional network model obtained by simplifying the Chebyshev network (ChebNet) as the initial graph convolutional network model.
ChebNet approximates the convolution kernel g_θ with a high-order approximation of the Laplacian matrix (for example, a polynomial expansion), which greatly reduces the parameter count and gives the graph convolutional network model locality. Specifically, the convolution kernel g_θ is parameterized in the form of formula (1):

g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})    (1)

where θ_k is a learnable parameter of the graph convolutional network model representing the weight of the k-th term of the polynomial, T_k denotes the k-th Chebyshev polynomial, K is the highest order of the polynomial, and Λ is the eigenvalue matrix, which is usually a symmetric matrix.
ChebNet can be further simplified to obtain the first-order approximation version of GCN. Specifically, letting K = 1 and approximating the largest eigenvalue of the Laplacian matrix as λ_max ≈ 2, the convolution result of the simplified GCN can be expressed as formula (2):

g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_n) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x    (2)

where x is the input and g_θ is the convolution kernel; θ_0 and θ_1 are the polynomial weights; L is the normalized Laplacian matrix and I_n is the n-th order identity matrix; A is the adjacency matrix and D is the degree matrix.
To avoid overfitting, the constraint θ = θ_0 = −θ_1 can also be imposed to reduce the parameters of the graph convolutional network model. Formula (2) can then be further simplified as:

g_\theta \star x \approx \theta (I_n + D^{-1/2} A D^{-1/2}) x    (3)

Repeated use of the operator I_n + D^{-1/2} A D^{-1/2} can cause gradient explosion or vanishing. To enhance training stability, this operator can also be normalized, as shown in formula (4):

I_n + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}    (4)

where \tilde{A} = A + I_n is the adjacency matrix A augmented with the identity matrix, i.e., the matrix with self-loops added, and \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}.
The above convolution process is illustrated with one-dimensional convolution. Generalizing one-dimensional convolution to multi-dimensional convolution gives the following convolution result:

Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W    (5)

where Z denotes the convolution result of the multi-dimensional convolution, X denotes the input in matrix form, i.e., the input matrix, and W denotes the parameter matrix. The parameter matrix includes feature transformation parameters, for example the learnable parameters θ of the graph convolutional network model, which are specifically used to enhance the features.
Through the initial graph convolutional network model, the management platform 100 can fuse the outputs of the base models using formula (5) to obtain the fused feature, which may specifically be the convolution result Z shown in formula (5); the feature is then input into a decision layer such as a classifier to obtain a classification result. The management platform 100 can compute the loss value from the classification result and the labels of the training data, and then update the parameter matrix W of the graph convolutional network model according to the gradient of the loss value, thereby iteratively training the graph convolutional network model.
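The fused convolution result of formula (5) can be reproduced numerically in a small sketch. The matrices below are toy values chosen for illustration; only the propagation rule Z = D̃^(-1/2) Ã D̃^(-1/2) X W comes from the description above.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step per formula (5): Z = D̃^{-1/2} Ã D̃^{-1/2} X W,
    where Ã = A + I adds self-loops and D̃ is the degree matrix of Ã."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W

# Toy graph with 3 nodes (base-model outputs) and 2-dimensional node features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)          # identity parameter matrix: pure neighborhood aggregation
Z = gcn_layer(X, A, W)
```

With W set to the identity, each row of Z is just the degree-normalized mix of a node's own feature and its neighbors' features, which is the fusion effect described above; in training, W would be the learnable parameter matrix updated by gradient descent.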
When the trained initial graph network model (for example, the graph convolutional network model) satisfies a preset condition, the management platform 100 can stop training and determine the trained initial graph network model as the graph network model. The preset condition can be set based on empirical values; for example, the preset condition may be that the loss value tends to converge, that the loss value is smaller than a preset value, or that the performance reaches a preset performance. The performance may be a metric such as accuracy; on that basis, reaching the preset performance may mean the accuracy reaches 95%.
S410: The management platform 100 builds the graph network model and the multiple base models into the AI ensemble model.
Specifically, the management platform 100 can form the outputs of the multiple base models into a graph structure and use that graph structure as the input of the graph network model, thereby ensembling the multiple base models and the graph network model to obtain the AI ensemble model. The base models are used to extract features, and the graph network model is used to fuse the features extracted by the multiple base models to obtain the fused feature. In some possible implementations, the AI ensemble model may also be integrated with a decision layer, such as a classifier or regressor; inputting the fused feature into the decision layer yields a classification or regression result, thereby completing a specific AI task.
Based on the above description, an embodiment of this application provides a method for building an AI ensemble model. In the method, the management platform 100 builds a graph network model and multiple base models into an AI ensemble model. The AI ensemble model can construct a graph structure from the outputs of the multiple base models and process it with the graph network model to fuse those outputs. Because the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully accounts for the differences and correlations among the base models when fusing their outputs, and using the features obtained from the AI ensemble model built from the graph network model and the multiple base models for AI task execution can improve the accuracy of the AI task's execution result.
In the embodiment shown in FIG. 4, the management platform 100 may also obtain multiple base models by searching according to a neural architecture search (NAS) algorithm. Considering that NAS algorithms are time-consuming, the management platform 100 may also use an optimized NAS algorithm to search for the multiple base models.
The optimized NAS algorithm may include any one of the efficient neural architecture search (ENAS) algorithm, the differentiable architecture search (DARTS) algorithm, or the proxyless NAS algorithm. It should be noted that base models obtained through a NAS algorithm or an optimized NAS algorithm are neural network models.
For ease of understanding, the following uses obtaining base models through the DARTS algorithm as an example. Referring to the schematic flowchart of obtaining base models according to the DARTS algorithm shown in FIG. 6A, it specifically includes the following steps:
S602: The management platform 100 determines a supernet according to the search space.
The principle of DARTS is to determine a supernet according to the search space. The supernet can be represented as a directed acyclic graph, in which each node can represent a feature map (or feature vector), and the edges between nodes represent the possible operations for connecting the nodes, for example 3x3 convolution, 5x5 convolution, and so on.
Normally, the choice of operations between nodes is discrete, i.e., the search space (the set of searchable operations) is discrete. The edges between nodes in the supernet are expanded so that connections between nodes have more possible operations, thereby relaxing the search space. Specifically, the management platform 100 can expand the edges in the search space according to the possible operations between nodes configured by the user to relax the search space. The management platform 100 can then map the relaxed search space to a continuous space to obtain the supernet.
S604: The management platform 100 trains the supernet to obtain a base model.
Specifically, the supernet is provided with an objective function. With the search space mapped to a continuous space, the objective function can be mapped to a differentiable function, so the management platform 100 can perform model optimization in the continuous space through gradient descent (GD).
The principle of DARTS is to train cells by searching, for example a norm-cell and a reduce-cell, and then connect multiple cells to obtain a neural network model. A norm-cell keeps the output feature map the same size as the input feature map, while a reduce-cell halves the output feature map size relative to the input. The number of connected cells can be controlled by the hyperparameter layer; for example, layer = 20 means connecting 20 cells to obtain the neural network model.
The following illustrates training one cell. Referring to the schematic flowchart of neural network search shown in FIG. 6B, first see (a) in FIG. 6B, which shows a cell. The cell can be represented as a directed acyclic graph, in which node 1, node 2, node 3, and node 4 each represent feature maps, and the edges between nodes represent the possible operations for connecting the nodes; initially, the edges between nodes are unknown. In response to the user's configuration operation, the management platform 100 can expand the edges between nodes into multiple edges (the multiple edges shown in different line styles in FIG. 6B), correspondingly expanding the possible connecting operations to 8, for example 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, 3x3 average pooling, identity, and direct connection. This relaxes the discrete search space, yielding the supernet shown in (b) of FIG. 6B.
The management platform 100 can then sample the supernet to obtain a sub-network, where sampling refers to selecting one or more operations from the possible connecting operations between nodes. After obtaining a sub-network, a gradient can be computed and used to update the parameters of the supernet to train it. By repeatedly performing the above sampling and updating steps, the management platform 100 can perform model optimization. See (d) in FIG. 6B, which shows the optimal sub-network obtained by sampling; this optimal sub-network can serve as a base model.
The key to the management platform 100 obtaining a base model lies in sampling, which is described in detail next. The learnable parameters in the supernet include the operation parameters ω and the architecture parameters α. The operation parameters ω characterize the operations connecting nodes, for example 3x3 depthwise separable convolution, 5x5 depthwise separable convolution, 3x3 dilated convolution, 5x5 dilated convolution, 3x3 max pooling, 3x3 average pooling, identity, or direct connection. The architecture parameters α characterize the weights of the operations connecting nodes. On this basis, the sampling process can be expressed as a bilevel optimization problem with the architecture parameters α as the upper-level variable and the operation parameters ω of the supernet as the lower-level variable; see formula (6):

\min_{\alpha} L_{val}(\omega^*(\alpha), \alpha) \quad s.t. \quad \omega^*(\alpha) = \arg\min_{\omega} L_{train}(\omega, \alpha)    (6)

where L_train denotes the loss on the training data set, i.e., the training loss, and L_val denotes the loss on the validation data set, i.e., the validation loss. "arg" denotes argument and is usually used with maxima or minima to denote the argument that maximizes or minimizes an expression; ω*(α) denotes the ω that minimizes L_train(ω, α). "s.t." is the abbreviation of "subject to" and denotes the condition to be satisfied. On this basis, formula (6) characterizes the α that minimizes L_val(ω*(α), α) subject to the condition ω*(α) = argmin_ω L_train(ω, α).
To solve formula (6), one possible implementation is to alternately optimize the operation parameters ω and the architecture parameters α. Specifically, the management platform 100 can alternately perform the following steps: (a) update the architecture parameters α by gradient descent according to the validation loss (for example, the gradient of the validation loss, ∇_α L_val); (b) update the operation parameters ω by gradient descent according to the training loss (for example, the gradient of the training loss, ∇_ω L_train), where ξ denotes the learning rate and ∇ denotes the gradient. When the sub-network obtained by alternating optimization reaches a preset performance on the validation data set, the alternation of the above steps can be terminated.
Considering that the complexity of alternating optimization is very high, the management platform 100 can also use a gradient approximation for optimization to reduce complexity. Specifically, the management platform 100 can substitute ω*(α) into the validation loss and approximate the gradient of L_val(ω*(α), α) by the gradient obtained after a single training step on ω; see formula (7):

\nabla_{\alpha} L_{val}(\omega^*(\alpha), \alpha) \approx \nabla_{\alpha} L_{val}(\omega - \xi \nabla_{\omega} L_{train}(\omega, \alpha), \alpha)    (7)

This method takes minimizing the loss on the validation data set (i.e., the validation loss) as the optimization objective and uses gradient descent to find the distribution that produces the optimal sub-network, rather than directly searching for the optimal sub-network, which improves the efficiency of sampling sub-networks. The sub-network sampled by the management platform 100 can serve as a base model.
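The alternating scheme of steps (a) and (b) above can be illustrated on a toy differentiable problem. The quadratic losses below are illustrative placeholders standing in for L_val and L_train on an actual supernet; only the pattern of alternating gradient steps on α and ω is the point being demonstrated.

```python
def train_loss_grad_w(w, a):
    # gradient d/dw of the stand-in training loss L_train(w, a) = (w - a)^2
    return 2.0 * (w - a)

def val_loss_grad_a(w, a):
    # gradient d/da of the stand-in validation loss L_val(w, a) = (w + a - 2)^2
    return 2.0 * (w + a - 2.0)

def alternate(w=0.0, a=0.0, lr=0.1, steps=200):
    """Alternate step (a), a gradient step on alpha using the validation loss,
    with step (b), a gradient step on omega using the training loss."""
    for _ in range(steps):
        a -= lr * val_loss_grad_a(w, a)    # step (a): update architecture parameter
        w -= lr * train_loss_grad_w(w, a)  # step (b): update operation parameter
    return w, a

w, a = alternate()  # converges toward the joint optimum w = a = 1
```

In the actual DARTS setting, step (a) would use the one-step-lookahead gradient of formula (7) instead of the exact bilevel gradient, but the control flow of the alternation is the same as in this sketch.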
The above details sampling one base model from the supernet. The management platform 100 can sample in the same manner to obtain multiple base models. Further, considering that a base model may perform poorly when inferring on some training data, the management platform 100 can also, after obtaining a base model (for example, the i-th base model, where i is a positive integer), determine that base model's performance, for example its performance on different categories of training data. The performance can be measured by metrics such as accuracy or inference time, which is not limited in this embodiment. The process of obtaining multiple base models is described in detail below.
Referring to the schematic flowchart of obtaining multiple base models shown in FIG. 7, it specifically includes the following steps:
Step 1: The management platform 100 determines a supernet according to the search space.
Step 2: The management platform 100 trains the supernet to obtain a base model.
For the implementation of the management platform 100 determining the supernet and training it to obtain a base model, see the descriptions related to FIG. 6A and FIG. 6B. In this embodiment, it is assumed that the first base model obtained by the management platform 100 is φ_0.
Step 3: The management platform 100 determines the performance of the base model.
The performance of a base model can be measured by the accuracy of the AI task execution results obtained through it. Specifically, the management platform 100 can input the training data used for accuracy evaluation into the base model, classify according to the features extracted by the base model, and then determine the misclassified and correctly classified training data based on the classification results and the labels of the training data. The management platform 100 can obtain the base model's accuracy from the numbers of misclassified and correctly classified samples in each category of training data.
It should be noted that after sampling a base model, the management platform 100 can also first train the base model for K rounds, where K is a positive integer, before determining its performance. Further, the management platform 100 can determine whether the base model's performance reaches the preset performance; if so, sampling can be stopped and the corresponding AI task completed directly based on this base model; if not, steps 4 and 5 can be performed to continue sampling the next base model.
Step 4: The management platform 100 can update the weights of the training data according to the performance of the base model.
Specifically, when the base model's performance on training data of a second category is higher than its performance on training data of a first category, the management platform 100 can increase the weights of the first-category training data in the training data set and/or decrease the weights of the second-category training data in the training data set. In this way, the first-category training data has a higher probability of being used to train the supernet, and the second-category training data a lower probability.
The management platform 100 can update the training data weights in multiple ways; two implementations are illustrated below.
In the first implementation, the management platform 100 can update the training data weights according to a linear function, specifically a function characterizing the linear relationship between the training data weights and the base model's performance. The management platform 100 can also normalize the weights; for example, the management platform 100 can set the sum of the weights of the different categories of training data to 1.
In the second implementation, the management platform 100 can update the training data weights using the AdaBoost method; see formula (8):

\beta_i = \frac{1}{2}\ln\frac{1 - E_i}{E_i}, \qquad W_{i+1}(j) = \frac{W_i(j)}{Z_i} \times \begin{cases} e^{-\beta_i}, & h_i(x_j) = y_j \\ e^{\beta_i}, & h_i(x_j) \neq y_j \end{cases}    (8)

where E_i denotes the error rate of base model φ_i, β_i denotes the coefficient of base model φ_i, W_i(j) is the weight of training sample x_j used to train the current base model (for example, base model φ_i), and W_{i+1}(j) is the weight of training sample x_j used to train the next base model (for example, base model φ_{i+1}). Z_i is a normalization coefficient so that W_i(j) can represent a distribution; h_i(·) is the inference result of base model φ_i, and y_j is the label in the sample data.
Specifically, the training platform 102 can obtain the error rate E_i of base model φ_i, for example determining it based on the accuracy of base model φ_i. The training platform 102 then computes the base model's coefficient β_i from the error rate E_i of base model φ_i. Next, the training platform 102 adjusts the weight according to whether the prediction h_i(x_j) of base model φ_i on sample data x_j equals the label y_j: when h_i(x_j) = y_j, the training platform 102 can multiply W_i(j) by e^{-β_i} to obtain the updated weight W_{i+1}(j); when h_i(x_j) ≠ y_j, the training platform 102 can multiply W_i(j) by e^{β_i} to obtain the updated weight W_{i+1}(j), with the results normalized by Z_i.
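A sketch of the AdaBoost-style reweighting of formula (8) follows. The function and variable names are illustrative assumptions; the computation of the coefficient β_i from the error rate, the up-weighting of misclassified samples, and the normalization by Z_i follow the description above.

```python
import math

def update_weights(weights, predictions, labels, eps=1e-12):
    """One AdaBoost-style reweighting round, sketching formula (8).

    weights: current distribution W_i over the training samples (sums to 1).
    predictions/labels: the base model's outputs h_i(x_j) and the labels y_j.
    Misclassified samples are multiplied by e^{beta}, correct ones by e^{-beta},
    and the result is renormalized by the constant Z_i.
    """
    # weighted error rate E_i: total weight of misclassified samples
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    error = min(max(error, eps), 1.0 - eps)        # keep beta finite
    beta = 0.5 * math.log((1.0 - error) / error)   # coefficient beta_i
    new_w = [w * math.exp(beta if p != y else -beta)
             for w, p, y in zip(weights, predictions, labels)]
    z = sum(new_w)                                 # normalization constant Z_i
    return [w / z for w in new_w]

# Four equally weighted samples; the base model misclassifies the second one.
w = update_weights([0.25, 0.25, 0.25, 0.25],
                   predictions=[1, 0, 1, 1],
                   labels=[1, 1, 1, 1])
```

After one round, the single misclassified sample carries half of the total weight, so it has a much higher probability of being selected to train the supernet for the next base model.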
Step 5: The management platform 100 trains the supernet with the reweighted training data and samples the next base model from that supernet.
After the training data weights are updated, highly weighted training data has a higher probability of being selected to train the supernet to obtain a base model, and lowly weighted training data a lower probability. The supernet can thus be trained with emphasis on the highly weighted training data, and the base model sampled during this training process performs well on that type of training data. The multiple base models the management platform 100 obtains from training the supernet can therefore complement one another in performance, and the accuracy of the AI task execution results obtained from an AI ensemble model integrating these base models can be significantly improved.
Further, when training the supernet with the reweighted training data to obtain the next base model, the management platform 100 may either train the original supernet with the reweighted training data or fine-tune the supernet with the reweighted training data. Fine-tuning refers to making small adjustments to a pre-trained model. In this embodiment specifically, the management platform 100 can retrain the already-trained supernet with the reweighted training data rather than training the supernet from scratch, thereby fine-tuning the supernet, which can reduce the complexity of training.
When the number of base models is greater than or equal to 2 and none of the base models' performance reaches the preset performance, the management platform 100 can train the initial graph network model based on the training data set and the obtained multiple base models to obtain the graph network model. The management platform 100 then determines whether the graph network model's performance reaches the preset performance; if so, training can be stopped and the AI ensemble model built from the graph network model and the multiple base models; if not, new base models can continue to be sampled, and when a new base model's performance does not reach the preset performance, the graph network model is trained based on the training data set and the multiple base models including the new one.
The embodiments shown in FIG. 1 to FIG. 7 describe the method for building an AI ensemble model in detail. An AI ensemble model built through the above method can be used to perform inference on input data for the implementation of AI tasks. The inference method of the AI ensemble model is introduced next.
The inference method of the AI ensemble model can be performed by an inference apparatus. The inference apparatus may be a software apparatus deployed on a computing device or computing device cluster; the computing device cluster performs the inference method of the AI ensemble model provided by the embodiments of this application by running the software apparatus. In some embodiments, the inference apparatus may also be a hardware apparatus that, when running, performs the inference method of the AI ensemble model provided by the embodiments of this application. For ease of understanding, the following uses a software inference apparatus as an example.
Referring to the schematic structural diagram of the inference apparatus shown in FIG. 8, the apparatus 800 includes a communication module 802, a first inference module 804, a building module 806, and a second inference module 808. The communication module 802 is configured to obtain input data; the first inference module 804 is configured to input the input data into each base model separately to obtain each base model's output after inference on the input data; the building module 806 is configured to construct the outputs of the multiple base models into a graph structure; and the second inference module 808 is configured to input the graph structure into the graph network model and obtain the inference result of the AI ensemble model based on the graph network model.
In some possible implementations, as shown in FIG. 9, the inference apparatus 800 may be deployed in a cloud environment, so that the inference apparatus 800 can provide an inference cloud service for users. Specifically, a user can trigger an operation to start the inference apparatus 800 through a client (for example, a browser or a dedicated client) to create an instance of the inference apparatus 800 in the cloud environment, and then interact with the instance through the client to perform the inference method of the AI ensemble model. Similarly, the inference apparatus 800 may also be deployed in an edge environment, or on user terminals such as desktops, laptops, and smartphones.
In other possible implementations, the inference apparatus 800 may also be deployed in a distributed manner across different environments. For example, the modules of the inference apparatus 800 may be deployed in a distributed manner across any two of the cloud environment, edge environment, and end device, or across all three.
Next, the inference method of the AI ensemble model provided by the embodiments of this application is described in detail from the perspective of the inference apparatus 800.
Referring to the flowchart of the inference method of the AI ensemble model shown in FIG. 10, the method includes:
S1002: The inference apparatus 800 obtains input data.
Specifically, the inference apparatus 800 includes the AI ensemble model. Different training data can build different AI ensemble models, and different AI ensemble models can be used to complete different AI tasks. For example, training data labeled with image categories can build an AI ensemble model that classifies images, and training data labeled with translated sentences can build an AI ensemble model that translates text.
The inference apparatus 800 can receive input data uploaded by the user or obtain input data from a data source. Depending on the AI task, the input data received by the inference apparatus 800 can be of different types. Taking an image classification task as an example, the input data received by the inference apparatus 800 may be an image to be classified; the goal of this AI task is to classify the image, and the execution result of the AI task may be the image's category.
S1004: The inference apparatus 800 inputs the input data into each base model in the AI ensemble model separately to obtain each base model's output after inference on the input data.
Each base model is a trained AI model. A base model may be a trained random forest model or decision tree model, or a neural network model sampled from a supernet. The inference apparatus 800 inputs the input data into each base model separately; each base model can perform feature extraction on the input data, yielding each base model's output after inference on the input data.
Continuing the image classification example, the inference apparatus 800 inputs the image to be classified into each base model in the AI ensemble model and obtains each base model's output after inference on the image, where each base model's output after inference on the image to be classified is the feature that base model extracts from the image.
S1006: The inference apparatus 800 constructs the outputs of the multiple base models into a graph structure.
Specifically, the inference apparatus 800 can determine the similarity between the outputs of every two of the multiple base models. Since the outputs of the multiple base models can be represented by features, the similarity between the outputs of every two base models can be characterized by the distance between the features. The inference apparatus 800 can take each base model's output as a node of the graph structure, determine the edges between the nodes according to the pairwise similarities, and then build the graph structure from these nodes and edges.
The inference apparatus 800 can set a similarity threshold. In some possible implementations, when the similarity between two features (determined from their distance) exceeds the similarity threshold, it can be determined that there is an edge between the nodes corresponding to the two features; when the similarity does not exceed the similarity threshold, it can be determined that there is no edge between them. In other possible implementations, the inference apparatus 800 can also set an edge between every two nodes and assign a weight to each edge according to the feature distance.
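The second implementation above, a fully connected graph whose edge weights follow the feature distances, can be sketched as below. The exponential decay exp(-d) used to turn a distance d into a weight is an illustrative assumption, not a choice specified in this application.

```python
import numpy as np

def weighted_graph(outputs):
    """Fully connected graph whose edge weights decay with feature distance.

    Every pair of base-model outputs is connected; the weight exp(-d) maps a
    Euclidean distance d to a similarity, so closer outputs get larger weights.
    """
    X = np.stack(outputs)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    W = np.exp(-d)                                              # closer -> larger weight
    np.fill_diagonal(W, 0.0)                                    # no self-edges
    return W

# Three base-model outputs: the first two are close, the third is far away.
W = weighted_graph([np.array([0.0, 0.0]),
                    np.array([0.0, 1.0]),
                    np.array([3.0, 4.0])])
```

The resulting symmetric weight matrix plays the role of the (weighted) adjacency matrix of the graph structure fed to the graph network model in the next step.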
S1008: The inference apparatus 800 inputs the graph structure into the graph network model and obtains the inference result of the AI ensemble model based on the graph network model.
The inference apparatus 800 inputs the constructed graph structure into the graph network model, which can process the graph structure, for example convolving it through a graph convolutional network model, thereby obtaining the inference result of the AI ensemble model. The inference result of the AI ensemble model may be a feature of the input data, specifically the fused feature obtained by the graph network model fusing the features extracted by the multiple base models.
In the image classification example, the inference apparatus 800 constructs a graph structure from the features each base model extracts from the image to be classified, then inputs the graph structure into the graph network model to obtain the inference result of the AI ensemble model. The inference result may be the fused feature obtained by the graph network model in the AI ensemble model fusing the features extracted by the multiple base models.
S1010: The inference apparatus 800 inputs the inference result of the AI ensemble model into a decision layer and uses the decision layer's output as the execution result of the AI task.
The decision layer can be of different types for different AI tasks. For example, for a classification task the decision layer may be a classifier, and for a regression task a regressor. The inference apparatus 800 can input the inference result of the AI ensemble model (for example, the fused feature) into the decision layer for decision-making and use the decision layer's output as the execution result of the AI task.
Continuing the image classification example, the inference apparatus 800 can input the fused feature into the classifier for classification to obtain the image's category, where the image's category is the execution result of the classification task.
It should be noted that the AI ensemble model can also be used to preprocess the input data, with the inference result of the AI ensemble model serving as the preprocessing result. The inference apparatus 800 can input the inference result of the AI ensemble model into a downstream task model, where the task model is an AI model trained for a specific AI task. The inference apparatus 800 can use the task model to perform further feature extraction on the inference result and make a decision based on the further extracted features, using the decision result as the execution result of the AI task.
In practical applications, the inference apparatus 800 can also present the execution result of the AI task to the user, so that the user can take corresponding measures or perform corresponding actions based on the execution result, which is not limited in the embodiments of this application.
Based on the above description, an embodiment of this application provides an inference method for an AI ensemble model. In the method, the inference apparatus 800 inputs the input data into multiple base models, constructs the outputs of the multiple base models into a graph structure, and then processes the graph structure with the graph network model to fuse those outputs. Because the graph network model considers the neighbor nodes of each node when processing the graph structure, it fully accounts for the differences and correlations among the base models when fusing their outputs, and the accuracy of the AI task execution results obtained from the AI ensemble model built from the graph network model and the multiple base models can be significantly improved.
The above describes the inference method of the AI ensemble model provided by the embodiments of this application in detail with reference to FIG. 1 to FIG. 10. The apparatuses and devices provided by the embodiments of this application are introduced below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the management platform 100 for the AI ensemble model shown in FIG. 1, the management platform 100 (i.e., the management system) includes:
the interaction unit 102, configured to obtain a training data set, an initial graph network model, and multiple base models, where each base model is a trained AI model;
the training unit 104, configured to iteratively train the initial graph network model using the training data in the training data set and the multiple base models to obtain a graph network model; and
the building unit 106, configured to build the graph network model and the multiple base models into the AI ensemble model, where the input of the graph network model is a graph structure formed from the outputs of the multiple base models.
In some possible implementations, when the training unit 104 iteratively trains the initial graph network model using the training data in the training data set and the multiple base models, each iteration includes:
inputting first training data in the training data set into each base model separately to obtain each base model's output after inference on the first training data;
constructing the outputs of the multiple base models after inference on the first training data into a graph structure; and
training the initial graph network model with the graph structure.
In some possible implementations, the multiple base models include one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
In some possible implementations, the interaction unit 102 is specifically configured to:
train a supernet through the training unit and obtain multiple base models from the supernet.
In some possible implementations, the training unit 104 is specifically configured to:
train the supernet with the training data in the training data set to obtain an i-th base model, where i is a positive integer;
update the weights of the training data in the training data set according to the performance of the i-th base model; and
train the supernet with the reweighted training data in the training data set to obtain an (i+1)-th base model.
In some possible implementations, the training unit 104 is specifically configured to:
when the performance of the i-th base model on training data of a second category is higher than its performance on training data of a first category, increase the weights of the first-category training data in the training data set and/or decrease the weights of the second-category training data in the training data set.
In some possible implementations, the training unit 104 is specifically configured to:
fine-tune the supernet with the reweighted training data.
In some possible implementations, the training unit 104 is specifically configured to:
determine the similarity between the outputs of every two of the multiple base models after inference on the first training data; and
take each base model's output after inference on the first training data as a node of the graph structure, determine the edges between the nodes according to the similarities, and obtain the graph structure from the nodes and the edges.
In some possible implementations, the graph network model includes any one of a graph convolutional network model, a graph attention network model, a graph autoencoder model, a graph generative network model, or a graph spatial-temporal network model.
In some possible implementations, the graph convolutional network model includes a graph convolutional network model obtained by simplifying a Chebyshev network.
The management platform 100 according to the embodiments of this application can correspondingly perform the methods described in the embodiments of this application, and the above and other operations and/or functions of the modules/units of the management platform 100 respectively implement the corresponding flows of the methods in the embodiment shown in FIG. 4; for brevity, details are not repeated here.
Next, referring to the schematic structural diagram of the inference apparatus 800 for the AI ensemble model shown in FIG. 8, the inference apparatus 800 includes:
the communication module 802, configured to obtain input data;
the first inference module 804, configured to input the input data into each base model in the AI ensemble model separately to obtain each base model's output after inference on the input data, where each base model is a trained AI model;
the building module 806, configured to construct the outputs of the multiple base models into a graph structure; and
the second inference module 808, configured to input the graph structure into the graph network model and obtain the inference result of the AI ensemble model based on the graph network model.
In some possible implementations, the building module 806 is specifically configured to:
determine the similarity between the outputs of every two of the multiple base models; and
take each base model's output as a node of the graph structure, determine the edges between the nodes according to the similarities, and obtain the graph structure from the nodes and the edges.
In some possible implementations, the inference result of the AI ensemble model is a feature of the input data.
In some possible implementations, the apparatus 800 further includes:
an execution module, configured to input the inference result of the AI ensemble model into a decision layer and use the decision layer's output as the execution result of an AI task.
In some possible implementations, the apparatus 800 further includes:
an execution module, configured to input the inference result of the AI ensemble model into a task model, use the task model to perform further feature extraction on the inference result and make a decision based on the further extracted features, and use the decision result as the execution result of an AI task, where the task model is an AI model trained for the AI task.
The inference apparatus 800 according to the embodiments of this application can correspondingly perform the methods described in the embodiments of this application, and the above and other operations and/or functions of the modules/units of the inference apparatus 800 respectively implement the corresponding flows of the methods in the embodiment shown in FIG. 10; for brevity, details are not repeated here.
本申请实施例还提供一种计算设备集群。该计算设备集群可以是云环境、边缘环境或者终端设备中的至少一台计算设备形成的计算设备集群。该计算设备集群具体用于实现如图1所示实施例中管理平台100的功能。
图11提供了一种计算设备集群的结构示意图,如图11所示,计算设备集群10包括多台计算设备1100,计算设备1100包括总线1101、处理器1102、通信接口1103和存储器1104。处理器1102、存储器1104和通信接口1103之间通过总线1101通信。
总线1101可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器1102可以为中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微处理器(micro processor,MP)或者数字信号处理器(digital signal processor,DSP)等处理器中的任意一种或多种。
通信接口1103用于与外部通信。例如,通信接口1103可以用于获取训练数据集、初始图网络模型和多个基模型,或者通信接口1103用于输出基于多个基模型构建的AI集成模型。等等。
存储器1104可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器1104还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,硬盘驱动器(hard disk drive,HDD)或固态驱动器(solid state drive,SSD)。
存储器1104中存储有可执行代码,处理器1102执行该可执行代码以执行前述构建AI集成模型的方法。
具体地,在实现图1所示实施例的情况下,且图1实施例中所描述的管理平台100的各部分如交互单元102、训练单元104、构建单元106的功能为通过软件实现的情况下,执行图1中功能所需的软件或程序代码可以存储在计算设备集群10中的至少一个存储器1104中。至少一个处理器1102执行存储器1104中存储的程序代码,以使得计算设备集群1100执行前述构建AI集成模型的方法。
FIG. 12 is a schematic structural diagram of a computing device cluster. As shown in FIG. 12, the computing device cluster 20 includes a plurality of computing devices 1200, and each computing device 1200 includes a bus 1201, a processor 1202, a communication interface 1203, and a memory 1204. The processor 1202, the memory 1204, and the communication interface 1203 communicate with each other through the bus 1201.
For specific implementations of the bus 1201, the processor 1202, the communication interface 1203, and the memory 1204, refer to the related descriptions of FIG. 11. At least one memory 1204 in the computing device cluster 20 stores executable code, and at least one processor 1202 executes the executable code to perform the foregoing inference method of the AI integrated model.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that can be stored by a computing device, or a data storage device such as a data center that includes one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the foregoing method for constructing an AI integrated model applied to the management platform 100, or instruct a computing device to perform the foregoing inference method applied to the inference apparatus 800.
An embodiment of this application further provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the procedures or functions according to embodiments of this application are completely or partially produced.
The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwaves).
The computer program product may be a software installation package. When either of the foregoing method for constructing an AI integrated model or inference method of the AI integrated model needs to be used, the computer program product may be downloaded and executed on a computing device.
The descriptions of the procedures or structures corresponding to the foregoing accompanying drawings have respective focuses. For a part that is not described in detail in a procedure or structure, refer to the related descriptions of other procedures or structures.

Claims (34)

  1. A method for constructing an artificial intelligence (AI) integrated model, comprising:
    obtaining a training data set, an initial graph network model, and a plurality of base models, wherein each base model is a trained AI model;
    iteratively training the initial graph network model by using training data in the training data set and the plurality of base models, to obtain a graph network model; and
    constructing the graph network model and the plurality of base models into the AI integrated model, wherein an input of the graph network model is a graph structure formed by outputs of the plurality of base models.
  2. The method according to claim 1, wherein in the process of iteratively training the initial graph network model by using the training data in the training data set and the plurality of base models, each iteration comprises:
    separately inputting first training data in the training data set into each base model, to obtain an output of each base model after inference on the first training data;
    constructing the outputs of the plurality of base models after inference on the first training data into a graph structure; and
    training the initial graph network model by using the graph structure.
  3. The method according to claim 1 or 2, wherein the plurality of base models comprise one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
  4. The method according to claim 1 or 2, wherein the obtaining a plurality of base models comprises:
    training a supernet, and obtaining the plurality of base models from the supernet.
  5. The method according to claim 4, wherein the training a supernet, and obtaining the plurality of base models from the supernet comprises:
    training the supernet by using the training data in the training data set, to obtain an i-th base model, wherein i is a positive integer;
    updating weights of the training data in the training data set based on performance of the i-th base model; and
    training the supernet by using the training data with the updated weights in the training data set, to obtain an (i+1)-th base model.
  6. The method according to claim 5, wherein the updating weights of the training data in the training data set based on performance of the i-th base model comprises:
    when the performance of the i-th base model on training data of a second class is higher than its performance on training data of a first class, increasing the weight of the training data of the first class in the training data set, and/or decreasing the weight of the training data of the second class in the training data set.
  7. The method according to claim 5, wherein the training the supernet by using the training data with the updated weights comprises:
    fine-tuning the supernet by using the training data with the updated weights.
  8. The method according to claim 2, wherein the constructing the outputs of the plurality of base models after inference on the first training data into a graph structure comprises:
    determining a similarity between the outputs of every two base models of the plurality of base models after inference on the first training data; and
    using the output of each base model of the plurality of base models after inference on the first training data as a node of the graph structure, determining edges between the nodes based on the similarities, and obtaining the graph structure based on the nodes and the edges.
  9. The method according to claim 1 or 2, wherein the graph network model comprises any one of the following models: a graph convolutional network model, a graph attention network model, a graph auto-encoder model, a graph generative network model, or a graph spatial-temporal network model.
  10. The method according to claim 9, wherein when the graph network model is a graph convolutional network model, the graph convolutional network model is a graph convolutional network model obtained by simplifying a Chebyshev network.
  11. An inference method of an artificial intelligence (AI) integrated model, wherein the method is applied to an inference apparatus, the AI integrated model comprises a graph network model and a plurality of base models, and the method comprises:
    obtaining input data;
    separately inputting the input data into each base model in the AI integrated model, to obtain an output of each base model after inference on the input data, wherein each base model is a trained AI model;
    constructing the outputs of the plurality of base models into a graph structure; and
    inputting the graph structure into the graph network model, and obtaining an inference result of the AI integrated model based on the graph network model.
  12. The method according to claim 11, wherein the constructing the outputs of the plurality of base models into a graph structure comprises:
    determining a similarity between the outputs of every two base models of the plurality of base models; and
    using the output of each base model of the plurality of base models as a node of the graph structure, determining edges between the nodes based on the similarities, and obtaining the graph structure based on the nodes and the edges.
  13. The method according to claim 11 or 12, wherein the inference result of the AI integrated model is a feature of the input data.
  14. The method according to claim 13, wherein the method further comprises:
    inputting the inference result of the AI integrated model into a decision layer, and using an output of the decision layer as an execution result of an AI task.
  15. The method according to claim 13, wherein the method further comprises:
    inputting the inference result of the AI integrated model into a task model, performing further feature extraction on the inference result by using the task model, making a decision based on the further extracted features, and using a result obtained through the decision as an execution result of an AI task, wherein the task model is an AI model that has been trained for the AI task.
  16. A management system for an artificial intelligence (AI) integrated model, wherein the system comprises:
    an interaction unit, configured to obtain a training data set, an initial graph network model, and a plurality of base models, wherein each base model is a trained AI model;
    a training unit, configured to iteratively train the initial graph network model by using training data in the training data set and the plurality of base models, to obtain a graph network model; and
    a construction unit, configured to construct the graph network model and the plurality of base models into the AI integrated model, wherein an input of the graph network model is a graph structure formed by outputs of the plurality of base models.
  17. The system according to claim 16, wherein in the process in which the training unit iteratively trains the initial graph network model by using the training data in the training data set and the plurality of base models, each iteration comprises:
    separately inputting first training data in the training data set into each base model, to obtain an output of each base model after inference on the first training data;
    constructing the outputs of the plurality of base models after inference on the first training data into a graph structure; and
    training the initial graph network model by using the graph structure.
  18. The system according to claim 16 or 17, wherein the plurality of base models comprise one or more of the following types of AI models: a decision tree model, a random forest model, and a neural network model.
  19. The system according to claim 16 or 17, wherein the interaction unit is specifically configured to:
    train a supernet through the training unit, and obtain the plurality of base models from the supernet.
  20. The system according to claim 19, wherein the training unit is specifically configured to:
    train the supernet by using the training data in the training data set, to obtain an i-th base model, wherein i is a positive integer;
    update weights of the training data in the training data set based on performance of the i-th base model; and
    train the supernet by using the training data with the updated weights in the training data set, to obtain an (i+1)-th base model.
  21. The system according to claim 20, wherein the training unit is specifically configured to:
    when the performance of the i-th base model on training data of a second class is higher than its performance on training data of a first class, increase the weight of the training data of the first class in the training data set, and/or decrease the weight of the training data of the second class in the training data set.
  22. The system according to claim 20, wherein the training unit is specifically configured to:
    fine-tune the supernet by using the training data with the updated weights.
  23. The system according to claim 17, wherein the training unit is specifically configured to:
    determine a similarity between the outputs of every two base models of the plurality of base models after inference on the first training data; and
    use the output of each base model of the plurality of base models after inference on the first training data as a node of the graph structure, determine edges between the nodes based on the similarities, and obtain the graph structure based on the nodes and the edges.
  24. The system according to claim 16 or 17, wherein the graph network model comprises any one of the following models: a graph convolutional network model, a graph attention network model, a graph auto-encoder model, a graph generative network model, or a graph spatial-temporal network model.
  25. The system according to claim 24, wherein when the graph network model is a graph convolutional network model, the graph convolutional network model is a graph convolutional network model obtained by simplifying a Chebyshev network.
  26. An inference apparatus of an artificial intelligence (AI) integrated model, wherein the AI integrated model comprises a graph network model and a plurality of base models, and the apparatus comprises:
    a communication module, configured to obtain input data;
    a first inference module, configured to separately input the input data into each base model in the AI integrated model, to obtain an output of each base model after inference on the input data, wherein each base model is a trained AI model;
    a construction module, configured to construct the outputs of the plurality of base models into a graph structure; and
    a second inference module, configured to input the graph structure into the graph network model, and obtain an inference result of the AI integrated model based on the graph network model.
  27. The apparatus according to claim 26, wherein the construction module is specifically configured to:
    determine a similarity between the outputs of every two base models of the plurality of base models; and
    use the output of each base model of the plurality of base models as a node of the graph structure, determine edges between the nodes based on the similarities, and obtain the graph structure based on the nodes and the edges.
  28. The apparatus according to claim 26 or 27, wherein the inference result of the AI integrated model is a feature of the input data.
  29. The apparatus according to claim 28, wherein the apparatus further comprises:
    an execution module, configured to input the inference result of the AI integrated model into a decision layer, and use an output of the decision layer as an execution result of an AI task.
  30. The apparatus according to claim 28, wherein the apparatus further comprises:
    an execution module, configured to input the inference result of the AI integrated model into a task model, perform further feature extraction on the inference result by using the task model, make a decision based on the further extracted features, and use a result obtained through the decision as an execution result of an AI task, wherein the task model is an AI model that has been trained for the AI task.
  31. A computing device cluster, wherein the computing device cluster comprises at least one computing device, the at least one computing device comprises at least one processor and at least one memory, the at least one memory stores computer-readable instructions, and the at least one processor reads and executes the computer-readable instructions, so that the computing device cluster performs the method according to any one of claims 1 to 10.
  32. A computing device cluster, wherein the computing device cluster comprises at least one computing device, the at least one computing device comprises at least one processor and at least one memory, the at least one memory stores computer-readable instructions, and the at least one processor executes the computer-readable instructions, so that the computing device cluster performs the method according to any one of claims 11 to 15.
  33. A computer-readable storage medium, comprising computer-readable instructions, wherein when the computer-readable instructions are run on a computing device or a computing device cluster, the computing device or the computing device cluster is enabled to perform the method according to any one of claims 1 to 15.
  34. A computer program product, comprising computer-readable instructions, wherein when the computer-readable instructions are run on a computing device or a computing device cluster, the computing device or the computing device cluster is enabled to perform the method according to any one of claims 1 to 15.
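As a concrete reading of the supernet training in claims 5 and 6, the per-class reweighting of training data between successive base models could look like the boosting-style sketch below. The exponential update rule, the function name, and the accuracy-based notion of "performance" are illustrative assumptions, not the claimed procedure itself:

```python
import numpy as np

def update_sample_weights(weights, labels, per_class_accuracy):
    """Raise the weight of samples whose class the latest base model handles
    poorly, lower the weight of classes it already handles well, then
    renormalize so the weights still form a distribution."""
    acc = np.array([per_class_accuracy[y] for y in labels])  # per-sample accuracy
    new_w = weights * np.exp(0.5 - acc)   # accuracy below 0.5 -> weight grows
    return new_w / new_w.sum()            # renormalize to sum to 1
```

After each reweighting step, the supernet would be fine-tuned on the reweighted data (claim 7) to extract the next base model, so that successive base models specialize on the classes their predecessors handled worst.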
PCT/CN2021/142269 2021-05-31 2021-12-29 Method for constructing AI integrated model, and inference method and apparatus of AI integrated model WO2022252596A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21943948.6A EP4339832A1 (en) 2021-05-31 2021-12-29 Method for constructing ai integrated model, and inference method and apparatus of ai integrated model
US18/524,875 US20240119266A1 (en) 2021-05-31 2023-11-30 Method for Constructing AI Integrated Model, and AI Integrated Model Inference Method and Apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110602479 2021-05-31
CN202110602479.6 2021-05-31
CN202110977566.X 2021-08-24
CN202110977566.XA CN115964632A (zh) 2021-05-31 2021-08-24 Method for constructing AI integrated model, and inference method and apparatus of AI integrated model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/524,875 Continuation US20240119266A1 (en) 2021-05-31 2023-11-30 Method for Constructing AI Integrated Model, and AI Integrated Model Inference Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2022252596A1 true WO2022252596A1 (zh) 2022-12-08

Family

ID=84322825

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142269 WO2022252596A1 (zh) 2021-05-31 2021-12-29 构建ai集成模型的方法、ai集成模型的推理方法及装置

Country Status (3)

Country Link
US (1) US20240119266A1 (zh)
EP (1) EP4339832A1 (zh)
WO (1) WO2022252596A1 (zh)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120084235A1 (en) * 2010-09-30 2012-04-05 Massachusetts Institute Of Technology Structured prediction model learning apparatus, method, program, and recording medium
CN109614777A (zh) * 2018-11-23 2019-04-12 4Paradigm (Beijing) Technology Co., Ltd. Intelligent device, and user identity verification method and apparatus for intelligent device
CN111459168A (zh) * 2020-04-23 2020-07-28 Shanghai Jiao Tong University Fused pedestrian-crossing trajectory prediction method and system for autonomous vehicles
CN111738414A (zh) * 2020-06-11 2020-10-02 Beijing Baidu Netcom Science and Technology Co., Ltd. Recommendation model generation, content recommendation method, apparatus, device, and medium
CN112163620A (zh) * 2020-09-27 2021-01-01 Kunming University of Science and Technology Stacking model fusion method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958149A (zh) * 2023-09-21 2023-10-27 Hunan Hongpu Innovation Technology Development Co., Ltd. Medical model training method, medical data analysis method, apparatus, and related device
CN116958149B (zh) * 2023-09-21 2024-01-12 Hunan Hongpu Innovation Technology Development Co., Ltd. Medical model training method, medical data analysis method, apparatus, and related device

Also Published As

Publication number Publication date
EP4339832A1 (en) 2024-03-20
US20240119266A1 (en) 2024-04-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21943948; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2021943948; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2021943948; Country of ref document: EP; Effective date: 20231214)
NENP Non-entry into the national phase (Ref country code: DE)