CN113050955A - Self-adaptive AI model deployment method - Google Patents

Self-adaptive AI model deployment method

Info

Publication number
CN113050955A
Authority
CN
China
Prior art keywords
model
deployment
deployment state
sample
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911363266.1A
Other languages
Chinese (zh)
Inventor
周胜平
林俊杰
吴栋
仲景武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alpha Cloud Computing Shenzhen Co ltd
Original Assignee
Alpha Cloud Computing Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alpha Cloud Computing Shenzhen Co ltd filed Critical Alpha Cloud Computing Shenzhen Co ltd
Priority to CN201911363266.1A priority Critical patent/CN113050955A/en
Publication of CN113050955A publication Critical patent/CN113050955A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a self-adaptive AI model deployment method, which comprises the following steps: receiving a call from a business application to an AI interface, wherein the AI interface called by the business application corresponds to a specific AI model; and deciding the deployment state of the AI model corresponding to the call, wherein the decision on the deployment state comprises using the pre-deployment state of the AI model and selecting an AI functional model. Implementing this method of deployment decision and functional-model selection for AI models can effectively improve the flexibility with which business applications use AI models and improve the models' availability and usability, thereby further extending the depth and breadth of AI use in business application scenarios.

Description

Self-adaptive AI model deployment method
Technical Field
The application relates to artificial intelligence methods, in particular to a method for realizing the deployment and operation mechanism of an artificial intelligence model between a cloud that provides services and a terminal that runs business applications.
Background
It is currently widely recognized that Artificial Intelligence (AI) will be one of the most influential technologies of the twenty-first century and beyond.
Based on this judgment, people are investing substantial resources in the AI field. On one hand, AI is expected, and being tried, to solve problems in many areas, for which data is collected on a large scale. On the other hand, to better solve each "target" problem, different underlying hardware has been customized for the field in which the problem lies. Through recent years of evolution, AI has developed into a huge technology cluster and a complex technology ecosystem.
Typically, the use of AI follows a two-phase pattern: a model training phase and a model deployment (application) phase. In the training phase, an AI model is obtained as the training result; in the deployment (application) phase, the business application calls the AI model, packaged as a black box, through an API, supplies the input parameters, and obtains the returned result. Because AI hardware and environments are highly customized, AI models are mostly trained on computationally intensive servers or in cloud environments; for the same reason, the deployment of AI models is also mostly carried out against a server or cloud environment. AI implemented in this way is therefore very inflexible.
Disclosure of Invention
Accordingly, the present invention discloses methods for solving the above problems and enhancing the flexibility of AI deployment. The methods apply to unspecified mobile terminals, network equipment and even cloud servers; further, these unspecified devices constitute a system that implements adaptive AI model deployment. The invention therefore comprises the following aspects:
in one aspect, a method for adaptive AI model deployment is provided, including:
receiving a call from a business application to an AI interface, wherein the call includes data assignments for the parameters of the AI interface, and the AI interface called by the business application corresponds to an AI model; deciding the deployment state of the model corresponding to the AI interface call, wherein the deployment state is the deployment state of the AI model corresponding to the called AI interface, and the decision on the deployment state includes using the pre-deployment state of the AI model; and selecting a functional model corresponding to the AI model, the functional model being one of a plurality of differentiated implementations of the same AI model, and the selection of the AI functional model being a selection from among the plurality of differentiated implementations corresponding to that AI model.
Artificial intelligence is typically divided into two phases in application implementation: the first is model training and the second is deployment of the model. Without loss of generality, a trained AI model is typically packaged into an SDK and published behind an API interface. After learning of the interface (API) published with the AI model, the relevant business application calls this interface as needed. After receiving a call from the relevant service application to the AI interface, the SDK holding the called AI model information responds to the interface call, i.e., handles the deployment of the AI model; this is the adaptive AI model deployment method and flow implemented by the present invention. The adaptive deployment method is detailed as follows: after receiving a call from the service application to the AI interface, the AI SDK optionally analyzes the data assigned to the parameters. These data include both the data to be processed by the AI model and indicative data for the model supplied by the business application through the interface. The AI SDK acquires the pre-deployment state of the AI model, determines from it the end sides to which the model has been published, and obtains the dynamic information of those end sides, which mainly reflects their availability. From this dynamic information, and optionally from the assigned data, the AI SDK analyzes and decides the deployment state of the AI model, and thereby determines the side on which the model executes in response to the AI interface call.
If it is determined, based on the corresponding data and information, that the AI model published locally to the business application should be called, the deployment state of the AI model is local; if it is determined that an AI model published to a non-local side (remote or cloud) should be called, the deployment state is non-local. On this basis, an executor is further selected from the functional model group corresponding to the AI model to be called. The selection over the functional model group picks the best matching executor from a group of executors provided by AI models with the same capability. The best matching executor satisfies two conditions: on one hand, the result availability of the AI model call must be ensured; on the other hand, the environment availability during the AI model call must be ensured. Environment availability is measured against the environment in which the model is deployed: the invocation of the AI model must not significantly degrade that environment, should at least avoid the resources already under the heaviest contention, and should preferably give priority to the least contended resources, provided the call requirements (result availability) are met. Selecting the optimal executor comprises: acquiring dynamic information about each resource on the end side corresponding to the model deployment state; and determining the selection result over the functional model group according to the resource requirements of each functional model's execution.
The dynamic information of a resource includes: resource type, total amount, available amount, available proportion, etc. The resources include, but are not limited to: the computing power of computing devices such as a CPU/GPU/TPU; DDR or other memory provided by current or future mainstream storage devices; and network channels provided by devices such as a network card or an optical-fiber interface.
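As an illustrative sketch (not part of the patent text), the deployment-state decision and best-executor selection described above can be expressed in Python. All names here (`EndpointState`, `Executor`, the two functions) are hypothetical; the sketch assumes only the behavior stated in the description: prefer the local end side when available, then pick, among implementations whose resource needs fit, the one leaving the most headroom.

```python
from dataclasses import dataclass

@dataclass
class EndpointState:
    """Dynamic availability info for one end side (local, edge, or cloud)."""
    name: str
    reachable: bool
    cpu_free: float   # fraction of CPU currently available (0..1)
    mem_free: float   # fraction of memory currently available (0..1)

@dataclass
class Executor:
    """One member of a functional model group: a concrete implementation
    of the same AI capability with its own resource requirements."""
    name: str
    endpoint: str
    cpu_need: float
    mem_need: float

def decide_deployment_state(endpoints):
    """Prefer the local end side when it is reachable; otherwise fall back
    to the first reachable non-local (remote or cloud) end side."""
    local = next((e for e in endpoints if e.name == "local"), None)
    if local is not None and local.reachable:
        return "local"
    for e in endpoints:
        if e.reachable:
            return e.name
    raise RuntimeError("no reachable end side for this AI model")

def select_executor(executors, endpoints, deployment_state):
    """Pick the executor whose resource needs fit the chosen end side
    (result availability) and that leaves the most resources free
    (environment availability: least pressure on contended resources)."""
    state = {e.name: e for e in endpoints}[deployment_state]
    feasible = [x for x in executors
                if x.endpoint == deployment_state
                and x.cpu_need <= state.cpu_free
                and x.mem_need <= state.mem_free]
    if not feasible:
        raise RuntimeError("result availability cannot be ensured")
    # Maximize remaining headroom -> minimal degradation of the environment.
    return max(feasible, key=lambda x: (state.cpu_free - x.cpu_need)
                                     + (state.mem_free - x.mem_need))
```

For example, with a busy local side, a lightweight local implementation would be chosen over a heavyweight one whose needs exceed the free resources.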
A product or service system implementing some or all of the above methods and steps thus achieves adaptive AI model deployment, providing great flexibility for the deployment and application of AI models.
In another aspect, an apparatus for adaptive AI model deployment is provided, where the deployment apparatus includes a service application system, and the service application system further includes:
a service application unit: the application-layer code implementing the business logic; from the viewpoint of this invention, the business application unit issues calls to the interface of an AI model (the AI API for short);
an AI-SDK unit: used to expose the AI APIs that the service application can call, and containing the following functional modules;
a model management module: used to manage each AI model published by the AI SDK and to record the publishing results, including pre-deployment state information;
a local model module: used to maintain the models published to the local side and their corresponding interfaces;
a remote model agent module: used to act as a calling agent for models whose decision and selection results indicate they are published to a non-local side, when the interface call is issued locally;
an optimization feedback module: used, once the effect of a model call is known, to collect dynamic samples for optimization and other information for evaluation and tuning;
a model strategy module: used to decide the AI model deployment state and the selection over the functional model group, including a collection function for gathering dynamic information from the relevant end sides.
the units and modules provided by the invention together realize a device with self-adaptive AI model deployment with other units, modules and software functions required by the actual implementation of the product. The expression is as follows: on a device with flexibility of AI model use, the application can use various AI models, also use diversified interfaces of the same AI model, and even can provide adaptive diversified realization for the service application aiming at the same AI model or even the same AI API. This flexibility is further realized in that the business application can initiate calls to the corresponding interfaces of the AI model published to the non-local-end side, which are forwarded by the proxy of the remote model to the publishing and execution device of the model. Thus, the device to which the business application is applied has the ability to use a diversified AI model. After receiving the call of the relevant service application to the AI interface, the SDK including the called AI model information further responds to the interface call, i.e., processes the deployment situation of the AI model, which is the method and process for adaptive AI model deployment implemented by the present invention. The self-adaptive deployment method is detailed as follows: after receiving a call from the service application to the AI interface, optionally, the AI SDK analyzes the assigned data of the parameter. These data include both the data to be processed of the AI model and the indicative data for the model provided by the business application through the interface. The AI SDK acquires the pre-deployment state of the AI model, acquires the end side to which the AI model is issued according to the pre-deployment state, and acquires the dynamic information of the end side, wherein the dynamic information mainly reflects the available state of the end side. 
The AI SDK analyzes and decides the deployment state of the AI model from these dynamic information and optionally from these assigned data, and thus determines the side of the model execution that is responding to the AI interface call. If the fact that the AI model which is issued to the local service application needs to be called is determined based on the corresponding data and information, the deployment state of the corresponding AI model is local; and if the fact that the AI model issued to the non-business application local (remote or cloud) needs to be called is determined, the deployment state of the corresponding AI model is non-local. On the basis, an executive body of the function model group corresponding to the AI model needing to be called is further selected. The selection of the functional model group is used to select the best matching executable from a group of executors provided by AI models having the same capabilities. The best matching executor satisfies the following conditions that result availability of the AI model call needs to be ensured on one hand, and environment availability in the AI model call process needs to be ensured on the other hand. This environment availability is measured from the environment in which the model deployment is located, i.e. the invocation of the AI model is performed without significantly deteriorating the environment in which the model deployment is located, at least not in the positive order of the existing competitive severity of the resources, but preferably with the best possible priority to the utilization of the least competing resources in case the AI invocation requirements (resulting availability) are fulfilled. 
The selection step of the optimal executive body comprises the following steps: acquiring dynamic information of each resource at the corresponding end side of the model deployment state; and determining the selection result of the functional model group according to the requirement condition of the corresponding execution of each functional model group on the resources. The dynamic information of the resource includes: resource type, total amount of resources, available resource amount and available resource proportion, etc.; the resources include, but are not limited to: computing power of computing devices such as a CPU/GPU/TPU, DDR or other Memory provided by a current mainstream or future mainstream storage device, network channels provided by devices such as a network card or an optical fiber interface, and the like.
In another aspect, a method for AI model optimization is provided, including:
generating a second sample, wherein the second sample is different from the first sample, and generating the second sample is a process of converting dynamic feedback information into the second sample; retraining a model, wherein the samples used for retraining comprise the second sample and a first sample distinct from the second sample; and generating a third sample, wherein the third sample is a conversion result of the second sample and the first sample, and wherein generating the third sample further comprises adjusting the weight of the third sample, replacing the first sample with the third sample, and evaluating the newly trained model to determine, according to the evaluation result, whether to generate the third sample.
In the automatic tuning and optimization process of the AI model: first, dynamic feedback is received, the dynamic feedback being information collected during model deployment; the dynamic information collected during deployment includes, but is not limited to, difference information reflecting the gap between the model's deployment results and its expectations, under- and over-pressure (resource load) information about the model deployment environment, and characteristic information such as the parameter assignment data supplied when a business application calls the model's external interface. Next, a dynamic sample is generated, the generation process converting the obtained dynamic feedback information into a dynamic sample. Then, retraining is decided: the decision is whether to retrain on a sample collection comprising the dynamic samples and the static samples, the static samples being those used in the most recent training of the AI model; the decision on whether to retrain uses the weights of the dynamic samples and the weights of the static samples (as a simple comparison, for example, when the total weight of the dynamic samples is not less than 5% of the total weight of the static samples, a decision is made to retrain and regenerate the model). Retraining then uses both the dynamic and static samples; finally, the weights of the static samples are adjusted according to the newly generated model, and the samples are merged to form a new static sample set. Through this AI model optimization method, the AI model gains the ability to optimize itself automatically, thereby serving the business application with better application targets and experience.
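A minimal sketch of the retraining decision and sample-merging steps above, in Python. The function names, the decay factor for old static samples, and the sample representation as `(data, weight)` pairs are all hypothetical; only the 5% weight-comparison rule comes from the description.

```python
def should_retrain(dynamic_weight, static_weight, threshold=0.05):
    """Decide retraining when the accumulated weight of the dynamic samples
    reaches the threshold fraction (here 5%) of the static sample weight,
    as in the simple comparison given in the description."""
    return dynamic_weight >= threshold * static_weight

def merge_samples(static_samples, dynamic_samples, decay=0.9):
    """After retraining, fold the dynamic samples into a new static set,
    down-weighting the old static samples (the 'third sample' step).
    Each sample is a (data, weight) pair; `decay` is an assumed policy."""
    merged = [(data, w * decay) for data, w in static_samples]
    merged.extend(dynamic_samples)
    return merged
```

For example, 5 units of dynamic-sample weight against 100 units of static-sample weight would trigger retraining under the 5% rule, while 4.9 units would not.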
In another aspect, an apparatus for automatically tuning an AI model is provided, which further includes:
a model management module: used to manage and maintain the AI models, including updating and replacing the AI models obtained through sample training;
a sample management module: used to manage and maintain the samples used to train the AI models, including updating replaced samples and the corresponding sample weights;
a feedback collection module: used to receive the dynamic information sent by the dynamic functions, including information for measuring the model's effect, parameter assignment data from AI interface calls, and the like;
an optimization decision module: used to decide whether retraining and model regeneration are needed, including comparing the relation between the dynamic sample weights and the static sample weights;
the automatic adjusting and optimizing device of the AI model is used for optimizing the AI model according to the deployment result of the model, so that application data in the actual production and service process is used as a training sample of the AI model to maintain the effectiveness of the sample. The automatic adjustment and optimization process of the device to the AI model is as follows: firstly, receiving dynamic feedback, wherein the dynamic feedback is information collected in a model deployment process, and the dynamic information collected in the deployment process comprises but is not limited to difference information reflecting a model deployment result and a model expectation, but also comprises but is not limited to environmental under-voltage and over-voltage information of model deployment, and characteristic information including but not limited to input parameter assignment data input when a business application calls an external interface of a model, and the like; generating a dynamic sample, wherein the generation process of the dynamic sample is a process of generating the dynamic sample from the obtained dynamic feedback information; decision retraining, wherein the decision retraining is to decide whether to retrain a sample collection, the sample collection comprises dynamic samples and static samples, and the static samples are samples used for obtaining the last training of the AI model; the decision of whether to retrain the sample set includes using the weights of the dynamic samples and the weights of the static samples (for example, in simple comparison, when the weights of the dynamic samples are not less than 5% of the weights of the static samples, a decision is made to retrain the samples and generate the model again); retraining samples, the samples used for retraining comprising dynamic samples and static samples; and adjusting the weight of the static sample according to the newly generated model, and further combining to 
generate a new static sample.
In another aspect, a computer-readable storage medium is provided, storing program instructions that, when executed by a processor, carry out the respective methods described above.
In another aspect, an apparatus for management is provided that includes a storage component, a processing component, and a communication component, interconnected with one another. The storage component stores data processing code; the communication component carries out information interaction with external devices; and the processing component invokes the program code to perform the functions of the apparatus described above.
Drawings
In order to more clearly illustrate the technical solution of the present invention, and the elements, modes and processes by which its objects are achieved, the drawings used in the implementation of the present invention are described below.
FIG. 1 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 2 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 3 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 4 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 5 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 6 is one of the component diagrams of the adaptive AI model deployment method of the present invention;
FIG. 7 is one of the flow charts for the adaptive AI model deployment method of the present invention;
FIG. 8 is one of the flow charts for the adaptive AI model deployment method of the present invention;
FIG. 9 is one of the flow charts for the adaptive AI model deployment method of the present invention;
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings.
The terms "first," "second," and "third," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, "include" and "have" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As used in this application, the terms "server," "unit," "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a server may be, but is not limited to, a processor, a data processing platform, a computing device, a computer, two or more computers, or the like; a unit may be, but is not limited to being, a process running on a processor, a runnable object, an executable, a thread of execution, or any other executable computer program. One or more units may reside within a process and/or thread of execution and a unit may be localized on one computer and/or distributed between 2 or more computers. In addition, these units may execute from various computer readable media having various data structures stored thereon. The elements may communicate by way of local and/or remote processes based on a signal having one or more data packets (e.g., data from two elements interacting with another element in a local system, distributed system, and/or across a network, such as the internet with other systems by way of the signal).
First, some terms in the present application are explained so as to be easily understood by those skilled in the art.
(1) Cloud computing: refers to a new computing paradigm with the advantages of integration and connectivity in a network environment, able to provide computing, storage and even software to users as a service. It differs from older computing paradigms in how it is perceived and used: it presents no visible fixed form to the user, as if no dedicated resources were present, and is therefore called cloud computing.
(2) Artificial intelligence: Artificial Intelligence, AI for short, is a general term for the methods, technologies, software, hardware and systems that simulate human intelligence by means of computing systems.
(3) Machine learning: machine learning is an important branch of the AI field. Machine learning extracts data patterns from sample data in order to make the best possible predictions on application data. The implementation process of machine learning corresponds directly to model training.
(4) Model training: the process of extracting application-data patterns and generating an application-data processing procedure, with machine learning or/and deep learning at its core. Application data refers to real data generated in an actual production and service environment, which undergoes the processing procedure produced by model training. Relative to such application data, the data processed during model training is sample data. In terms of the typical data processing a conventional computer performs, model training pursues two goals: finding a processing pattern for the application data, and making that processing available on some existing equipment.
(5) Model publishing: training yields a software package capable of processing the application data to be processed; the package is wrapped in an SDK and published into a specific software and hardware environment behind a specific API, so that business applications can use it. SDK packaging and publishing, API publishing and similar processes constitute normal AI model publishing, referred to as model publishing for short.
(6) Pre-deployment state: the AI model resulting from the training process is published and installed, according to the application goals, into specific hardware and software environments, either in the cloud, on the edge, or directly on the local side of the business application. The result of this publishing and installation is static in nature and is called the pre-deployment state.
(7) A deployment state: based on the results of the publishing and installation of the AI model, when the AI model is executed, the model executors responding to the AI interface calls are either local or non-local to the model user, e.g., the business application. The corresponding dynamic characteristic during the execution is called a deployment state.
Next, a technical analysis of the problems solved by the present application and an overview of the method are presented. It is now widely recognized that AI is a technology of the future, able to solve not only problems solved before but also problems previously unsolved. On one hand, people collect data on a large scale in many areas; on the other hand, people customize different underlying hardware for each problem area. The underlying hardware of AI applications common in the industry is therefore customized, which forces business applications to behave in hardware-specific ways when calling the AI API. This customization greatly limits the flexibility of AI in application. The method provided by the invention and the corresponding embodiments improve the flexibility of AI application.
The invention will be further explained with reference to the drawings. Wherein:
fig. 1 is a block diagram 1100 illustrating an embodiment of the present invention. The aim of the invention is to improve the flexibility of AI application, wherein 1100 is shown as a service application terminal where an AI model is issued, 1101 is shown as a service application unit, 1102 is shown as an AI-SDK unit, 1103 is shown as a model management module, 1104 is shown as a local model module, 1105 is shown as a remote model agent module, 1106 is shown as an optimization feedback module, and 1107 is shown as a model policy module. Where 1101 is a service application unit, which is carried by one or a set of software codes implementing the service application logic, in the subject matter of the present invention 1101 issues a call request to the AI model, which is implemented through the AI API exposed by the AI-SDK unit. 1102 therein is an AI-SDK unit exposed by the service application environment as the AI model issues an AI API to the service application environment, which further comprises the modules described below. 1103 therein is a model management module, which is used to manage and maintain the AI models and remote agents published locally, and to record the pre-deployment state information of the model publication results. Wherein 1104 is a local model module, which is an executive corresponding to a functional model group published to a business application local AI model and containing AI models with the same capabilities under the gist of the present invention. 1104 is an agent module of the remote model issued in the cloud, and the agent converts the call request for the AI model into a remote request, such as an HTTP API or a SOAP API. 1106 is an optimization feedback module, which is used to collect optimization data for calling the AI model, such as data for evaluating the effect of the model, or sample data for retraining the AI model. 
The model policy module 1107 is used for decisions on the model deployment state, the selection of the functional model interface, and the like.
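The routing role of the AI-SDK unit of Fig. 1 can be sketched as follows. This is a minimal illustration with assumed names (`AiSdk`, `policy`, `remote_url` are not from the patent): the model policy module decides the deployment state, and the call is dispatched either to a local model or through the remote model agent, while the feedback module records optimization data.

```python
import json
import urllib.request

class AiSdk:
    """Illustrative sketch of the AI-SDK unit (1102); names are assumptions."""

    def __init__(self, policy, local_models, remote_url):
        self.policy = policy              # model policy module (1107)
        self.local_models = local_models  # local model module (1104)
        self.remote_url = remote_url      # remote model agent endpoint (1105)
        self.feedback = []                # optimization feedback module (1106)

    def call(self, model_name, inputs):
        state = self.policy(model_name)   # decide deployment state: "local"/"remote"
        if state == "local":
            result = self.local_models[model_name](inputs)
        else:
            # the agent converts the AI model call into a remote HTTP request
            req = urllib.request.Request(
                self.remote_url + "/" + model_name,
                data=json.dumps(inputs).encode(),
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                result = json.loads(resp.read())
        # collect optimization data about the call for later tuning
        self.feedback.append({"model": model_name, "state": state})
        return result
```

A business application would call `sdk.call("some_model", inputs)` without knowing whether the model executes locally or in the cloud, which is the flexibility the patent aims for.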
Fig. 2 is one of the composition diagrams 1200 of the present invention, which illustrates the implementation and provision of AI capability, i.e. the process of AI training from sample to model, wherein: 1201 is a parameter setting unit, used to manage and set the parameters required by the model training unit 1210 during model training; 1211 is a data acquisition module, configured to obtain the training samples required for training an AI model; 1212 is a data mixing module, used to mix dynamic samples with static samples and to perform operations such as weight adjustment and positive/negative sample sampling; 1213 is a data cleaning module for cleaning the training samples, for example filtering out invalid samples; 1214 is a data evaluation module for evaluating the quality of the training samples and judging whether they are suitable for training; 1215 is a model training module, which performs model training on the training samples according to an AI algorithm; 1216 is a corpus management module, used for unified management of the original corpus data from which training samples are generated. 12X0 denotes an unspecified number (one or more) of model training units of the kind shown at 1210.
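The work of the data mixing module 1212 can be sketched as below. The function name, weights, and sampling ratio are illustrative assumptions, not values from the patent: static and dynamic samples are merged, each tagged with a weight, and negative samples may be down-sampled.

```python
import random

def mix_samples(static, dynamic, static_weight=1.0, dynamic_weight=2.0,
                neg_keep_ratio=1.0, seed=0):
    """Sketch of module 1212: merge dynamic and static samples with weights."""
    rng = random.Random(seed)
    mixed = []
    # weight adjustment: tag each (features, label) pair with its source weight
    for x, label in static:
        mixed.append((x, label, static_weight))
    for x, label in dynamic:
        mixed.append((x, label, dynamic_weight))
    # positive/negative sampling: optionally down-sample negatives (label == 0)
    kept = [s for s in mixed
            if s[1] != 0 or rng.random() < neg_keep_ratio]
    rng.shuffle(kept)
    return kept
```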
Fig. 3 is a diagram 1300 illustrating the implementation components required for publishing an AI model after training according to the present invention. It comprises: 1301, a model management module used to manage and maintain the generated AI models, including a cache of models to be issued to a destination; 1302, a device management module configured to manage and maintain the mapping between AI models and the devices to which deployable AI models can be issued; 1303, a model publishing module responsible for publishing the trained AI model to the terminal or cloud where it is to be deployed, including organizing and publishing the pre-deployment state information of the model; 1304, an interface management module for managing the external interfaces of the published models.
Fig. 4 is one of the composition diagrams 1400 of the present invention, which illustrates an implementation for tuning an AI model through repeated training. 1401 is a model management module used to manage and maintain the effective AI models; 1402 is a sample management module used to manage and maintain the samples used to train and generate the AI models; 1403 is a feedback collection module configured to collect the dynamic information provided by the dynamic feedback function, where the dynamic information includes the effect evaluation obtained from deploying an AI model and the dynamic samples generated by converting application data obtained in actual production and service processes; 1404 is an optimization decision module configured to decide the influence of the dynamic samples on the AI model by evaluating their weights, thereby deciding whether to retrain and regenerate the model, and further whether to update the sample data.
Fig. 5 is a diagram 1500 illustrating an implementation of the dynamic information collection function on which deployment decisions and selections are based when a business application invokes an AI model interface. Therein 1510 is a dynamic unit, further comprising: 1511, an environment dynamic information collection module (for example, a temperature sensor collecting temperature); 1512, a computing-power collection module for collecting the overall computing power and the idle computing power; 1513, a memory collection module configured to collect the total memory capacity and the amount of idle memory; and 1514, a communications collection module for collecting the overall capacity and idle capacity of the network communications. The acquisition results are gathered by the environment dynamic collection module in a unified way, and the real-time or quasi-real-time information acquired by the collection modules is then provided to the requester in response to a specific dynamic information request.
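The dynamic unit 1510 can be sketched as a registry of collector modules, each reporting total and idle capacity on request. The class and the two example collectors below are assumptions for illustration; the patent names the modules but not their implementation.

```python
import os
import shutil

class DynamicUnit:
    """Sketch of the dynamic acquisition unit 1510 (structure assumed)."""

    def __init__(self):
        self.collectors = {}

    def register(self, name, fn):
        # each collector module (1511-1514) registers a callable
        self.collectors[name] = fn

    def snapshot(self):
        # unified collection: query every registered module on demand,
        # so each request sees real-time or quasi-real-time readings
        return {name: fn() for name, fn in self.collectors.items()}

unit = DynamicUnit()
# computing-power collection module (1512), using stdlib facilities
unit.register("compute", lambda: {"total_cores": os.cpu_count()})
# storage shown in place of 1513's memory, since the stdlib exposes it portably
unit.register("storage",
              lambda: {"total": shutil.disk_usage("/").total,
                       "idle": shutil.disk_usage("/").free})
```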
Fig. 6 is one of the composition diagrams 1600 of the present invention, which illustrates an overall connection and networking diagram of the modules/units that implement the functions of the present invention. 1610 and 1620 denote terminals where service applications are located; they represent the operating environments of the service applications, including but not limited to the same or different service application units, the AI-SDK units required by those units, and dynamic acquisition modules or units, where the AI-SDK also contains the local AI deployments. 1630 is a deployment server of the AI model; it includes one or more model deployment units (such as model deployment unit-01 and model deployment unit-02 shown in the figure) that provide the remote deployments of AI models called by the service applications; the number of these units is shown by way of illustration and not limitation. 1640 is an AI model training server that contains one or more model training units (e.g., model training unit-01 and model training unit-02 as shown) used to generate AI models from samples; their number is shown by way of illustration and not limitation. 1650 is an AI model publishing server, which includes one or more model publishing units (such as model publishing unit-01 and model publishing unit-02 shown in the figure); a publishing unit is configured to push and publish trained AI models to the deployment environments, and includes a cache of models to be published; the number of publishing units is shown by way of illustration and not limitation. 1660 is a model tuning server, which comprises one or more model tuning units (such as model tuning unit-01 and model tuning unit-02 shown in the figure) for continuously tuning the AI models; their number is likewise shown by way of illustration and not limitation.
Further, the flow of the functions of the present invention is described below with reference to the functional composition diagram. Wherein:
FIG. 7 is a block diagram of the overall process 1700 of the present invention for model training, publishing, deployment decision-making, functional model selection, automatic tuning, and the like. The overall flow may be described as separate sub-flows, including: a training and publishing sub-process, a deployment and execution sub-process, and a feedback and decision-tuning sub-process.
The training and publishing sub-process comprises the following steps:
17A-receiving or extracting the samples used for initial training;
17B-training with the acquired samples to generate an AI model;
17C-model servicization: processing the generated AI model to generate a specific local API and a remote HTTP API;
17E-SDK-packaging the results of the model servicization;
17F-sending an update notification according to the device information. The foregoing steps occur on the server environment side, and the following steps occur in the model deployment environment, particularly on the terminal side where the business application is located;
17G-receiving the model update notification;
17H-downloading and installing the AI model.
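Step 17C's model servicization can be sketched as follows, under the assumption (the patent does not give the mechanism) that one trained model is wrapped as both a local callable API and a remote HTTP-style handler, so that either deployment state can later serve the call:

```python
def servicize(model_fn):
    """Sketch of step 17C: wrap one model as a local API and an HTTP-style API.
    Names and the dict-based 'HTTP' envelope are illustrative assumptions."""

    def local_api(inputs):
        # the local API simply invokes the model executive directly
        return model_fn(inputs)

    def http_api(request_body):
        # a remote handler would parse the HTTP body, call the model,
        # and serialize the result; shown here with plain dicts
        return {"status": 200, "result": model_fn(request_body["inputs"])}

    return local_api, http_api
```

The SDK-packaging step 17E would then bundle `local_api` into the AI-SDK while the HTTP handler is deployed behind the remote model agent.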
The deployment and execution flow comprises the following steps:
17I-the service application initiates a call to the corresponding interface of the AI model;
17J-responding to the call of the AI interface and acquiring the relevant dynamic information from the dynamic acquisition unit;
17K-analyzing the relevant data and information, and making a model deployment decision for the model;
17L-in the local deployment state, selecting the corresponding functional model group according to the dynamic information of the end side;
17M-deploying and applying the local functional model;
17N-the service application receives the result of calling the AI interface;
17O-in the remote deployment state, selecting the corresponding functional model group according to the dynamic information of the terminal side;
17P-deploying and applying the remote functional model of the cloud;
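Steps 17K, 17L and 17O above can be sketched as below. The memory-based criterion and all names are illustrative assumptions; the patent only states that the decision and the selection use the end side's dynamic information.

```python
def decide_deployment(dynamic_info, model_mem_need):
    """Sketch of step 17K: decide the deployment state from dynamic info."""
    # choose local when the terminal has enough idle memory for the model,
    # otherwise fall back to the remote (cloud) deployment state
    if dynamic_info.get("idle_memory", 0) >= model_mem_need:
        return "local"
    return "remote"

def select_functional_model(candidates, dynamic_info):
    """Sketch of steps 17L/17O: pick a functional model that fits the end side.
    candidates: (name, mem_need) pairs, assumed ordered from best to smallest."""
    for name, mem_need in candidates:
        if mem_need <= dynamic_info.get("idle_memory", 0):
            return name
    return candidates[-1][0]  # smallest model as a last resort
```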
the feedback and decision tuning sub-process comprises the following steps:
17R-after the model deployment execution is finished, collecting the deployment effect of the model;
17S-reporting the collected deployment effect;
17T-converting the deployment effect of the model into dynamic samples;
17B-retraining to generate the AI model (step 17B again), now using the static (or initial) samples in combination with the dynamic samples.
FIG. 8 is a flow 1800 for making deployment decisions and functional model selections after a business application calls an AI API. The detailed process comprises the following steps:
18A-service application initiates the calling of the corresponding interface of the AI model;
18B-according to the pre-deployment state of the corresponding AI model, optionally, collecting dynamic information locally from the business application end;
18C-optionally, collecting dynamic information from the cloud according to the pre-deployment state of the corresponding AI model;
18D, deciding the deployment state of the AI model according to the dynamic information of each end side;
18E-optionally, downloading the AI model to the local side of the application service in case the deployment state of the model is local to the service application;
18F, selecting the function model according to the dynamic information under the condition that the model deployment state is local to the service application;
18G-when the model deployment state is local to the service application, deploying and executing an AI function model according to the selection condition;
18C-in the case that the model deployment state is the remote cloud side, optionally, acquiring the dynamic information of the cloud side again (step 18C again);
18H, in the case that the model deployment state is the remote cloud side, selecting the functional model according to the dynamic information;
18I-in the case that the model deployment state is at the remote cloud end side, deploying and executing an AI function model according to the selection condition;
18J-feeding back the results of the model deployment execution to the business application;
18K-feeding the optimized dynamic sample back to the tuning unit;
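Flow 1800 can be sketched end-to-end as below. The helper names and the "prefer local when viable" rule are assumptions for illustration: the pre-deployment state recorded at publication time determines which end sides' dynamic information is gathered (steps 18B/18C) before the deployment state is decided (18D).

```python
def handle_ai_call(pre_deploy_state, get_local_info, get_cloud_info):
    """Sketch of flow 1800; pre_deploy_state is a set of candidate states."""
    info = {}
    # 18B/18C: optionally gather only the dynamic information that the
    # pre-deployment state makes relevant (local candidate, cloud, or both)
    if "local" in pre_deploy_state:
        info["local"] = get_local_info()
    if "remote" in pre_deploy_state:
        info["remote"] = get_cloud_info()
    # 18D: decide the deployment state; here local is preferred when viable
    if info.get("local", {}).get("available"):
        return "local", info
    return "remote", info
```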
fig. 9 is a flow 1900 for automatically and continuously tuning an AI model after model deployment is performed. The detailed process comprises the following steps:
19A-receiving effect evaluation data after the AI model deployment is executed;
19B-generating dynamic samples;
19H-extracting (static) samples used in the original training generation process of the AI model;
19C-evaluating and deciding whether retraining is needed to generate a new model; optionally, the decision uses the effect evaluation to derive the weight of the dynamic samples and compares it with the weight of the static samples;
19D-retraining the generative model;
19E-publishing the regenerated model;
19F-optionally, adjusting the weights of the static samples;
19G-optionally, generating new static samples.
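The retraining decision of step 19C can be sketched as below. The weighting formula and threshold are illustrative assumptions (the patent only says the dynamic sample weight is derived from the effect evaluation and compared with the static sample weight): a poor deployment effect raises the influence of the dynamic samples and triggers retraining.

```python
def should_retrain(effect_scores, static_weight=1.0, threshold=0.5):
    """Sketch of step 19C; formula and threshold are assumptions.
    effect_scores: deployment effect evaluations in [0, 1], higher is better."""
    avg_effect = sum(effect_scores) / len(effect_scores)
    # poor deployment effect increases the weight of the dynamic samples
    dynamic_weight = static_weight * (1.0 + (1.0 - avg_effect))
    # retrain when the dynamic weight exceeds the static weight by the threshold
    retrain = (dynamic_weight - static_weight) / static_weight > threshold
    return retrain, dynamic_weight
```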
in this application, the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, and may be located in a single network node, or may be distributed on multiple network nodes. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, according to specific constraints and implementation requirements, functional components in the embodiments of the present application may be integrated into one component, or each component may exist alone physically, or two or more components may be integrated into one component. The integrated components can be realized in a form of hardware or a form of software functional units.
The integrated components, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing one or more computer devices (which may be personal computers, servers, or network devices) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be understood that, in the various embodiments of the present application, the serial numbers of the above-mentioned processes do not mean a strict order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. While the present application has been described herein in conjunction with various embodiments, other variations to the disclosed embodiments may be understood and effected by those skilled in the art in practicing the present application as claimed herein.

Claims (9)

1. A method for adaptive deployment of AI models, comprising:
receiving a call from a business application to an AI interface, wherein the call to the AI interface comprises data assignments for the parameters of the AI interface, and the AI interface called by the business application corresponds to a specific AI model;
deciding a deployment state of the model corresponding to the AI interface call, wherein the deployment state is the deployment state of the AI model corresponding to the called AI interface, and the decision on the deployment state of the AI model comprises the use of a pre-deployment state of the AI model;
selecting a functional model corresponding to the AI model, the functional model being one of a plurality of differentiated implementations corresponding to the same AI model, and the selection of the AI functional model being a selection from the plurality of differentiated implementations corresponding to the AI model.
2. The method of claim 1, wherein the decision of the deployment state of the model called through the AI interface further comprises:
acquiring the pre-deployment states of the candidate functional models of the AI model corresponding to the called AI interface.
3. The method of claim 2, wherein the decision of the model deployment state further comprises:
obtaining dynamic information of the end side determined by the pre-deployment states of the candidate functional models of the AI model, the dynamic information including the availability of the determined end side.
4. The method of claim 3, wherein the selection of a functional model under the deployment state further comprises:
acquiring dynamic information of the end side where the functional model is located, the end side being determined by the deployment state of the AI model.
5. The method of claim 4, wherein the functional model is selected based on the dynamic information, further comprising:
determining the selection result of the functional model according to the matching result between the resource demand of the functional model and the end-side resources reflected by the dynamic information.
6. The method of claim 5, wherein the adaptive deployment of the AI model further comprises:
executing the functional model determined for the AI interface call, and outputting optimization samples of the functional model.
7. The method of claim 6, wherein the optimization samples output after the AI model deployment is executed are further used for:
using the samples as dynamic samples, in combination with the static samples used in training the AI model, to decide on and perform retraining of the model.
8. A computer-readable storage medium, characterized in that the storage medium stores program instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
9. A device for executing a computer program, characterized by comprising a processing component, a storage component and a communication component connected with each other, wherein the storage component is used for storing data processing code, and the communication component is used for information interaction with external devices; the processing component is configured to invoke the program code to perform the method of any one of claims 1 to 7.
CN201911363266.1A 2019-12-26 2019-12-26 Self-adaptive AI model deployment method Pending CN113050955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911363266.1A CN113050955A (en) 2019-12-26 2019-12-26 Self-adaptive AI model deployment method


Publications (1)

Publication Number Publication Date
CN113050955A true CN113050955A (en) 2021-06-29

Family

ID=76505962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911363266.1A Pending CN113050955A (en) 2019-12-26 2019-12-26 Self-adaptive AI model deployment method

Country Status (1)

Country Link
CN (1) CN113050955A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707646A (en) * 2022-01-26 2022-07-05 电子科技大学 Distributed artificial intelligence practice platform based on remote reasoning
WO2023044631A1 (en) * 2021-09-22 2023-03-30 Siemens Aktiengesellschaft A device, system, method and storage medium for ai application deployment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101946258A (en) * 2007-12-20 2011-01-12 惠普开发有限公司 Model based deployment of computer based business process on dedicated hardware
CN104335170A (en) * 2012-06-08 2015-02-04 惠普发展公司,有限责任合伙企业 Cloud application deployment
CN107025509A (en) * 2016-02-01 2017-08-08 腾讯科技(深圳)有限公司 Decision system and method based on business model
US20180115468A1 (en) * 2016-10-25 2018-04-26 International Business Machines Corporation Hybrid cloud broker with static and dynamic capability matching
CN108574702A (en) * 2017-03-08 2018-09-25 中兴通讯股份有限公司 A kind of cloud application dispositions method and system
US20190082004A1 (en) * 2017-09-14 2019-03-14 Cisco Technology, Inc. Systems and methods for instantiating services on top of services
WO2019113122A1 (en) * 2017-12-04 2019-06-13 Conversica, Inc. Systems and methods for improved machine learning for conversations
US20190340524A1 (en) * 2018-05-07 2019-11-07 XNOR.ai, Inc. Model selection interface


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN Ruisheng; XU Kaiyong; ZHAO Bin: "Research and Design of the Policy Deployment Model in the P2DR Model", Computer Engineering, no. 20, pages 186-189 *


Similar Documents

Publication Publication Date Title
Bhattacharjee et al. Barista: Efficient and scalable serverless serving system for deep learning prediction services
Kapsalis et al. A cooperative fog approach for effective workload balancing
CN111371603B (en) Service instance deployment method and device applied to edge computing
CN110869909B (en) System and method for applying machine learning algorithms to calculate health scores for workload scheduling
CN108156236A (en) Service request processing method, device, computer equipment and storage medium
Pascual et al. Self-adaptation of mobile systems driven by the common variability language
Nguyen et al. Monad: Self-adaptive micro-service infrastructure for heterogeneous scientific workflows
Caporuscio et al. Reinforcement learning techniques for decentralized self-adaptive service assembly
CN113050955A (en) Self-adaptive AI model deployment method
CN115297008B (en) Collaborative training method, device, terminal and storage medium based on intelligent computing network
CN114816753A (en) Data cluster computing node scaling method, device, equipment and medium
CN114706675A (en) Task deployment method and device based on cloud edge cooperative system
Schlegel et al. Towards autonomous mobile agents with emergent migration behaviour
Bensalem et al. Scaling Serverless Functions in Edge Networks: A Reinforcement Learning Approach
Ricardo et al. Developing machine learning and deep learning models for host overload detection in cloud data center
WO2018035101A1 (en) Spending allocation in multi-channel digital marketing
Jayasinghe et al. An analysis of throughput and latency behaviours under microservice decomposition
Calzarossa et al. Evaluation of cloud autoscaling strategies under different incoming workload patterns
CN112783641A (en) Service interface flow control method and device
Laroui et al. Scalable and cost efficient resource allocation algorithms using deep reinforcement learning
CN116367190A (en) Digital twin function virtualization method for 6G mobile network
Nikbazm et al. KSN: Modeling and simulation of knowledge using machine learning in NFV/SDN-based networks
CN115904708A (en) AI platform dynamic weighting scheduling method, device and storage medium
CN114936089A (en) Resource scheduling method, system, device and storage medium
Gonzalo et al. CLARA: A novel clustering-based resource-allocation mechanism for exploiting low-availability complementarities of voluntarily contributed nodes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210629
