WO2023098460A1 - Model updating method and apparatus, and related device - Google Patents

Model updating method and apparatus, and related device

Info

Publication number
WO2023098460A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
updated
trained
training
sample set
Prior art date
Application number
PCT/CN2022/131668
Other languages
English (en)
Chinese (zh)
Inventor
吕超群
刘凌辉
杨锦
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2023098460A1 publication Critical patent/WO2023098460A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/65Updates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the present application relates to the field of artificial intelligence, in particular to a model updating method, device and related equipment.
  • AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • classification models are used to classify data, realize data classification automation, and improve classification efficiency.
  • Image recognition models are used to identify image content and realize image content automatic recognition and improve recognition efficiency.
  • Existing methods for updating a model based on incremental learning usually rely on either offline learning or online learning.
  • With offline learning, the performance of the model must be tracked manually, the model must be retrained repeatedly, and the retrained model must be deployed online manually. This consumes considerable human resources and time, and the model update efficiency is relatively low.
  • With online learning, new models are continuously trained, continuously verified, and continuously used to replace the old model, which consumes a large amount of computing resources.
  • This application provides a model updating method, apparatus and related device that address both the low model update efficiency of updating a model in offline learning mode and the heavy consumption of computing resources of updating a model in online learning mode.
  • A method for updating a model includes: first, obtaining a training sample set; then, when the first trigger mechanism is used to determine that the first model needs to be updated and trained, performing update training on the first model with the training sample set to obtain an updated and trained model; and finally, when the second trigger mechanism is used to determine that the first model needs to be replaced with the updated and trained model, replacing the first model with the updated and trained model.
  • The present application determines whether the first model needs to be updated and trained through the first trigger mechanism, and determines whether the first model needs to be replaced with the updated and trained model through the second trigger mechanism. The model can therefore be triggered on demand for automatic update training and automatic update deployment, which reduces the consumption of computing resources while improving model update efficiency.
  • the first triggering mechanism includes: if the number of difficult samples in the training sample set reaches the first threshold, then determine that the first model needs to be updated and trained; or, if the current time reaches the model update time , it is determined that the first model needs to be updated and trained; or, if the number of samples in the training sample set reaches the second threshold, it is determined that the first model needs to be updated and trained; or, if the online duration of the first model reaches the preset duration , it is determined that the first model needs to be updated and trained.
  • the second threshold is a natural number greater than 1.
  • the present application provides multiple mechanisms for determining whether to trigger the model to perform update training, and the user can choose any trigger mechanism, which has strong flexibility.
  • the second trigger mechanism includes: if the prediction performance of the first model is lower than the prediction performance of the updated and trained model, determining that the first model needs to be replaced with the updated and trained model; or, if the prediction performance of the updated and trained model is within the expected prediction performance range, determining that the first model needs to be replaced with the updated and trained model.
  • the present application provides multiple mechanisms for determining whether to trigger the replacement of the old model with the new model, and the user can choose any trigger mechanism, which has strong flexibility.
  • the update training of the first model through the training sample set can be implemented as follows: first, the training sample set is screened to determine the difficult samples in the training sample set; then, the difficult samples in the training sample set are used to update and train the first model.
  • this application uses the difficult samples in the training sample set to update and train the model, instead of using all the samples in the training sample set as in the prior art. In this way, the consumption of computing resources can be further reduced and model update efficiency can be improved.
  • the samples in the training sample set can be screened in the following way to determine the difficult samples in the training sample set: first, each sample in the training sample set is input into the first model for inference to obtain the attributes of the inference result corresponding to each sample, where the attributes include any of the following: confidence, cross entropy; then, according to the attributes of the inference result of each sample, it is determined whether each sample is a difficult sample.
  • a device for updating a model includes: an acquisition unit, configured to acquire a training sample set;
  • a model training unit configured to perform update training on the first model through the training sample set to obtain an updated trained model when the first trigger mechanism is used to determine that the first model needs to be updated and trained;
  • a model deploying unit configured to replace the first model with the updated trained model when it is determined that the first model needs to be replaced with the updated trained model by using the second trigger mechanism.
  • the first trigger mechanism includes: if the number of difficult samples in the training sample set reaches a first threshold, it is determined that the first model needs to be updated and trained; or, if the current time reaches the model update time, it is determined that the first model needs to be updated and trained; or, if the number of samples in the training sample set reaches the second threshold, it is determined that the first model needs to be updated and trained; or, if the online duration of the first model reaches the preset duration, it is determined that the first model needs to be updated and trained.
  • the second threshold is a natural number greater than 1.
  • the second trigger mechanism includes: if the prediction performance of the first model is lower than the prediction performance of the updated and trained model, determining that the first model needs to be replaced with the updated and trained model; or, if the prediction performance of the updated and trained model is within the expected prediction performance range, determining that the first model needs to be replaced with the updated and trained model.
  • the above-mentioned model training unit is specifically used to: first, screen the training sample set to determine the difficult samples in the training sample set, and then use the difficult samples in the training sample set to update and train the first model.
  • the above-mentioned model training unit is specifically used to: first, input each sample in the training sample set into the first model for inference to obtain the attributes of the inference result corresponding to each sample, where the attributes include any of the following: confidence, cross entropy; and then, based on the attributes of the inference result of each sample, determine whether each sample is a difficult sample.
  • a computer-readable storage medium stores instructions, and the instructions are used to implement the method provided in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • In a fourth aspect, a computing device is provided. The computing device includes a processor and a memory; the processor is configured to execute instructions stored in the memory, so that the computing device implements the method provided in the above first aspect or any possible implementation of the first aspect.
  • A computer program product is provided, including a computer program. When the computer program is read and executed by a computing device, the computing device executes the method provided in the above-mentioned first aspect or any possible implementation of the first aspect.
  • Fig. 1 is a schematic diagram of an artificial intelligence main framework provided by the present application.
  • Fig. 2 is a schematic flow chart of a model updating method provided by the present application.
  • FIG. 3 is a schematic flow diagram of determining difficult samples from the training sample set provided by the present application.
  • Fig. 4 is a schematic diagram of deployment of a model updating device provided by the present application.
  • FIG. 5A is a schematic structural diagram of a model updating device provided by the present application.
  • Fig. 5B is a schematic structural diagram of another model updating device provided by the present application.
  • Fig. 6 is a schematic flow chart of another model updating method provided by the present application.
  • FIG. 7 is a schematic structural diagram of a computing device provided by the present application.
  • Offline learning: also called batch learning or offline training. The model weights are updated only after a whole batch of data has been trained on. In this case, all training data must be available during model training. Moreover, only after model training is completed can the model be deployed online to predict on online data. Offline learning has the disadvantages of low model training efficiency, difficulty in scaling the training process to large-data scenarios, and inability of the model to adapt to dynamically changing environments.
  • Online learning: also called adaptive learning or online training. Data is received in a certain order. Each time a piece of data is received, the model predicts on that data and the current model is trained on it, and then the next piece of data is processed. Online learning updates the weights directly after training on a single piece of data, rather than after training on a batch. That is to say, online learning does not need a complete training data set at the beginning. As more real-time online data arrives, the model is continuously updated during operation.
  • Incremental learning: a model can continuously learn new knowledge from new samples while preserving most of the previously learned knowledge. Incremental learning resembles the way humans learn: people learn and accept new things every day as they grow up, learning proceeds gradually, and humans generally do not forget the knowledge they have already learned. The idea of incremental learning can be described as follows: whenever new data is added, the model does not need to rebuild the entire knowledge base, but only updates the changes caused by the new data on top of the original knowledge base. Online learning must be incremental, because online learning updates the model by streaming in data one piece at a time. Incremental learning is not necessarily online, because given a model and a batch of offline data, incremental learning can use this batch of offline data to update the previously trained model without training a model from scratch.
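The difference between the three definitions above comes down to when the weights are updated and where training starts from. The following is a minimal sketch on a one-parameter linear model `y = w * x` with squared-error loss; the model, the learning rate and the function names are illustrative assumptions and are not taken from the application.

```python
def batch_update(w, data, lr=0.1):
    """Offline (batch) learning: one weight update after the whole batch is processed."""
    grad = sum((w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def online_update(w, data, lr=0.1):
    """Online learning: the weight is updated immediately after every single sample."""
    for x, y in data:
        w = w - lr * (w * x - y) * x
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy samples of y = 2x

w_offline = batch_update(0.0, data)   # a single update from the averaged gradient
w_online = online_update(0.0, data)   # three updates, one per sample

# Incremental learning: continue from the already trained weight with new data
# instead of restarting from w = 0.
w_incremental = online_update(w_online, [(1.0, 2.0)])
```

In this sketch the incremental step reuses `w_online` as its starting point, which mirrors the statement that incremental learning updates a previously trained model rather than training a model from scratch.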
  • Difficult samples: also called difficult examples for short. These are samples for which the inference result of the model does not meet expectations during the inference process.
  • model updating is a long-term process, such as updating and training the model on a weekly or monthly basis, or updating and training the model when the accumulated data reaches a certain amount. If the full amount of data is used for model update training, it will take a lot of labeling manpower and training time.
  • difficult samples are screened from the full amount of data, and only difficult samples are used for model update training, which can save labeling manpower and training time, and improve model update efficiency.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general artificial intelligence field requirements.
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensed process of "data-information-knowledge-wisdom".
  • IT value chain reflects the value brought by artificial intelligence to the information technology industry from the underlying infrastructure of artificial intelligence, information (provided and processed by technology) to the systematic industrial ecological process.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • computing power is provided by smart chips, such as the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC), field programmable gate array (FPGA) and other hardware acceleration chips;
  • the basic platform includes a distributed computing framework, networks and other related platform guarantees and support, which may include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can symbolize and formalize intelligent information modeling, extraction, preprocessing, training, etc. of data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and using formalized information to carry out machine thinking and solve problems according to reasoning control strategies.
  • the typical functions are search and matching.
  • machine learning models (also called machine learning algorithms, hereinafter referred to as models) are a commonly used means.
  • For example, in unmanned inspection scenes, object detection models can detect the category and position of the placed items, which realizes automatic detection and improves detection efficiency.
  • the prediction accuracy of the model will continue to decrease.
  • the model needs to be updated. For example, the colors of the images captured by the camera in winter are more monotonous than the images captured in spring. If the model is trained using images with brighter colors captured in spring, the recognition accuracy of the model for images captured in spring is higher. However, the recognition accuracy of images captured in winter is relatively low.
  • the commonly used model update method is the model update method based on incremental learning.
  • the existing incremental learning-based model update methods usually use offline learning or online learning to update the model.
  • With offline learning, the performance of the model must be tracked manually and the model must be retrained repeatedly. After update training is completed, the model is manually deployed online, which inevitably consumes more human resources and time, so the update efficiency of the model is relatively low. With online learning, new models are continuously trained, continuously verified, and continuously used to replace the old model. Although this can improve the update efficiency of the model, it consumes a lot of computing resources.
  • this application provides a model update method, device and related equipment.
  • Through a mechanism for automatically triggering model update training under certain conditions and a mechanism for automatically triggering model update deployment under certain conditions, the model can be triggered on demand for automatic update training and automatic update deployment, which improves model update efficiency while reducing the consumption of computing resources.
  • Fig. 2 is a schematic flow chart of a model updating method provided by the present application. As shown in Fig. 2, the method includes:
  • the training sample set includes multiple samples, and the multiple samples may all be newly generated samples online, or all may be samples obtained offline, or some may be newly generated samples online, and some may be obtained offline. samples, which are not specifically limited here.
  • When the training sample set includes samples obtained offline, some or all of the samples obtained offline may be old samples that were used to train the model before it was deployed online, new samples obtained offline, or old samples generated by using generative adversarial networks (GAN). This application does not limit the source of the samples in the training sample set.
  • the samples included in the training sample set may be various types of data such as images, videos, audios, texts, etc., which are not specifically limited here.
  • S202 Use the first trigger mechanism to determine whether the first model needs to be updated and trained. When it is determined that the first model needs to be updated and trained, execute S203 and S205. When it is determined that the first model does not need to be updated and trained, execute S204.
  • the first model can be a model for various purposes, such as an image classification model, an object detection model, a sound classification model or a text classification model. The algorithm used to implement the first model can be a random forest, a support vector machine (SVM), a graph neural network (GNN), a convolutional neural network (CNN), etc., which is not specifically limited here.
  • the first trigger mechanism can be any of the following forms:
  • Form 1 Obtain the number of difficult samples in the training sample set in real time or periodically, and when it is detected that the number of difficult samples reaches the first threshold, it is determined that the first model needs to be updated and trained, and then the execution of step S203 is automatically triggered .
  • the first threshold is a natural number greater than 0, and the size of the first threshold can be set according to actual conditions.
  • the first threshold can be 300, 500, etc., which is not specifically limited here.
  • the determination process of difficult samples in the training sample set please refer to FIG. 3 and related descriptions.
  • Form 2 Obtain the number of samples in the training sample set in real time or periodically. When it is detected that the number of samples reaches the second threshold, it is determined that the first model needs to be updated and trained, and then the execution of step S203 is automatically triggered.
  • the second threshold is a natural number greater than 1, and the size of the second threshold can be set according to actual conditions.
  • the second threshold can be 500, 1000, etc., which is not specifically limited here.
  • Form 3 Monitor whether the current time reaches the preset model update time. When it is detected that the current time reaches the preset model update time, it is determined that the first model needs to be updated and trained, and then the execution of step S203 is automatically triggered.
  • the preset model update time can be set according to the actual situation, for example, set to 2:00 am every day, or set to 00:00 on the 15th of every month, which is not specifically limited here.
  • Form 4 Monitor whether the online duration of the first model reaches the preset duration. When it is detected that the online duration reaches the preset duration, it is determined that the first model needs to be updated and trained, and then the execution of step S203 is automatically triggered.
  • the preset duration can be set according to the actual situation, for example, set to 500 hours or 1000 hours, etc., which is not specifically limited here.
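The four forms of the first trigger mechanism above can be sketched as a single check. This is only an illustration: the class `TrainingTrigger`, its field names and the `fires` method are hypothetical names, and the sketch assumes that exactly one form is configured at a time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TrainingTrigger:
    """Hypothetical configuration; set only the form that should be active."""
    first_threshold: Optional[int] = None        # Form 1: number of difficult samples
    second_threshold: Optional[int] = None       # Form 2: number of samples
    update_time: Optional[datetime] = None       # Form 3: preset model update time
    preset_duration: Optional[timedelta] = None  # Form 4: online duration of the first model

    def fires(self, num_difficult, num_samples, now, online_since):
        """Return True when the configured form says the first model needs update training."""
        if self.first_threshold is not None:
            return num_difficult >= self.first_threshold
        if self.second_threshold is not None:
            return num_samples >= self.second_threshold
        if self.update_time is not None:
            return now >= self.update_time
        if self.preset_duration is not None:
            return now - online_since >= self.preset_duration
        return False  # no form configured: never trigger


deployed_at = datetime(2024, 1, 1)
now = datetime(2024, 1, 22)  # the model has been online for 21 days (504 hours)

by_difficult = TrainingTrigger(first_threshold=300)
by_duration = TrainingTrigger(preset_duration=timedelta(hours=500))
```

A scheduler could call `fires` in real time or periodically, as the forms describe, and start step S203 as soon as it returns `True`.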
  • the difficult samples in the training sample set can be determined through the steps shown in Figure 3:
  • the attributes of an inference result include confidence, cross entropy, and so on.
  • a sample corresponds to one or more inference results.
  • When a sample corresponds to one inference result, if the confidence of that inference result is less than the first confidence threshold, it is determined that the sample is a difficult sample; otherwise, it is not a difficult sample.
  • When a sample corresponds to multiple inference results, the mean value of the confidences of the multiple inference results can be calculated. If the mean confidence is less than the second confidence threshold, it is determined that the sample is a difficult sample; otherwise, it is not a difficult sample.
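The confidence-based decision can be sketched as follows. The function name and the default threshold values of 0.5 are assumptions chosen for illustration; the application does not give values for the first and second confidence thresholds.

```python
from statistics import mean


def is_difficult_sample(confidences, first_threshold=0.5, second_threshold=0.5):
    """Decide whether a sample is difficult from the confidences of its inference results.

    One inference result: compare its confidence with the first confidence threshold.
    Several inference results: compare their mean confidence with the second threshold.
    """
    if not confidences:
        raise ValueError("a sample needs at least one inference result")
    if len(confidences) == 1:
        return confidences[0] < first_threshold
    return mean(confidences) < second_threshold


single_low = is_difficult_sample([0.3])             # one result with low confidence
single_high = is_difficult_sample([0.9])            # one result with high confidence
multi_low = is_difficult_sample([0.9, 0.2, 0.1])    # mean 0.4 is below the threshold
multi_high = is_difficult_sample([0.9, 0.8])        # mean 0.85 is above the threshold
```

An object detection model is a typical case with several inference results per sample, since one image can yield several detected objects, each with its own confidence.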
  • S203 Perform update training on the first model by using the training sample set to obtain an updated and trained model.
  • the implementation form of the first triggering mechanism is the above-mentioned form 1
  • all the samples in the training sample set can be directly used to update the first model to obtain an updated model.
  • the first model can be updated and trained by using the difficult samples in the training sample set to obtain an updated and trained model.
  • When the implementation form of the first trigger mechanism is the above-mentioned form 3 or form 4, all the samples in the training sample set can be used directly to update and train the first model, or the difficult samples in the training sample set can be used to update and train the first model, to obtain the updated and trained model.
  • the former is a preferred way to update and train the first model.
  • the first model performs update training, which is not specifically limited here.
  • S205 Use the second trigger mechanism to determine whether the first model needs to be replaced by the updated trained model. When it is determined that the first model needs to be replaced, perform S206. When it is determined that the first model does not need to be replaced, perform S207.
  • the second trigger mechanism can be any of the following forms:
  • Form 1' Evaluate the prediction performance (such as prediction accuracy, recall rate, etc.) of the updated and trained model and of the first model. If the prediction performance of the updated and trained model is better than that of the first model, it is determined that the first model needs to be replaced with the updated and trained model, and then the execution of step S206 is automatically triggered. For example, assuming that the evaluated prediction accuracy of the first model is 0.80 and the prediction accuracy of the updated and trained model is 0.81, it is determined that the first model needs to be replaced with the updated and trained model.
  • Form 2' Evaluate only the prediction performance of the updated and trained model. If the prediction performance of the updated and trained model is within the expected prediction performance range, it is determined that the first model needs to be replaced with the updated and trained model, and then the execution of step S206 is automatically triggered.
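  • For example, assuming that the prediction performance is prediction accuracy, the expected prediction performance range is 0.80-0.90, and the prediction accuracy of the updated and trained model is 0.85, it is determined that the first model needs to be replaced with the updated and trained model.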
  • the method for evaluating the prediction performance of the updated and trained model and of the first model may be the hold-out method, cross validation, etc., which is not specifically limited here.
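Both forms of the second trigger mechanism reduce to a comparison on an already evaluated performance metric. A minimal sketch follows; the function name `should_replace` and its parameters are hypothetical, and the metric is assumed to be one where a higher value is better, such as prediction accuracy.

```python
def should_replace(new_performance, old_performance=None, expected_range=None):
    """Decide whether the updated and trained model should replace the first model.

    Form 2': if an expected range (low, high) is given, only the new model is checked.
    Form 1': otherwise the new model must outperform the first model.
    """
    if expected_range is not None:
        low, high = expected_range
        return low <= new_performance <= high
    if old_performance is None:
        raise ValueError("Form 1' needs the performance of the first model")
    return new_performance > old_performance


form1_replace = should_replace(0.81, old_performance=0.80)
form1_keep = should_replace(0.79, old_performance=0.80)
form2_replace = should_replace(0.85, expected_range=(0.80, 0.90))
form2_keep = should_replace(0.95, expected_range=(0.80, 0.90))
```

The first and third calls use the figures from the examples in the text (0.80 versus 0.81, and 0.85 within 0.80-0.90).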
  • After the first model is replaced with the updated and trained model, S201 is executed again to obtain a new training sample set, and then S202 to S207 are executed for a new round of model update.
  • the first trigger mechanism is used to determine whether the first model needs to be updated and trained
  • the second trigger mechanism is used to determine whether the updated model needs to be used to replace the first model.
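Putting the two trigger mechanisms together, one round of the method can be sketched as below. The function `update_round` and the callables passed to it are hypothetical. The text does not spell out steps S204 and S207, so the sketch assumes that both mean the first model stays in service.

```python
def update_round(first_model, samples, need_training, train, need_replacement):
    """Run one round of the update flow and return the model that stays deployed."""
    if not need_training(first_model, samples):       # S202: first trigger mechanism
        return first_model                            # S204 (assumed: keep the first model)
    candidate = train(first_model, samples)           # S203: update training
    if need_replacement(first_model, candidate):      # S205: second trigger mechanism
        return candidate                              # S206: replace the first model
    return first_model                                # S207 (assumed: keep the first model)


# Toy stand-ins: a "model" is a dict carrying a name and an evaluated accuracy.
deployed = {"name": "v1", "accuracy": 0.80}


def toy_train(model, samples):
    return {"name": "v2", "accuracy": 0.81}


def enough_samples(model, samples):
    return len(samples) >= 500


def is_better(old, new):
    return new["accuracy"] > old["accuracy"]


after_big_batch = update_round(deployed, list(range(500)), enough_samples, toy_train, is_better)
after_small_batch = update_round(deployed, list(range(10)), enough_samples, toy_train, is_better)
```

Calling `update_round` again with the returned model and a newly obtained sample set corresponds to starting the next round at S201.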
  • the deployment of the model update device provided by the present application is flexible, and can be deployed in an edge environment, specifically an edge computing device in the edge environment or a software system running on one or more edge computing devices.
  • the edge environment refers to an edge computing device cluster built on the edge of the network geographically close to users to provide computing, storage, and communication resources.
  • the model update device can also be deployed in a cloud environment, which is an entity that uses basic resources to provide users with cloud services under the cloud computing model.
  • the cloud environment includes a cloud data center and a cloud service platform, and the cloud data center includes a large number of basic resources (including computing resources, storage resources and network resources) owned by the cloud service provider.
  • the model update device can be a server in the cloud data center, a virtual machine created in the cloud data center, or a software system deployed on a server or a virtual machine in the cloud data center. The software system can be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on both virtual machines and servers.
  • the model update device can also be partially deployed in the edge environment and partially deployed in the cloud environment, as shown in FIG. 4 .
  • the modules inside the model update device can be divided in multiple ways, and each module can be a software module, or a hardware module, or partly a software module and partly a hardware module, which is not limited in this application.
  • The model updating device 500A shown in FIG. 5A and the model updating device 500B shown in FIG. 5B are two ways of dividing the model updating device that are shown by way of example in this application.
  • the model update apparatus 500A shown in FIG. 5A includes: an acquisition unit 501 , a model training unit 502 and a model deployment unit 503 .
  • The modules in the model updating device 500A can all be deployed on the same edge computing device, in the same cloud data center, or on the same physical machine. They can also be partially deployed on the edge computing device and partially deployed in the cloud data center. For example, the acquisition unit 501 is deployed on the edge computing device, and the model training unit 502 and the model deployment unit 503 are deployed in the cloud data center, which is not specifically limited in this application.
  • the obtaining unit 501 is configured to obtain a training sample set.
  • the model training unit 502 is configured to perform update training on the first model through the training sample set when using the first trigger mechanism to determine that the first model needs to be updated and trained, to obtain an updated and trained model;
  • the model deploying unit 503 is configured to replace the first model with the updated trained model when it is determined that the first model needs to be replaced with the updated trained model by using the second trigger mechanism.
  • the first triggering mechanism includes: if the number of difficult samples in the training sample set reaches the first threshold, then determine that the first model needs to be updated and trained; or, if the current time reaches the model update time , it is determined that the first model needs to be updated and trained; or, if the number of samples in the training sample set reaches a second threshold, it is determined that the first model needs to be updated and trained, wherein the second threshold is a natural number greater than 1; or, If the online duration of the first model reaches the preset duration, it is determined that the first model needs to be updated and trained.
  • The second trigger mechanism includes: if the prediction performance of the updated and trained model is better than the prediction performance of the first model, it is determined that the first model needs to be replaced with the updated and trained model; or, if the prediction performance of the updated and trained model is within the expected prediction performance range, it is determined that the first model needs to be replaced with the updated and trained model.
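  • The two variants of the second trigger mechanism can be sketched as follows. The function name `should_replace`, the use of a single scalar score, and the convention that a higher score means better prediction performance are all assumptions made for illustration; the text does not fix a metric.

```python
from typing import Optional, Tuple


def should_replace(
    new_score: float,
    old_score: Optional[float] = None,
    expected_range: Optional[Tuple[float, float]] = None,
) -> bool:
    """Decide whether the updated and trained model should replace the first model.

    Assumes a scalar score where higher is better (for example, accuracy).
    Variant 1: compare against the first model's score, if one is given.
    Variant 2: check that the new score lies inside the expected range.
    """
    if old_score is not None:
        return new_score > old_score
    if expected_range is not None:
        low, high = expected_range
        return low <= new_score <= high
    raise ValueError("provide either old_score or expected_range")
```

  • The second variant needs no evaluation of the first model at all, which is why it only takes the score of the updated and trained model and a range.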
  • The model training unit 502 can update and train the first model through the training sample set as follows: first, screen the training sample set to determine the difficult samples in it; then, update and train the first model using those difficult samples.
  • The model training unit 502 can screen the samples in the training sample set to determine the difficult samples as follows: first, input each sample in the training sample set into the first model for inference to obtain an attribute of the inference result corresponding to each sample, where the attribute includes either of the following: confidence or cross entropy; then, according to the attribute of the inference result of each sample, determine whether that sample is a difficult sample.
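  • The screening step above can be sketched as below. The decision rules are assumptions, since the text names the attributes but not the rule: here a sample counts as difficult when the model's top-class confidence falls below a threshold or, if a cross-entropy threshold is given, when the cross entropy against the sample's label exceeds it. The names `select_difficult_samples` and `predict_proba`, the sample layout, and the threshold values are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sample layout: (features, integer class label).
Sample = Tuple[Sequence[float], int]


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Cross entropy of a predicted distribution against a one-hot label."""
    return -math.log(max(probs[label], eps))


def select_difficult_samples(
    samples: Sequence[Sample],
    predict_proba: Callable[[Sequence[float]], Sequence[float]],
    conf_threshold: float = 0.6,
    ce_threshold: Optional[float] = None,
) -> List[Sample]:
    """Run inference with the first model and keep the samples judged difficult."""
    difficult: List[Sample] = []
    for features, label in samples:
        probs = predict_proba(features)  # inference with the first model
        if ce_threshold is not None:
            is_difficult = cross_entropy(probs, label) > ce_threshold
        else:
            is_difficult = max(probs) < conf_threshold
        if is_difficult:
            difficult.append((features, label))
    return difficult
```

  • The confidence rule needs no labels, so it also applies to unlabeled samples collected during inference; the cross-entropy rule requires a label for each sample.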
  • The model updating device 500B includes a storage unit 510, a management and control unit 520, an inference unit 530, a training unit 540 and an evaluation unit 550.
  • The modules of the model updating device 500B can all be deployed on the same edge computing device, in the same cloud data center, or on the same physical machine. They can also be deployed partly on an edge computing device and partly in a cloud data center; for example, the storage unit 510 and the inference unit 530 are deployed on the edge computing device, and the management and control unit 520, the training unit 540 and the evaluation unit 550 are deployed in the cloud data center. This is not specifically limited in this application.
  • The storage unit 510 is configured to store the training sample set and the first model acquired by the model updating device 500B, and also to store the evaluation sample set, as shown in FIG. 5B. The training sample set is used to update and train the first model to obtain an updated and trained model, and the evaluation sample set is used to evaluate the prediction performance of the updated and trained model.
  • The storage unit 510 may also store a verification sample set. Before the evaluation sample set is used to evaluate the prediction performance of the updated and trained model, the verification sample set is used to verify the performance of the updated and trained model, and the hyperparameters of the updated and trained model are adjusted accordingly so that the model reaches its best state.
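  • The division of labor between the verification sample set and the evaluation sample set can be sketched as follows. This is a generic hyperparameter-selection loop offered as an assumption about how the adjustment might be done; the text does not specify a search procedure. The names `tune_then_evaluate`, `train_fn` and `score_fn` are hypothetical, and a higher score is assumed to be better.

```python
from typing import Any, Callable, Dict, Iterable, Sequence, Tuple


def tune_then_evaluate(
    candidates: Iterable[Dict[str, Any]],
    train_fn: Callable[[Sequence[Any], Dict[str, Any]], Any],
    score_fn: Callable[[Any, Sequence[Any]], float],
    train_set: Sequence[Any],
    verification_set: Sequence[Any],
    evaluation_set: Sequence[Any],
) -> Tuple[Dict[str, Any], float]:
    """Pick hyperparameters on the verification set, then score once on the evaluation set."""
    best_model, best_params, best_score = None, None, float("-inf")
    for params in candidates:
        model = train_fn(train_set, params)
        score = score_fn(model, verification_set)  # used only for selection
        if score > best_score:
            best_model, best_params, best_score = model, params, score
    if best_params is None:
        raise ValueError("no hyperparameter candidates were provided")
    # The evaluation set is touched only after the hyperparameters are fixed.
    return best_params, score_fn(best_model, evaluation_set)
```

  • Keeping the evaluation sample set out of the selection loop means the reported prediction performance is not biased by the hyperparameter choice.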
  • The management and control unit 520 is used to control the entire model update process (including the model update training process and the model update deployment process); for example, it manages whether the training unit 540 performs update training on the first model, whether the evaluation unit 550 evaluates the prediction performance of the updated and trained model, and whether the inference unit 530 replaces the first model with the updated and trained model.
  • The inference unit 530, the training unit 540 and the evaluation unit 550 are all in a non-running state.
  • The management and control unit 520 executes S601 to trigger the inference unit 530 to enter the running state. Specifically, it triggers the model and data acquisition subunit 5301 to execute S602 to acquire the first model and the training sample set from the storage unit 510. The data screening subunit 5302 then inputs each sample in the training sample set into the first model for inference and obtains the attributes of the inference result corresponding to each sample, where the attributes include confidence, cross entropy, and the like. According to these attributes, the data screening subunit 5302 determines whether each sample is a difficult sample and, if so, executes S603 to store the sample in the difficult sample set of the storage unit 510.
  • The data set management subunit 5201A in the management and control unit 520 can execute S604 to monitor the number of difficult samples in the difficult sample set in real time or periodically. When it detects that the number of difficult samples reaches the first threshold, it determines that the first model needs to be updated and trained; otherwise, it continues monitoring until the number of difficult samples reaches the first threshold, and then determines that the first model needs to be updated and trained.
  • The management and control unit 520 can also monitor whether the current time reaches the model update time. When it detects that the current time reaches the model update time, it determines that the first model needs to be updated and trained; otherwise, it continues monitoring until the current time reaches the model update time, and then determines that the first model needs to be updated and trained.
  • The first trigger 5202A in the management and control unit 520 executes S605 to trigger the training unit 540 to enter the running state. Specifically, it triggers the model and data acquisition subunit 5401 in the training unit 540 to execute S606 to acquire the first model and the difficult sample set from the storage unit 510; the model training subunit 5402 then uses the difficult samples in the difficult sample set to perform update training on the first model to obtain an updated and trained model.
  • The management and control unit 520 can monitor the number of iterations for which the training unit 540 has used the difficult samples in the difficult sample set to update and train the first model. When the number of iterations reaches the maximum number of iterations, the first trigger 5202A in the management and control unit 520 notifies the training unit 540 that the training is over.
  • The management and control unit 520 can also monitor whether the current time reaches the preset training end time. When the current time reaches the training end time, the first trigger 5202A in the management and control unit 520 notifies the training unit 540 that the training is over.
  • The management and control unit 520 can also monitor how long the training unit 540 has been updating and training the first model. When the training duration reaches the maximum training duration, the first trigger 5202A in the management and control unit 520 notifies the training unit 540 that the training is over.
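  • The three stop conditions for update training (maximum number of iterations, preset end time, maximum training duration) can be sketched as one loop. The sketch folds monitoring and training into a single function for brevity, whereas the text has the management and control unit notify a separate training unit; `run_update_training`, `train_step` and the injectable `clock` are hypothetical names, and `end_time` is assumed to be expressed on the same clock as `clock()`.

```python
import time
from typing import Callable, Optional, Tuple


def run_update_training(
    train_step: Callable[[], None],
    max_iterations: Optional[int] = None,
    end_time: Optional[float] = None,
    max_duration: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Run training iterations until one configured stop condition is met.

    Returns the number of completed iterations and the reason for stopping.
    """
    if max_iterations is None and end_time is None and max_duration is None:
        raise ValueError("at least one stop condition is required")
    start = clock()
    iterations = 0
    while True:
        now = clock()
        if max_iterations is not None and iterations >= max_iterations:
            return iterations, "max_iterations"
        if end_time is not None and now >= end_time:
            return iterations, "end_time"
        if max_duration is not None and now - start >= max_duration:
            return iterations, "max_duration"
        train_step()  # one iteration of update training on the difficult samples
        iterations += 1
```

  • Passing a fake `clock` makes the time-based conditions testable without waiting for real time to elapse.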
  • The training unit 540 may execute S607 to send a first message to the management and control unit 520, notifying it that the update training of the first model is over, and execute S608 to store the updated and trained model in the storage unit 510.
  • The management and control unit 520 executes S609 to trigger the evaluation unit 550 to enter the running state. Specifically, it triggers the model and data acquisition subunit 5501 to execute S610 to acquire the evaluation sample set, the updated and trained model, and the first model from the storage unit 510. The model evaluation subunit 5502 then uses the evaluation sample set to evaluate the prediction performance of the updated and trained model and of the first model respectively. Finally, the evaluation unit 550 executes S611 to upload the resulting prediction performance of the updated and trained model and of the first model to the management and control unit 520.
  • After the management and control unit 520 receives the prediction performance of the updated and trained model and of the first model uploaded by the evaluation unit 550, it can determine whether the prediction performance of the updated and trained model is better than that of the first model. When it is, the management and control unit 520 executes S612 to control the inference unit 530 to perform model update deployment. The specific process is as follows: the management and control unit 520 obtains the updated and trained model from the storage unit 510 and sends it to the inference unit 530, so that the model deployment subunit 5303 in the inference unit 530 deploys the updated and trained model, that is, replaces the previously deployed first model with the updated and trained model.
  • The management and control unit 520 may also send a model update deployment instruction to the inference unit 530, instructing the model deployment subunit 5303 in the inference unit 530 to obtain the updated and trained model from the storage unit 510 and use it to replace the previously deployed first model.
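  • The two deployment paths described above (the management and control unit sends the model itself, or it sends only an instruction and the inference unit fetches the model from storage) can be sketched as follows. The class and function names, the dictionary standing in for the storage unit 510, and the `"updated_model"` key are all hypothetical.

```python
from typing import Any, Dict


class InferenceUnit:
    """Toy stand-in for the inference unit 530 with its model deployment subunit."""

    def __init__(self, storage: Dict[str, Any], model: Any) -> None:
        self.storage = storage  # shared handle to the storage unit
        self.model = model      # currently deployed (first) model

    def deploy(self, model: Any) -> None:
        """Replace the deployed model with the one received."""
        self.model = model

    def handle_deploy_instruction(self, key: str) -> None:
        """Fetch the updated model from storage, then deploy it."""
        self.deploy(self.storage[key])


def push_deploy(storage: Dict[str, Any], unit: InferenceUnit, key: str = "updated_model") -> None:
    """Path 1: the controller reads the model from storage and sends it to the unit."""
    unit.deploy(storage[key])


def instruct_deploy(unit: InferenceUnit, key: str = "updated_model") -> None:
    """Path 2: the controller sends only an instruction; the unit pulls the model."""
    unit.handle_deploy_instruction(key)
```

  • In the second path the model does not pass through the controller, which may matter when the controller and the inference unit sit on different machines; this is an observation about the sketch, not a claim made by the text.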
  • The management and control unit 520 may be deployed with a second trigger 5202B, which is used to determine, according to the prediction performance of the updated and trained model and the prediction performance of the first model, whether to control the inference unit 530 to perform model update deployment.
  • After entering the running state, the evaluation unit 550 may instead use the evaluation sample set to evaluate only the prediction performance of the updated and trained model, without evaluating that of the first model, and upload only the prediction performance of the updated and trained model to the management and control unit 520.
  • The management and control unit 520 then judges whether that prediction performance is within the expected prediction performance range and, if so, controls the inference unit 530 to perform model update deployment.
  • In this case, the second trigger 5202B deployed in the management and control unit 520 is used to judge, according to the prediction performance of the updated and trained model, whether to control the inference unit 530 to perform model update deployment.
  • The model updating device uses two trigger mechanisms to determine, respectively, whether the first model needs to be updated and trained and whether the first model needs to be replaced with the updated and trained model. Automatic update training and automatic update deployment are thus triggered on demand, which reduces the consumption of computing resources while improving the efficiency of model updating.
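  • One pass through the flow of S601 to S612 can be condensed into the following sketch, which chains screening, the first trigger, update training, evaluation and the second trigger. It is a simplified single-process illustration, not the distributed unit structure of FIG. 5B; `update_cycle` and its callable parameters are hypothetical, only the difficult-sample-count variant of the first trigger and the comparison variant of the second trigger are shown, and a higher evaluation score is assumed to be better.

```python
from typing import Any, Callable, List, Sequence, Tuple


def update_cycle(
    current_model: Any,
    samples: Sequence[Any],
    is_difficult: Callable[[Any, Any], bool],
    difficult_threshold: int,
    train: Callable[[Any, List[Any]], Any],
    evaluate: Callable[[Any], float],
) -> Tuple[Any, str]:
    """Run one update cycle and return the deployed model plus the outcome."""
    # S602-S603: screen the samples with the current model and collect difficult ones.
    difficult = [s for s in samples if is_difficult(current_model, s)]
    # S604: first trigger, here the difficult-sample-count condition.
    if len(difficult) < difficult_threshold:
        return current_model, "no_training"
    # S605-S608: update training on the difficult samples.
    candidate = train(current_model, difficult)
    # S609-S611: evaluate both models; second trigger compares them.
    if evaluate(candidate) > evaluate(current_model):
        return candidate, "replaced"  # S612: deploy the updated and trained model
    return current_model, "kept"
```

  • Because training runs only when the first trigger fires and deployment happens only when the second trigger fires, a cycle with too few difficult samples costs a single screening pass.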
  • FIG. 7 is a schematic structural diagram of a computing device 700 provided by the present application.
  • The computing device 700 includes a processor 710, a memory 720 and a communication interface 730, which can be connected to each other through a bus 740.
  • The processor 710 can read and execute the program code (including instructions) stored in the memory 720, so that the computing device 700 executes the steps of the model update method provided by the above method embodiments, or so that the model updating device 500A or 500B is deployed on the computing device 700.
  • The processor 710 may be implemented in multiple specific forms, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • Processor 710 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 720, which enable computing device 700 to provide various services.
  • the memory 720 is used to store program codes, which are executed under the control of the processor 710, so as to execute the processing steps in any of the above-mentioned embodiments in FIG. 2 , FIG. 3 or FIG. 6 .
  • the program code may include one or more software modules.
  • the one or more software modules may be the software modules provided in the embodiment of FIG. Steps S201 to S207 in the embodiment of FIG. 2 will not be repeated here.
  • The one or more software modules may be the software modules provided in the embodiment of FIG. 5B, such as the storage unit 510, the management and control unit 520, the inference unit 530, the training unit 540 and the evaluation unit 550, which can be used to execute steps S601 to S612 in the embodiment of FIG. 6; details are not repeated here.
  • The memory 720 may include volatile memory, such as random access memory (RAM); the memory 720 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 720 may also include a combination of the above types.
  • The communication interface 730 can be a wired interface (such as an Ethernet interface, a fiber optic interface, or another type of interface such as an InfiniBand interface) or a wireless interface (such as a cellular network interface or a wireless local area network interface), and is used for communicating with other computing devices or apparatuses.
  • The communication interface 730 can adopt a protocol family on top of the transmission control protocol/internet protocol (TCP/IP), for example, the remote function call (RFC) protocol, the simple object access protocol (SOAP), the simple network management protocol (SNMP), the common object request broker architecture (CORBA) protocol, distributed protocols, and so on.
  • The bus 740 can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like.
  • the bus 740 can be divided into an address bus, a data bus, a control bus, and the like.
  • the bus 740 may also include a power bus, a control bus, a status signal bus, and the like.
  • the various buses are labeled as bus 740 in the figure. For ease of representation, only one thick line is used in FIG. 7 , but it does not mean that there is only one bus or one type of bus.
  • The computing device 700 is used to execute the method of the above model update method embodiments and is based on the same idea as those embodiments; for its specific implementation process, see the method embodiments, which are not repeated here.
  • The computing device 700 is only an example provided by the embodiments of the present application. The computing device 700 may have more or fewer components than those shown in FIG. 7, may combine two or more components, or may be implemented with a different configuration of components.
  • The present application also provides a computer-readable storage medium in which instructions are stored; when the instructions are executed, some or all steps of the model update method described in the above embodiments can be implemented.
  • The present application also provides a computer program product. When the computer program product is read and executed by a computer, some or all steps of the model update method described in the above method embodiments can be implemented.
  • The above embodiments may be implemented in whole or in part by software, hardware, or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, DSL) or wireless (e.g., infrared, wireless, microwave) means.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium, or a semiconductor medium.

Abstract

The present application relates to a model update method and apparatus and a related device, applied to the field of artificial intelligence (AI). The method comprises: first, obtaining a training sample set; then, when a first trigger mechanism is used to determine that a first model needs to be updated and trained, updating and training the first model by means of the training sample set to obtain an updated and trained model; and, finally, when a second trigger mechanism is used to determine that the first model needs to be replaced with the updated and trained model, replacing the first model with the updated and trained model. The method can solve the problem of low model update efficiency in the prior art and, at the same time, reduce the consumption of computing resources.
PCT/CN2022/131668 2021-11-30 2022-11-14 Model update method and apparatus, and related device WO2023098460A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111443976.2A CN116205304A (zh) 2021-11-30 2021-11-30 Model update method, apparatus, and related device
CN202111443976.2 2021-11-30

Publications (1)

Publication Number Publication Date
WO2023098460A1 true WO2023098460A1 (fr) 2023-06-08

Family

ID=86511648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131668 WO2023098460A1 (fr) 2021-11-30 2022-11-14 Model update method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN116205304A (fr)
WO (1) WO2023098460A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598842A (zh) * 2019-07-17 2019-12-20 深圳大学 Deep neural network hyperparameter optimization method, electronic device, and storage medium
US20200311541A1 (en) * 2019-03-28 2020-10-01 International Business Machines Corporation Metric value calculation for continuous learning system
US20210357805A1 (en) * 2020-05-15 2021-11-18 Vmware, Inc. Machine learning with an intelligent continuous learning service in a big data environment

Also Published As

Publication number Publication date
CN116205304A (zh) 2023-06-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900265

Country of ref document: EP

Kind code of ref document: A1