WO2021159749A1 - Self-learning online update method and system for multi-classification model, and apparatus - Google Patents


Publication number
WO2021159749A1
WO2021159749A1 PCT/CN2020/124883 CN2020124883W WO2021159749A1 WO 2021159749 A1 WO2021159749 A1 WO 2021159749A1 CN 2020124883 W CN2020124883 W CN 2020124883W WO 2021159749 A1 WO2021159749 A1 WO 2021159749A1
Authority
WO
WIPO (PCT)
Prior art keywords
updated
model
data
online
update
Prior art date
Application number
PCT/CN2020/124883
Other languages
French (fr)
Chinese (zh)
Inventor
李弦
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021159749A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, system, device and storage medium for self-learning online updating of multiple classification models.
  • machine learning models are commonly used tools; multi-classification models, for example, classify the data under test, automating data classification and improving classification efficiency.
  • for machine learning models, especially multi-classification models, prediction performance depends mainly on how well the training sample data is mined: the more closely the training samples reflect actual data, the stronger the prediction performance of the model.
  • the main problem is the lack of specific technical solutions for the update trigger mechanism, the selection of training data, and the model update itself, so automatic updating of multi-classification models cannot be realized.
  • This application provides a self-learning online update method, system, electronic device, and computer storage medium for a multi-classification model.
  • the main purpose is to solve the problem that the prediction accuracy of existing multi-classification models decreases significantly over time and that such models cannot be updated automatically.
  • this application provides a self-learning online update method for a multi-classification model.
  • the method includes the following steps:
  • the prediction performance of the model to be updated is monitored and counted according to a preset statistical period, and the prediction performance statistics for each statistical period are stored in a statistical database;
  • a preset trigger mechanism is used to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • if the model to be updated needs to be updated online, newly generated online data is acquired, and the training data of the model to be updated is updated according to the newly generated data;
  • the updated training data is used to retrain the model to be updated to obtain an updated multi-classification model.
  • this application also provides a self-learning online update system for multi-classification models, which includes:
  • the performance monitoring unit is used to monitor and count the predicted performance of the model to be updated according to the preset statistical period, and store the statistical results of the predicted performance in each statistical period into the statistical database;
  • the mechanism trigger unit is configured to use a preset trigger mechanism to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • the data update unit is configured to, if the model to be updated needs to be updated online, obtain newly generated data online, and update the training data of the model to be updated according to the newly generated data;
  • the model update unit is used to update and train the model to be updated by using the updated training data to obtain an updated multi-classification model.
  • the present application also provides an electronic device comprising a memory, a processor, and a multi-class model self-learning online update program stored in the memory and executable on the processor; when executed by the processor, the program implements the multi-classification model self-learning online update method;
  • the steps of the multi-classification model self-learning online update method include:
  • the prediction performance of the model to be updated is monitored and counted according to a preset statistical period, and the prediction performance statistics for each statistical period are stored in a statistical database;
  • a preset trigger mechanism is used to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • if the model to be updated needs to be updated online, newly generated online data is acquired, and the training data of the model to be updated is updated according to the newly generated data;
  • the updated training data is used to retrain the model to be updated to obtain an updated multi-classification model.
  • this application also provides a computer-readable storage medium storing a multi-class model self-learning online update program; when executed by a processor, the program implements a self-learning online update method for multi-classification models;
  • the steps of the multi-classification model self-learning online update method include:
  • the prediction performance of the model to be updated is monitored and counted according to a preset statistical period, and the prediction performance statistics for each statistical period are stored in a statistical database;
  • a preset trigger mechanism is used to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • if the model to be updated needs to be updated online, newly generated online data is acquired, and the training data of the model to be updated is updated according to the newly generated data;
  • the updated training data is used to retrain the model to be updated to obtain an updated multi-classification model.
  • by designing an update trigger mechanism, a training data update mechanism and a model update method for multi-classification models, the multi-classification model self-learning online update method, electronic device and computer-readable storage medium proposed in this application realize online automatic updating of the multi-classification model and keep its prediction accuracy at a high level.
  • Fig. 1 is a flowchart of a preferred embodiment of a self-learning online update method for a multi-classification model according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a preferred embodiment of an electronic device according to an embodiment of the present application.
  • Fig. 3 is a schematic diagram of internal logic of a multi-class model self-learning online update program according to an embodiment of the present application.
  • FIG. 1 shows the flow of the multi-class model self-learning online update method provided in this application.
  • the self-learning online update method for multi-classification models includes:
  • S110 Perform monitoring and statistics on the prediction performance of the model to be updated according to the preset statistical period, and store the statistical results of the prediction performance in each statistical period in the statistical database.
  • this application uses the overall prediction precision (Precision) of the model to be updated as the statistic that characterizes its prediction performance, calculated as:
  • Precision = (number of correctly classified samples) / (total number of samples)
  • where the number of correctly classified samples is the number of samples correctly classified by the model to be updated in the statistical period, and the total number of samples is the number of samples input to the model to be updated in that statistical period.
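The Precision statistic defined above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation; the function name `period_precision` and the list-based inputs are assumptions made for the example.

```python
def period_precision(y_true, y_pred):
    """Overall Precision for one statistical period: correct / total.

    y_true and y_pred are hypothetical inputs: the true labels and the
    model's predicted labels for every sample seen in the period.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples in this statistical period")
    # Count the samples whose predicted class matches the true class.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Three of four documents were routed to the correct office.
print(period_precision(["A", "B", "C", "A"], ["A", "B", "A", "A"]))
```

One value per statistical period would then be appended to the statistical database.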
  • the statistical period needs to be preset according to the volume of business data of the system. If there is a large amount of business data, you can set daily statistics (that is, 1 day is a statistical cycle). If the amount of data is small, you can set statistics on a weekly or monthly basis (that is, 1 week or 1 month is a statistics cycle). In practical applications, for official document classification scenarios, the prediction accuracy of the model is usually calculated with a weekly statistical cycle.
  • as a rule of thumb, more than 1,000 newly added records counts as a large amount of data. If more than 1,000 records are added per day, statistics are calculated daily; if the amount accumulated over a week exceeds 1,000, statistics are calculated weekly; and so on.
  • the official document classification scenario means that staff assign each official document to the corresponding office or department for processing, according to the content of the document and the functions of each office or department within the organization. Put simply, official documents are classified, and the classification label is the name of each office.
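The period-selection rule can be sketched as follows. The 1,000-record threshold comes from the text; the function name, the 7-day and 30-day period lengths, and the fallback to a monthly period for very small volumes are assumptions made for illustration.

```python
def choose_statistical_period(avg_daily_count, threshold=1000):
    """Pick the shortest period whose accumulated volume exceeds threshold."""
    # Candidate periods and their assumed lengths in days.
    for name, days in (("daily", 1), ("weekly", 7), ("monthly", 30)):
        if avg_daily_count * days > threshold:
            return name
    # Assumption: fall back to the longest candidate when volume is tiny.
    return "monthly"


# About 200 official documents per day gives 1,400 per week.
print(choose_statistical_period(200))
```

With roughly 200 documents per day, the sketch selects a weekly period, which agrees with the official document classification scenario described in the text.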
  • the number of official documents received by various agencies may be different each day, but the amount is generally relatively small.
  • the number of official documents that need to be distributed daily is about 200, so the statistics are calculated on a weekly basis.
  • the statistical database is stored in the nodes of the blockchain.
  • S120 Use a preset trigger mechanism to check the data in the statistical database to determine whether the model to be updated needs to be updated online.
  • the trigger mechanism is used to determine, based on the data in the statistical database, whether the model to be updated needs to be updated. Whether a model update is triggered can be determined from the historical prediction performance of the online model (i.e. the model to be updated) recorded in the statistical database.
  • the judgment condition for triggering an update from the historical prediction performance of the online model can be: the online model is updated if any of the following conditions is met (in this application, the subsequent steps are then carried out; if no condition is met, the above steps continue to loop).
  • trigger mechanisms include:
  • Mechanism A: if the prediction accuracy (Precision) has decreased continuously over the most recent N statistical periods, including the current statistical period, it is determined that the model to be updated needs to be updated online, where N is the first preset parameter.
  • Mechanism B: if the Precision of the current statistical period is less than (average prediction accuracy - 2 * standard deviation of prediction accuracy), or less than (average prediction accuracy - decrease percentage P), it is determined that the model to be updated needs to be updated online, where:
  • the average prediction accuracy is the mean of the Precision values of the historical N statistical periods;
  • the standard deviation of prediction accuracy is the standard deviation of the Precision values of the historical N statistical periods;
  • the decrease percentage P is the second preset parameter.
  • N and the percentage of decline P can be set according to business scenarios. In the document classification scenario, N is set to 5.
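Mechanisms A and B can be sketched as two predicates over the list of per-period Precision values. Several details are assumptions, because the text does not fix them: "continues to decrease" is read as strictly decreasing, the mean and standard deviation in Mechanism B are taken over the N periods preceding the current one, the population standard deviation is used, and P is treated as an absolute drop in Precision (0.05 means five percentage points). The function names are hypothetical.

```python
from statistics import mean, pstdev


def mechanism_a(history, n):
    """True if Precision fell at every step across the last n periods."""
    if n < 2 or len(history) < n:
        return False  # not enough periods recorded yet
    window = history[-n:]
    return all(later < earlier for earlier, later in zip(window, window[1:]))


def mechanism_b(history, n, p):
    """True if the current Precision is abnormally low versus the previous n."""
    if len(history) < n + 1:
        return False  # need n historical periods plus the current one
    current = history[-1]
    previous = history[-(n + 1):-1]
    avg = mean(previous)
    std = pstdev(previous)
    return current < avg - 2 * std or current < avg - p


# Weekly Precision values, oldest first; N = 5 as in the document scenario.
weekly = [0.78, 0.79, 0.78, 0.77, 0.78, 0.70]
print(mechanism_a(weekly, 5), mechanism_b(weekly, 5, 0.05))
```

Computing the Mechanism B statistics over the preceding periods only is a deliberate choice in this sketch: if the current value were included in a window of five, it could never fall more than two population standard deviations below the window mean.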
  • the above trigger mechanism is designed from the actual situation and experience in real scenarios. Updating the model according to these rules keeps the model at a certain accuracy, so that its accuracy does not decrease significantly over time.
  • alternatively, the online model is updated at a fixed time interval set according to business experience.
  • the trigger mechanism further includes Mechanism C: determine whether the online duration of the model to be updated has reached a preset update cycle threshold; if it has, it is determined that the model to be updated needs to be updated online. The update cycle threshold is M times the statistical period, where M is a natural number and M ≥ 2.
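Mechanism C reduces to a comparison between the time the model has been online and M statistical periods. The sketch below counts online time in whole statistical periods and assumes the constraint on M reads M ≥ 2; the function name is hypothetical.

```python
def mechanism_c(periods_online, m):
    """True once the model has been online for at least m statistical periods."""
    if m < 2:
        # Assumed reading of the constraint on M in the text.
        raise ValueError("M must be a natural number >= 2")
    return periods_online >= m


# With weekly statistics and M = 12, an update is forced after 12 weeks online.
print(mechanism_c(periods_online=12, m=12))
```

Since the text states that the model is updated when any condition is met, this result would be combined with the other mechanisms using a logical OR.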
  • a model update is primarily an update of its training data. Incorporating the newly generated data into the training data of the model is the preferred way to update the model.
  • the update of training data includes the following steps:
  • updating the training data of the model to be updated according to the newly generated data includes:
  • the samples of each class in the historical training data are repeatedly down-sampled and over-sampled until the difference between the class proportions of the newly generated data and those of the historical training data is less than a preset proportion threshold;
  • there are two ways to update the training data. One is incremental: the newly generated data is added directly to the historical training data, and the full historical training data is retained.
  • the other is rolling fixed-length: the time span covered by the training data is fixed.
  • for example, the updated training data is the data from the 2 years preceding the current statistical period; when new data is added, the earliest data of the corresponding length is removed from the training data.
  • the newly added data refers to the data accumulated from the last update to the current statistical period.
  • the second method is generally selected in scenes where data changes rapidly.
  • updating the training data of the model to be updated according to the newly generated data includes:
  • the rolling fixed duration is L times the statistical period, where L is a natural number and L > M. Note that only when L > M does the updated training data include both historical training data and newly generated data, which keeps the updated training data a good fit for the actual scenario.
  • S140 Retrain the model to be updated using the updated training data to obtain an updated multi-classification model.
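The two training data update strategies can be contrasted in a short sketch. It assumes each sample is stored as a `(period_index, sample)` pair, where `period_index` is an integer counter of statistical periods; that representation and both function names are hypothetical.

```python
def update_incremental(history, new_data):
    """Incremental update: keep all historical data and append the new data."""
    return history + new_data


def update_rolling(history, new_data, window_periods):
    """Rolling fixed-length update: keep only the latest window_periods periods."""
    combined = history + new_data
    if not combined:
        return []
    latest = max(period for period, _ in combined)
    cutoff = latest - window_periods  # periods at or before cutoff are dropped
    return [(period, sample) for period, sample in combined if period > cutoff]


history = [(1, "doc-a"), (2, "doc-b"), (3, "doc-c")]
new_data = [(4, "doc-d")]
print(update_incremental(history, new_data))
print(update_rolling(history, new_data, window_periods=3))
```

With a window of L periods and updates at most every M periods, requiring L > M means the window always reaches back past the previous update, so some historical data survives each refresh.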
  • the proportion of samples of each class refers to the share of samples belonging to that class, for example:
  • Office 1: its official documents account for 20% of all official documents
  • the proportion of Office 1 changed from the dominant 20% to 9%, a large change in the sample distribution.
  • the corresponding multi-classification model is updated according to the above model update process.
  • after the model goes online, the weekly prediction accuracy remains at about 78%.
  • the model accuracy drops to 70%.
  • the accuracy of the model can be maintained at 78%.
  • by designing an update trigger mechanism, a training data update mechanism and a model update method for multi-classification models, the self-learning online update method realizes online automatic updating of the multi-classification model and keeps its prediction accuracy at a high level.
  • the prediction performance of the online multi-classification model is reflected automatically through model accuracy tracking, and the corresponding update trigger mechanism provides a criterion for detecting performance degradation, which effectively identifies when the multi-classification model should be updated and prevents the prediction accuracy of the model from falling.
  • the updated model can adapt to changes in the distribution of the online data, and the model update conditions keep the prediction accuracy of the model stable within a certain range, significantly improving the adaptability of the model while keeping its prediction accuracy stable.
  • the multi-class model self-learning online update solution provided by the present application can also effectively avoid the complicated work of manually updating the model, and can respond in real time to ensure the performance of the prediction model.
  • this application also provides a multi-classification model self-learning online update system, which includes:
  • the performance monitoring unit is used to monitor and count the predicted performance of the model to be updated according to the preset statistical period, and store the statistical results of the predicted performance in each statistical period into the statistical database;
  • the mechanism trigger unit is configured to use a preset trigger mechanism to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • the data update unit is configured to, if the model to be updated needs to be updated online, obtain newly generated data online, and update the training data of the model to be updated according to the newly generated data;
  • the model update unit is used to update and train the model to be updated by using the updated training data to obtain an updated multi-classification model.
  • the application also provides an electronic device 70.
  • Referring to FIG. 2, this figure is a schematic structural diagram of a preferred embodiment of the electronic device 70 provided by this application.
  • the electronic device 70 may be a terminal device with a computing function, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 70 includes a processor 71 and a memory 72.
  • the memory 72 includes at least one type of readable storage medium.
  • the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 70, such as a hard disk of the electronic device 70.
  • the readable storage medium may also be an external memory of the electronic device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 70.
  • the readable storage medium of the memory 72 is generally used to store the multi-class model self-learning online update program 73 installed in the electronic device 70.
  • the memory 72 can also be used to temporarily store data that has been output or will be output.
  • in some embodiments, the processor 71 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run program code or process data stored in the memory 72, for example to execute the multi-classification model self-learning online update program 73.
  • the electronic device 70 is a terminal device such as a smart phone, a tablet computer, and a portable computer. In other embodiments, the electronic device 70 may be a server.
  • FIG. 2 only shows the electronic device 70 with the components 71-73, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 70 may also include a user interface.
  • the user interface may include an input unit such as a keyboard (Keyboard), a voice input device such as a microphone (microphone) and other devices with voice recognition functions, and a voice output device such as audio, earphones, etc.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 70 may further include a display, and the display may also be referred to as a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, etc.
  • the display is used for displaying information processed in the electronic device 70 and for displaying a visualized user interface.
  • the electronic device 70 may also include a touch sensor.
  • the area provided by the touch sensor for the user to perform touch operations is called the touch area.
  • the touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the area of the display of the electronic device 70 may be the same as or different from the area of the touch sensor.
  • the display and the touch sensor are stacked to form a touch display screen. The device detects the touch operation triggered by the user based on the touch screen.
  • the electronic device 70 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • the memory 72, as a computer storage medium, may include an operating system and a multi-class model self-learning online update program 73; the processor 71 implements the following steps when executing the multi-class model self-learning online update program 73 stored in the memory 72:
  • the prediction performance of the model to be updated is monitored and counted according to a preset statistical period, and the prediction performance statistics for each statistical period are stored in a statistical database;
  • a preset trigger mechanism is used to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • if the model to be updated needs to be updated online, newly generated online data is acquired, and the training data of the model to be updated is updated according to the newly generated data;
  • the updated training data is used to retrain the model to be updated to obtain an updated multi-classification model.
  • FIG. 3 is a schematic diagram of the internal logic of the multi-class model self-learning online update program according to an embodiment of the present application.
  • the multi-class model self-learning online update program 73 can also be divided into one or more modules, which are stored in the memory 72 and executed by the processor 71 to implement the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • Referring to FIG. 3, this figure is a program module diagram of a preferred embodiment of the multi-class model self-learning online update program 73 in FIG. 2.
  • the multi-class model self-learning online update program 73 can be divided into: a performance monitoring module 74, a mechanism triggering module 75, a data update module 76, and a model update module 77.
  • the functions or operation steps implemented by modules 74-77 are similar to those described above and are not detailed again here. For example:
  • the performance monitoring module 74 is used to monitor and count the predicted performance of the model to be updated according to the preset statistical period, and store the statistical results of the predicted performance in each statistical period into the statistical database;
  • the mechanism trigger module 75 is configured to use a preset trigger mechanism to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • the data update module 76 is configured to, if the model to be updated needs to be updated online, obtain newly generated data online, and update the training data of the model to be updated according to the newly generated data;
  • the model update module 77 is configured to update and train the model to be updated by using the updated training data to obtain an updated multi-classification model.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores a multi-classification model self-learning online update program 73; when the multi-class model self-learning online update program 73 is executed by the processor, the following operations are implemented:
  • the prediction performance of the model to be updated is monitored and counted according to a preset statistical period, and the prediction performance statistics for each statistical period are stored in a statistical database;
  • a preset trigger mechanism is used to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
  • if the model to be updated needs to be updated online, newly generated online data is acquired, and the training data of the model to be updated is updated according to the newly generated data;
  • the updated training data is used to retrain the model to be updated to obtain an updated multi-classification model.
  • Blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed is a self-learning online update method for a multi-classification model, relating to artificial intelligence. The method comprises: according to a preset statistical period, performing monitoring and compiling statistics on the prediction performance of a model to be updated, and storing, in a statistical database, a prediction performance statistical result in each statistical period (S110); checking data in the statistical database by using a preset trigger mechanism, so as to determine whether said model needs to be updated online (S120); if said model needs to be updated online, acquiring online newly generated data, and updating training data of said model according to the newly generated data (S130); and updating and training said model by using the updated training data, so as to obtain an updated multi-classification model (S140). The present application further relates to blockchain technology. The statistical database is stored in a blockchain. The existing problems of the prediction precision of a multi-classification model being significantly reduced as time goes by, and the multi-classification model being unable to be automatically updated can be solved.

Description

Multi-classification model self-learning online update method, system and device
This application claims priority to the Chinese patent application filed with the China Patent Office on September 4, 2020, with application number 2020109227529 and the invention title "Multi-classification model self-learning online update method, system and device", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a self-learning online update method, system, device and storage medium for multi-classification models.
Background
In the field of artificial intelligence, machine learning models are commonly used tools. Multi-classification models, for example, are used to classify the data under test, automating data classification and improving classification efficiency. For machine learning models (especially multi-classification models), however, prediction performance depends mainly on how well the training sample data is mined: the more closely the training samples reflect actual data, the stronger the prediction performance of the model.
However, once a trained model has been deployed online, if the distribution or patterns of the data to be predicted change over time, and many patterns appear that the training data does not cover, the prediction accuracy of the model drops significantly. In a classification model for government official documents, for example, the content of the documents to be predicted changes with current events and policies. The newly acquired labeled data therefore has to be used to update models that are affected by the passage of time. If the update is done manually, technicians must track model performance and repeatedly retrain and redeploy the model, which inevitably consumes considerable manpower.
The inventors found that there are currently few methods for automatically updating machine learning models, and that multi-classification models in particular cannot yet be updated automatically. The main problem is the lack of specific technical solutions for the update trigger mechanism, the selection of training data, and the model update itself, so automatic updating of multi-classification models cannot be realized.
Based on the above problems, a method that can automatically update multi-classification models is urgently needed.
Technical Problem
This application provides a self-learning online update method, system, electronic device and computer storage medium for a multi-classification model. Its main purpose is to solve the problem that the prediction accuracy of existing multi-classification models decreases significantly over time and that such models cannot be updated automatically.
技术解决方案Technical solutions
为实现上述目的,本申请提供一种多分类模型自学习在线更新方法,该方法包括如下步骤:In order to achieve the above objective, this application provides a self-learning online update method for a multi-classification model. The method includes the following steps:
根据预设统计周期对待更新模型的预测性能进行监测统计,并将各统计周期内的预测性能统计结果存入统计数据库;According to the preset statistical period, the prediction performance of the model to be updated is monitored and statistics, and the statistical results of the prediction performance in each statistical period are stored in the statistical database;
使用预设的触发机制对所述统计数据库内的数据进行检查,以判断所述待更新模型是否需要进行在线更新;Use a preset trigger mechanism to check the data in the statistical database to determine whether the model to be updated needs to be updated online;
若所述待更新模型需要进行在线更新,则获取线上新产生数据,并根据所述新产生数据对待更新模型的训练数据进行更新;If the model to be updated needs to be updated online, acquiring newly generated data online, and updating the training data of the model to be updated according to the newly generated data;
使用更新后训练数据对所述待更新模型进行更新训练,以获得更新后的多分类模型。The updated training data is used to update and train the model to be updated to obtain an updated multi-classification model.
In addition, this application provides a self-learning online update system for a multi-classification model, the system including:
a performance monitoring unit, configured to monitor and collect statistics on the prediction performance of the model to be updated according to a preset statistical period, and to store the prediction performance statistics of each statistical period in a statistical database;
a mechanism trigger unit, configured to check the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
a data update unit, configured to, if the model to be updated needs to be updated online, acquire newly generated online data and update the training data of the model to be updated according to the newly generated data;
a model update unit, configured to perform update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
In addition, to achieve the above objective, this application provides an electronic device including: a memory, a processor, and a self-learning online update program for a multi-classification model that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements a self-learning online update method for a multi-classification model;
wherein the steps of the self-learning online update method for a multi-classification model include:
monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
In addition, to achieve the above objective, this application provides a computer-readable storage medium storing a self-learning online update program for a multi-classification model, wherein the program, when executed by a processor, implements a self-learning online update method for a multi-classification model;
wherein the steps of the self-learning online update method for a multi-classification model include:
monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
Beneficial Effects
The self-learning online update method, electronic device, and computer-readable storage medium for a multi-classification model proposed in this application define a trigger mechanism for model updates, an update mechanism for the training data, and a model update method. Together these make it possible to update a multi-classification model online automatically and to keep its prediction accuracy at a high level.
Description of the Drawings
FIG. 1 is a flowchart of a preferred embodiment of the self-learning online update method for a multi-classification model according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a preferred embodiment of the electronic device according to an embodiment of this application;
FIG. 3 is a schematic diagram of the internal logic of the self-learning online update program for a multi-classification model according to an embodiment of this application.
The realization of the objective, the functional characteristics, and the advantages of this application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Best Mode of the Invention
In the following description, for purposes of illustration, many specific details are set forth to provide a thorough understanding of one or more embodiments. It is evident, however, that these embodiments can also be implemented without these specific details.
Specific embodiments of this application are described in detail below with reference to the accompanying drawings.
Embodiment 1
To illustrate the self-learning online update method for a multi-classification model provided by this application, FIG. 1 shows the flow of the method.
As shown in FIG. 1, the self-learning online update method for a multi-classification model provided by this application includes:
S110: Monitor and collect statistics on the prediction performance of the model to be updated according to a preset statistical period, and store the prediction performance statistics of each statistical period in a statistical database.
It should be noted that, to reflect the prediction performance of the model to be updated more accurately, this application uses the overall prediction accuracy of the model, referred to as the Precision value, as the statistic that characterizes its prediction performance. The Precision value is calculated as follows:
Precision value = number of correctly classified samples / total number of samples, where the number of correctly classified samples is the number of samples that the model to be updated classified correctly within the statistical period, and the total number of samples is the number of samples input to the model within that period.
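The per-period statistic defined above can be sketched as follows. This is a minimal illustration only; the function name `precision_by_period` and the record layout `(period, predicted_label, true_label)` are assumptions introduced here and do not come from the application.

```python
from collections import defaultdict


def precision_by_period(records):
    """Compute the Precision value (correct / total) for each statistical period.

    `records` is an iterable of (period, predicted_label, true_label) tuples;
    this layout is a hypothetical choice made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for period, predicted, actual in records:
        total[period] += 1
        if predicted == actual:
            correct[period] += 1
    # One Precision value per period, ready to be stored in the statistical database.
    return {period: correct[period] / total[period] for period in total}


stats = precision_by_period([
    ("week-1", "Office 1", "Office 1"),
    ("week-1", "Office 2", "Office 1"),
    ("week-2", "Office 3", "Office 3"),
])
```

Each value in `stats` corresponds to one row of prediction performance statistics for one statistical period.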
It should further be noted that the statistical period must be preset according to the volume of business data in the system. If the volume is large, statistics can be collected daily (one day is one statistical period); if the volume is small, statistics can be collected weekly or monthly (one week or one month is one statistical period). In practice, for the official document classification scenario, model prediction accuracy is usually tallied with a statistical period of one week.
Taking the official document classification scenario as an example, the data volume is generally considered large when more than 1,000 new records are added to the system. If more than 1,000 records are added per day, statistics are collected daily; if the weekly cumulative volume exceeds 1,000, statistics are collected weekly; and so on.
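The rule for choosing the statistical period can be sketched as below. The 1,000-record threshold comes from the text; the function name, the use of an average daily count, and the fallback to a monthly period when the weekly volume is still too small are assumptions made for illustration.

```python
def choose_statistical_period(daily_new_records, volume_threshold=1000):
    """Pick the shortest period whose accumulated volume exceeds the threshold.

    `daily_new_records` is the average number of new records per day
    (a hypothetical input for this sketch).
    """
    if daily_new_records > volume_threshold:
        return "day"
    if daily_new_records * 7 > volume_threshold:
        return "week"
    # Assumed reading of "and so on": fall back to a monthly period.
    return "month"


period = choose_statistical_period(200)
```

With about 200 documents per day, the weekly volume is 1,400 records, so the sketch selects a weekly period.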
It should be noted here that the official document classification scenario refers to staff assigning the official documents an organization receives each day to the appropriate offices or departments for handling, based on the content of each document and the functions of each office or department within the organization. Put simply, the documents are classified, and the classification labels are the office names. The number of official documents received per day may differ between organizations, but the volume is generally small, with roughly 200 documents to be assigned per day. Statistics are therefore collected weekly.
In addition, it should be emphasized that, to further ensure the privacy and security of the data in the statistical database, the statistical database is stored in a node of a blockchain.
S120: Check the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online.
It should be noted that the trigger mechanism is used to judge, from the data in the statistical database, whether the model to be updated needs updating. Whether a model update needs to be triggered can be determined from the historical prediction performance of the online model (that is, the model to be updated) recorded in the statistical database.
Specifically, the conditions for triggering an update based on the historical prediction performance of the online model may be as follows: the online model is updated if any one of the following conditions is met (in this application, the subsequent steps are then carried out; if no condition is met, the above steps continue to be repeated).
For example, the trigger mechanism includes:
Mechanism A: If the Precision values of the N most recent statistical periods, including the current one, decrease continuously, the model to be updated is determined to need an online update, where N is a first preset parameter.
Mechanism B: If the Precision value of the current statistical period is less than the mean prediction accuracy minus 2 times the standard deviation of the prediction accuracy, or the Precision value of the current statistical period is less than the mean prediction accuracy minus the decrease percentage P, the model to be updated is determined to need an online update; where
the mean prediction accuracy is the average of the Precision values of the historical N statistical periods, the standard deviation of the prediction accuracy is the standard deviation of the Precision values of the historical N statistical periods, and the decrease percentage P is a second preset parameter. N and P can be set according to the business scenario. In the official document classification scenario, N is set to 5.
It should be noted that the above trigger mechanisms were designed from actual conditions and experience in real scenarios. Updating the model according to these rules keeps the model at a certain level of accuracy, so that its accuracy does not drop noticeably over time.
It should further be noted that the business experience referred to here means the change in accuracy that each business scenario can tolerate. Some scenarios are demanding and do not allow accuracy to fall even from 90% to 89%, so the model must be updated once accuracy falls by more than P = 1%. Other scenarios are less demanding and may update the model only at P = 10%. In the official document scenario, the current value is generally compared with the values of the previous N = 5 statistical periods with P = 5%, and the model is updated if the drop exceeds 5%.
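Mechanisms A and B can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the mean and standard deviation are taken over the N periods preceding the current one, the population standard deviation is used, and P is read as an absolute drop in accuracy (0.05 means 5 percentage points). All function names are hypothetical.

```python
from statistics import mean, pstdev


def mechanism_a(history, n=5):
    """True if the last n Precision values (current period included) strictly decrease."""
    window = history[-n:]
    if len(window) < n:
        return False  # not enough periods recorded yet
    return all(later < earlier for earlier, later in zip(window, window[1:]))


def mechanism_b(history, n=5, p=0.05):
    """True if the current value falls below mean - 2*std or below mean - p.

    Assumption: mean and std are computed over the n periods before the current one.
    """
    if len(history) < n + 1:
        return False
    current = history[-1]
    previous = history[-(n + 1):-1]
    avg = mean(previous)
    std = pstdev(previous)  # population standard deviation (an assumption)
    return current < avg - 2 * std or current < avg - p


def needs_online_update(history, n=5, p=0.05):
    """The model is updated if either mechanism fires."""
    return mechanism_a(history, n) or mechanism_b(history, n, p)


steady_then_drop = [0.78, 0.78, 0.78, 0.78, 0.78, 0.70]
triggered = needs_online_update(steady_then_drop)
```

Here `history` is the list of per-period Precision values read from the statistical database, oldest first. In the example, a drop from a steady 0.78 to 0.70 fires Mechanism B.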
In addition, other mechanisms can be used to start the model update (that is, the subsequent steps of this application). For example, a time interval for model updates can be set based on business experience, and the online model is updated once the number of statistical periods recorded in the statistical database reaches the set number.
For example, the trigger mechanism further includes Mechanism C: determine whether the time the model to be updated has been online reaches a preset update period threshold, and if so, determine that the model to be updated needs an online update; where the update period threshold is M times the statistical period, and M is a natural number with M ≥ 2.
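Mechanism C is a purely time-based check, which might look like the sketch below. The function name, the representation of the statistical period as a number of days, and the example value M = 12 are assumptions made here for illustration.

```python
from datetime import date, timedelta


def mechanism_c(went_online, today, period_days, m):
    """True once the model has been online for at least m statistical periods."""
    if m < 2:
        raise ValueError("M must be a natural number >= 2")
    update_threshold = timedelta(days=period_days * m)
    return today - went_online >= update_threshold


# Weekly statistical period, hypothetical M = 12 (an 84-day update threshold).
due = mechanism_c(date(2020, 1, 1), date(2020, 3, 25), period_days=7, m=12)
```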
S130: If the model to be updated needs to be updated online, acquire newly generated online data, and update the training data of the model to be updated according to the newly generated data.
It should be noted that the most important part of a model update is updating the training data; incorporating the newly generated data into the model's training data is the preferred way to update the model.
Specifically, updating the training data includes the following steps:
Check whether the distribution of the newly generated labeled online data is consistent with that of the model's historical training data. If it is, merge the newly generated data with the historical training data to generate the training update data.
It should be noted that, because the prediction accuracy of a multi-classification model is highly sensitive to the data distribution, the distribution of the current online data must be checked for consistency with the distribution of the historical training data. If the shares of each class of samples are close, the historical training data can be merged directly with the new data. Otherwise, if the online data distribution has changed and differs considerably from the historical training data, for example if the share of some class has changed substantially, the class proportions of the historical training data must first be adjusted to match the online data through downsampling and oversampling, and the data are merged afterwards.
Specifically, if the trigger mechanism is Mechanism A or Mechanism B, updating the training data of the model to be updated according to the newly generated data includes:
Incremental update:
checking whether the distribution of the newly generated data is consistent with that of the historical training data of the model to be updated;
if it is consistent, merging the newly generated data with the historical training data to generate the training update data; where
if the differences between the class shares of the newly generated data and those of the historical training data are all less than a preset share threshold, the distributions of the newly generated data and the historical training data are determined to be consistent.
If the distributions of the newly generated data and the historical training data are not consistent, the classes of samples in the historical training data are processed iteratively by downsampling and oversampling until the differences between the class shares of the newly generated data and those of the historical training data are all less than the preset share threshold;
the newly generated data is then merged with the iteratively processed historical training data to generate the training update data.
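The incremental update above can be sketched as follows. This is a simplified illustration: instead of the iterative adjustment described in the text, the historical data is resampled once to the class shares of the new data, which satisfies the same stopping condition. The sample layout `(text, label)`, the 0.05 share threshold, and all function names are assumptions.

```python
import random
from collections import Counter, defaultdict


def class_shares(labels):
    """Share of each class label in a list of labels."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def distributions_match(new_labels, hist_labels, share_threshold=0.05):
    """True if every class share differs by less than the threshold."""
    new, hist = class_shares(new_labels), class_shares(hist_labels)
    return all(abs(new.get(c, 0.0) - hist.get(c, 0.0)) < share_threshold
               for c in set(new) | set(hist))


def rebalance_history(hist_samples, new_labels, seed=0):
    """Down/oversample historical (text, label) samples toward the new class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in hist_samples:
        by_class[sample[1]].append(sample)
    resampled = []
    for label, share in class_shares(new_labels).items():
        pool = by_class.get(label, [])
        if not pool:
            continue  # class never seen in history; nothing to resample
        wanted = round(share * len(hist_samples))
        if wanted <= len(pool):
            resampled.extend(rng.sample(pool, wanted))  # downsampling
        else:
            extra = rng.choices(pool, k=wanted - len(pool))  # oversampling
            resampled.extend(pool + extra)
    return resampled


def build_training_update(new_samples, hist_samples, share_threshold=0.05):
    """Merge new data with (possibly rebalanced) historical data."""
    new_labels = [label for _, label in new_samples]
    hist_labels = [label for _, label in hist_samples]
    if not distributions_match(new_labels, hist_labels, share_threshold):
        hist_samples = rebalance_history(hist_samples, new_labels)
    return list(new_samples) + list(hist_samples)


history = [("old doc", "A")] * 80 + [("old doc", "B")] * 20
recent = [("new doc", "A")] * 50 + [("new doc", "B")] * 50
merged = build_training_update(recent, history)
```

In this example the historical data is 80% class A while the new data is evenly split, so class A is downsampled and class B is oversampled before merging.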
It should be noted that the training data can be updated in two ways. One is incremental: the newly generated data is added directly to the historical training data, and all of the historical training data is retained. The other is a rolling fixed-length approach, in which the time span of the training data is fixed; for example, the updated training data is the data from the 2 years preceding the current statistical period. When new data is added, the earliest data of the corresponding length is removed from the training data. The newly added labeled data is the data accumulated from the start of the last update through the current statistical period. The second approach is generally chosen in scenarios where the data changes quickly.
Specifically, if the trigger mechanism is Mechanism C, updating the training data of the model to be updated according to the newly generated data includes:
Rolling update:
setting a fixed rolling duration, and selecting the corresponding data from the newly generated data and the historical training data according to the fixed rolling duration as the training update data; where
the fixed rolling duration is L times the statistical period, L is a natural number, and L > M. It should be noted that only when L > M does the generated training update data contain both historical training data and newly generated data, which ensures that the training update data fits the actual scenario.
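The rolling update can be sketched as a window over period-indexed samples. The function name and the `(period_index, item)` layout are assumptions introduced for illustration; the L > M check follows the constraint stated above.

```python
def rolling_training_data(samples, current_period, window_l, update_m):
    """Keep only samples from the most recent `window_l` statistical periods.

    `samples` is a list of (period_index, item) pairs covering both historical
    and newly generated data (a hypothetical layout for this sketch).
    """
    if window_l <= update_m:
        # With L <= M the window would contain only data generated since the last update.
        raise ValueError("L must be greater than M")
    oldest_kept = current_period - window_l + 1
    return [(period, item) for period, item in samples
            if oldest_kept <= period <= current_period]


all_samples = [(p, f"doc-{p}") for p in range(1, 11)]
window = rolling_training_data(all_samples, current_period=10, window_l=4, update_m=2)
```

With L = 4 and M = 2, the window spans periods 7 to 10: two periods of data from before the last update plus the two periods generated since.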
S140: Perform update training on the model to be updated using the training update data generated by the update, to obtain an updated multi-classification model.
In actual use, the prediction performance of the newly trained model and the old model is evaluated and compared in a new statistical period to decide whether to replace the old model with the new one. If the prediction accuracy of the new model is higher than that of the old model, the prediction model is replaced; otherwise it is not.
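The replacement decision can be sketched as a comparison of the two models on the labeled samples of the new statistical period. Representing each model as a plain callable from features to a label is an assumption made for this sketch, as are the function names.

```python
def accuracy(model, samples):
    """Fraction of (features, label) samples the model classifies correctly."""
    correct = sum(1 for features, label in samples if model(features) == label)
    return correct / len(samples)


def select_model(old_model, new_model, samples):
    """Return the new model only if it is strictly more accurate than the old one."""
    if accuracy(new_model, samples) > accuracy(old_model, samples):
        return new_model
    return old_model


def old_model(features):
    return "Office 1"  # toy model that always predicts the same office


def new_model(features):
    return "Office 2" if "health" in features else "Office 1"


evaluation = [("budget notice", "Office 1"), ("health reform notice", "Office 2")]
serving = select_model(old_model, new_model, evaluation)
```

On a tie the old model stays in service, which matches the rule that the model is replaced only when the new accuracy is higher.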
The process of the self-learning online update method for a multi-classification model provided by this application is described in detail below, taking official document data as an example. For instance, take the document title "Notice of the General Office of the State Council on Issuing the Key Tasks for Deepening the Reform of the Medical and Health System in the Second Half of 2018" and the classification label "Social Development Division" (an office name). Both the labeled data and the historical training data take the form shown above; the only difference is that the new labeled data consists of recent documents, while the historical training data consists of earlier documents.
The class shares refer to the proportion of samples belonging to each class, for example:
Office 1: 20% of all official documents
Office 2: 7% of all official documents
Office 3: 0.2% of all official documents
If a large number of documents about certain specific events arrive during some period, the original distribution of class shares changes, for example to the following:
Office 1: 9% of all official documents
Office 2: 10% of all official documents
Office 3: 3% of all official documents
The share of Office 1 fell from a dominant 20% to 9%, so the sample distribution changed substantially. A substantial change here can be defined as the rate of change in the share of any one office exceeding threshold = 50%; for Office 1, (20 - 9)/20 = 55% > 50%.
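The change-rate check in this worked example can be reproduced with a short sketch. The function name is hypothetical; the shares and the 50% threshold are the figures from the example. Note that, under this definition, Office 3 (0.2% to 3%) also exceeds the threshold, in addition to Office 1.

```python
def shifted_classes(old_shares, new_shares, threshold=0.5):
    """Return the classes whose share changed by more than `threshold`, relative to the old share."""
    shifted = []
    for name, old in old_shares.items():
        change_rate = abs(new_shares[name] - old) / old
        if change_rate > threshold:
            shifted.append(name)
    return shifted


# Shares in percent, taken from the example above.
before = {"Office 1": 20.0, "Office 2": 7.0, "Office 3": 0.2}
after = {"Office 1": 9.0, "Office 2": 10.0, "Office 3": 3.0}
changed = shifted_classes(before, after)
```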
The corresponding multi-classification model is then updated according to the model update process described above. For example, in the official document classification scenario, the weekly prediction accuracy stayed at about 78% after the model went online. When the epidemic broke out, the content of official documents changed; without an update, the model accuracy dropped to 70%. With the automatic model update mechanism proposed in this patent, the model accuracy can be maintained at 78%.
As the above description of the technical solution shows, the self-learning online update method for a multi-classification model provided by this application defines a trigger mechanism for model updates, an update mechanism for the training data, and a model update method, which together make it possible to update a multi-classification model online automatically and to keep its prediction accuracy at a high level. In addition, tracking model accuracy automatically reflects the prediction performance of the online multi-classification model, and the corresponding update trigger mechanism provides a criterion for judging performance degradation, so that the right moment to update the model can be identified and a decline in prediction performance prevented. Furthermore, checking the online data distribution and adjusting the class proportions of the training data gives the updated model the ability to adapt to changes in the distribution of live data, while the model update conditions keep the prediction accuracy within a stable range; the adaptive ability of the model is thus significantly improved while its prediction accuracy remains stable. Finally, the self-learning online update solution provided by this application avoids the laborious work of updating the model manually and can respond in real time to safeguard the performance of the prediction model.
It should be understood that the sequence numbers of the steps in the above embodiment do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
Embodiment 2
Corresponding to the above method, this application also provides a self-learning online update system for a multi-classification model, the system including:
a performance monitoring unit, configured to monitor and collect statistics on the prediction performance of the model to be updated according to a preset statistical period, and to store the prediction performance statistics of each statistical period in a statistical database;
a mechanism trigger unit, configured to check the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
a data update unit, configured to, if the model to be updated needs to be updated online, acquire newly generated online data and update the training data of the model to be updated according to the newly generated data;
a model update unit, configured to perform update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
Embodiment 3
This application also provides an electronic device 70. FIG. 2 is a schematic structural diagram of a preferred embodiment of the electronic device 70 provided by this application.
In this embodiment, the electronic device 70 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 70 includes a processor 71 and a memory 72.
The memory 72 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 70, such as a hard disk of the electronic device 70. In other embodiments, the readable storage medium may be an external storage device of the electronic device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 70.
In this embodiment, the readable storage medium of the memory 72 is generally used to store the self-learning online update program 73 for a multi-classification model installed on the electronic device 70. The memory 72 can also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 71 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 72, for example the self-learning online update program 73 for a multi-classification model.
In some embodiments, the electronic device 70 is a terminal device such as a smartphone, a tablet computer, or a portable computer. In other embodiments, the electronic device 70 may be a server.
FIG. 2 shows only the electronic device 70 with components 71 to 73, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
可选地,该电子装置70还可以包括用户接口,用户接口可以包括输入单元比如键盘(Keyboard)、语音输入装置比如麦克风(microphone)等具有语音识别功能的设备、语音输出装置比如音响、耳机等,可选地用户接口还可以包括标准的有线接口、无线接口。Optionally, the electronic device 70 may also include a user interface. The user interface may include an input unit such as a keyboard (Keyboard), a voice input device such as a microphone (microphone) and other devices with voice recognition functions, and a voice output device such as audio, earphones, etc. Optionally, the user interface may also include a standard wired interface and a wireless interface.
可选地,该电子装置70还可以包括显示器,显示器也可以称为显示屏或显示单元。在一些实施例中可以是LED显示器、液晶显示器、触控式液晶显示器以及有机发光二极管(Organic Light-Emitting Diode,OLED)触摸器等。显示器用于显示在电子装置70中处理的信息以及用于显示可视化的用户界面。Optionally, the electronic device 70 may further include a display, and the display may also be referred to as a display screen or a display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, and an organic light-emitting diode (Organic Light Emitting Diode). Light-Emitting Diode, OLED) touch device, etc. The display is used for displaying information processed in the electronic device 70 and for displaying a visualized user interface.
Optionally, the electronic device 70 may further include a touch sensor. The area that the touch sensor provides for the user's touch operations is called the touch area. The touch sensor may be a resistive touch sensor, a capacitive touch sensor, or the like. It may be a contact touch sensor or a proximity touch sensor. It may be a single sensor or multiple sensors arranged, for example, in an array.
In addition, the area of the display of the electronic device 70 may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen, and the device detects user-triggered touch operations through the touch display screen.
Optionally, the electronic device 70 may further include a radio frequency (RF) circuit, sensors, an audio circuit, and so on, which are not described in detail here.
In the device embodiment shown in FIG. 2, the memory 72, as a computer storage medium, may include an operating system and the multi-classification model self-learning online update program 73. When the processor 71 executes the multi-classification model self-learning online update program 73 stored in the memory 72, the following steps are implemented:
monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
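The four steps above can be read as one pass per statistical period. The following Python sketch illustrates that flow under stated assumptions: the statistical database is a plain list of per-period values, the performance measure is the Precision value defined in the claims (correctly classified samples divided by all samples), and `needs_update`, `fetch_new_data`, `update_training_data`, and `retrain` are hypothetical callables that are not named in the application.

```python
from typing import Callable, List, Sequence


def precision_value(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Correctly classified samples divided by all samples."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty label sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def run_statistical_period(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    stats_db: List[float],
    needs_update: Callable[[List[float]], bool],
    fetch_new_data: Callable[[], list],
    update_training_data: Callable[[list], list],
    retrain: Callable[[list], object],
) -> bool:
    """Run one monitoring pass; return True if the model was retrained.

    All callables are hypothetical stand-ins for the trigger check,
    data acquisition, training-data update, and update training.
    """
    # Step 1: record this period's prediction performance.
    stats_db.append(precision_value(y_true, y_pred))
    # Step 2: let the preset trigger mechanism inspect the statistics.
    if not needs_update(stats_db):
        return False
    # Step 3: acquire newly generated data and update the training data.
    training_data = update_training_data(fetch_new_data())
    # Step 4: perform update training on the model.
    retrain(training_data)
    return True
```

In this sketch the trigger decision is injected as a function so that any of the trigger mechanisms defined in the claims could be plugged in without changing the loop.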
In this embodiment, FIG. 3 is a schematic diagram of the internal logic of the multi-classification model self-learning online update program according to an embodiment of this application, that is, a program module diagram of a preferred embodiment of the program 73 in FIG. 2. As shown in FIG. 3, the multi-classification model self-learning online update program 73 may be divided into one or more modules, which are stored in the memory 72 and executed by the processor 71 to carry out this application. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function. The program 73 may be divided into a performance monitoring module 74, a mechanism trigger module 75, a data update module 76, and a model update module 77. The functions or operation steps implemented by modules 74-77 are similar to those described above and are not detailed again here. For example:
the performance monitoring module 74 is configured to monitor and collect statistics on the prediction performance of the model to be updated according to a preset statistical period, and to store the prediction performance statistics of each statistical period in a statistical database;
the mechanism trigger module 75 is configured to check the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
the data update module 76 is configured to, if the model to be updated needs to be updated online, acquire newly generated online data and update the training data of the model to be updated according to the newly generated data;
the model update module 77 is configured to perform update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
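The mechanism trigger module 75 depends on a "preset trigger mechanism"; the claims of this application define three of them (mechanisms A, B, and C). The sketch below shows one possible reading of each in Python. Several points are assumptions rather than statements from the application: "continuously declining" is taken as strictly decreasing, mechanism B computes the mean and standard deviation over the N periods before the current one, the standard deviation is the population form, P is expressed in the same unit as the Precision value, and the online duration is counted in whole statistical periods. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def mechanism_a(history: Sequence[float], n: int) -> bool:
    """True if the last n Precision values (current included) strictly decline."""
    if n < 2 or len(history) < n:
        return False
    window = history[-n:]
    return all(later < earlier for earlier, later in zip(window, window[1:]))


def mechanism_b(history: Sequence[float], n: int, p: float) -> bool:
    """True if the current value is below mean - 2*std or below mean - p.

    Assumption: mean and std come from the n periods preceding the current one.
    """
    if n < 1 or len(history) < n + 1:
        return False
    current = history[-1]
    window = history[-(n + 1):-1]
    avg = mean(window)
    std = pstdev(window)  # population standard deviation (an assumption)
    return current < avg - 2 * std or current < avg - p


def mechanism_c(periods_online: int, m: int) -> bool:
    """True once the model has been online for at least m statistical periods."""
    if m < 2:
        raise ValueError("the claims require M to be a natural number >= 2")
    return periods_online >= m
```

Under these assumptions, a deployment could evaluate all three functions after each statistical period and request an online update when any of them returns True.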
Embodiment 4
This application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores the multi-classification model self-learning online update program 73, and when the program 73 is executed by a processor, the following operations are implemented:
monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
The specific implementation of the computer-readable storage medium provided in this application is substantially the same as that of the multi-classification model self-learning online update method and the electronic device described above, and is not repeated here.
It should be noted that the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
It should further be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, device, article, or method that includes a series of elements therefore includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, device, article, or method. Without further limitation, an element introduced by the phrase "including a..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes it.
The serial numbers of the foregoing embodiments of this application are for description only and do not indicate the relative merits of the embodiments. From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims (20)

  1. A multi-classification model self-learning online update method, applied to an electronic device, wherein the method comprises:
    monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
    checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
    if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
    performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
  2. The multi-classification model self-learning online update method according to claim 1, wherein the statistical database is stored in a node of a blockchain, the prediction performance comprises a prediction precision (Precision) value, and the Precision value is calculated as:
    Precision value = number of correctly classified samples / total number of samples; and,
    the trigger mechanism comprises:
    Mechanism A: if the Precision values of the N historical statistical periods, including the current statistical period, decline continuously, it is determined that the model to be updated needs to be updated online, where N is a first preset parameter.
  3. The multi-classification model self-learning online update method according to claim 2, wherein the trigger mechanism further comprises:
    Mechanism B: if the Precision value of the current statistical period is less than (mean prediction precision - 2 × standard deviation of prediction precision), or the Precision value of the current statistical period is less than (mean prediction precision - decrease percentage P), it is determined that the model to be updated needs to be updated online; wherein,
    the mean prediction precision is the mean of the Precision values of the N historical statistical periods, the standard deviation of prediction precision is the standard deviation of the Precision values of the N historical statistical periods, and the decrease percentage P is a second prediction parameter.
  4. The multi-classification model self-learning online update method according to claim 3, wherein the trigger mechanism further comprises:
    Mechanism C: determining whether the online duration of the model to be updated has reached a preset update period threshold, and if so, determining that the model to be updated needs to be updated online; wherein,
    the update period threshold is M times the statistical period, where M is a natural number and M ≥ 2.
  5. The multi-classification model self-learning online update method according to claim 4, wherein, if the trigger mechanism is mechanism A or mechanism B, updating the training data of the model to be updated according to the newly generated data comprises:
    incremental update:
    checking whether the distribution of the newly generated data is consistent with that of the historical training data of the model to be updated;
    if consistent, merging the newly generated data with the historical training data to generate training update data; wherein,
    if, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than a preset proportion threshold, it is determined that the distributions of the newly generated data and the historical training data are consistent.
  6. The multi-classification model self-learning online update method according to claim 5, wherein,
    if the distributions of the newly generated data and the historical training data are inconsistent, the samples of each class in the historical training data are processed cyclically by down-sampling and over-sampling until, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than the preset proportion threshold;
    the newly generated data is merged with the cyclically processed historical training data to generate the training update data.
  7. The multi-classification model self-learning online update method according to claim 6, wherein, if the trigger mechanism is mechanism C, updating the training data of the model to be updated according to the newly generated data comprises:
    rolling update:
    setting a fixed rolling duration, and selecting the corresponding data from the newly generated data and the historical training data according to the fixed rolling duration as the training update data; wherein,
    the fixed rolling duration is L times the statistical period, where L is a natural number and L > M.
  8. A multi-classification model self-learning online update system, wherein the system comprises:
    a performance monitoring unit, configured to monitor and collect statistics on the prediction performance of the model to be updated according to a preset statistical period, and to store the prediction performance statistics of each statistical period in a statistical database;
    a mechanism trigger unit, configured to check the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
    a data update unit, configured to, if the model to be updated needs to be updated online, acquire newly generated online data and update the training data of the model to be updated according to the newly generated data;
    a model update unit, configured to perform update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
  9. An electronic device, wherein the electronic device comprises a memory, a processor, and a multi-classification model self-learning online update program stored in the memory and executable on the processor, and the multi-classification model self-learning online update program, when executed by the processor, implements a multi-classification model self-learning online update method;
    wherein the steps of the multi-classification model self-learning online update method comprise:
    monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
    checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
    if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
    performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
  10. The electronic device according to claim 9, wherein the statistical database is stored in a node of a blockchain, the prediction performance comprises a prediction precision (Precision) value, and the Precision value is calculated as:
    Precision value = number of correctly classified samples / total number of samples; and,
    the trigger mechanism comprises:
    Mechanism A: if the Precision values of the N historical statistical periods, including the current statistical period, decline continuously, it is determined that the model to be updated needs to be updated online, where N is a first preset parameter.
  11. The electronic device according to claim 10, wherein the trigger mechanism further comprises:
    Mechanism B: if the Precision value of the current statistical period is less than (mean prediction precision - 2 × standard deviation of prediction precision), or the Precision value of the current statistical period is less than (mean prediction precision - decrease percentage P), it is determined that the model to be updated needs to be updated online; wherein,
    the mean prediction precision is the mean of the Precision values of the N historical statistical periods, the standard deviation of prediction precision is the standard deviation of the Precision values of the N historical statistical periods, and the decrease percentage P is a second prediction parameter.
  12. The electronic device according to claim 11, wherein the trigger mechanism further comprises:
    Mechanism C: determining whether the online duration of the model to be updated has reached a preset update period threshold, and if so, determining that the model to be updated needs to be updated online; wherein,
    the update period threshold is M times the statistical period, where M is a natural number and M ≥ 2.
  13. The electronic device according to claim 12, wherein, if the trigger mechanism is mechanism A or mechanism B, updating the training data of the model to be updated according to the newly generated data comprises:
    incremental update:
    checking whether the distribution of the newly generated data is consistent with that of the historical training data of the model to be updated;
    if consistent, merging the newly generated data with the historical training data to generate training update data; wherein,
    if, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than a preset proportion threshold, it is determined that the distributions of the newly generated data and the historical training data are consistent.
  14. The electronic device according to claim 13, wherein,
    if the distributions of the newly generated data and the historical training data are inconsistent, the samples of each class in the historical training data are processed cyclically by down-sampling and over-sampling until, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than the preset proportion threshold;
    the newly generated data is merged with the cyclically processed historical training data to generate the training update data.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a multi-classification model self-learning online update program, and the multi-classification model self-learning online update program, when executed by a processor, implements a multi-classification model self-learning online update method;
    wherein the steps of the multi-classification model self-learning online update method comprise:
    monitoring and collecting statistics on the prediction performance of the model to be updated according to a preset statistical period, and storing the prediction performance statistics of each statistical period in a statistical database;
    checking the data in the statistical database using a preset trigger mechanism to determine whether the model to be updated needs to be updated online;
    if the model to be updated needs to be updated online, acquiring newly generated online data, and updating the training data of the model to be updated according to the newly generated data;
    performing update training on the model to be updated using the updated training data to obtain an updated multi-classification model.
  16. The computer-readable storage medium according to claim 15, wherein the statistical database is stored in a node of a blockchain, the prediction performance comprises a prediction precision (Precision) value, and the Precision value is calculated as:
    Precision value = number of correctly classified samples / total number of samples; and,
    the trigger mechanism comprises:
    Mechanism A: if the Precision values of the N historical statistical periods, including the current statistical period, decline continuously, it is determined that the model to be updated needs to be updated online, where N is a first preset parameter.
  17. The computer-readable storage medium according to claim 16, wherein the trigger mechanism further comprises:
    Mechanism B: if the Precision value of the current statistical period is less than (mean prediction precision - 2 × standard deviation of prediction precision), or the Precision value of the current statistical period is less than (mean prediction precision - decrease percentage P), it is determined that the model to be updated needs to be updated online; wherein,
    the mean prediction precision is the mean of the Precision values of the N historical statistical periods, the standard deviation of prediction precision is the standard deviation of the Precision values of the N historical statistical periods, and the decrease percentage P is a second prediction parameter.
  18. The computer-readable storage medium according to claim 17, wherein the trigger mechanism further comprises:
    Mechanism C: determining whether the online duration of the model to be updated has reached a preset update period threshold, and if so, determining that the model to be updated needs to be updated online; wherein,
    the update period threshold is M times the statistical period, where M is a natural number and M ≥ 2.
  19. The computer-readable storage medium according to claim 18, wherein, if the trigger mechanism is mechanism A or mechanism B, updating the training data of the model to be updated according to the newly generated data comprises:
    incremental update:
    checking whether the distribution of the newly generated data is consistent with that of the historical training data of the model to be updated;
    if consistent, merging the newly generated data with the historical training data to generate training update data; wherein,
    if, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than a preset proportion threshold, it is determined that the distributions of the newly generated data and the historical training data are consistent.
  20. The computer-readable storage medium according to claim 19, wherein,
    if the distributions of the newly generated data and the historical training data are inconsistent, the samples of each class in the historical training data are processed cyclically by down-sampling and over-sampling until, for every class of samples, the difference between the class's proportion in the newly generated data and its proportion in the historical training data is less than the preset proportion threshold;
    the newly generated data is merged with the cyclically processed historical training data to generate the training update data.
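The training-data update rules in the claims above (the class-proportion consistency check, cyclic down-sampling and over-sampling of the historical data, and the rolling update) can be illustrated with a small Python sketch. It is only one possible reading. The one-sample-at-a-time resampling loop, the `max_steps` guard, the representation of a sample as a `(features, label)` tuple, the tagging of samples with a statistical-period index for the rolling update, and all function names are assumptions not found in the application.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Sample = Tuple[object, Hashable]  # (features, class label); an assumed layout


def class_shares(samples: List[Sample]) -> Dict[Hashable, float]:
    """Proportion of each class in a non-empty sample list."""
    if not samples:
        raise ValueError("empty sample set")
    counts = Counter(label for _, label in samples)
    return {label: c / len(samples) for label, c in counts.items()}


def distributions_match(new: List[Sample], hist: List[Sample], threshold: float) -> bool:
    """True if every class's proportion difference is below the threshold."""
    a, b = class_shares(new), class_shares(hist)
    return all(abs(a.get(k, 0.0) - b.get(k, 0.0)) < threshold for k in set(a) | set(b))


def rebalance_history(new, hist, threshold, rng, max_steps=10_000) -> List[Sample]:
    """Resample the historical data until its class proportions match the new data."""
    hist = list(hist)  # work on a copy
    target = class_shares(new)
    for _ in range(max_steps):
        current = class_shares(hist)
        gaps = {k: current.get(k, 0.0) - target.get(k, 0.0) for k in set(current) | set(target)}
        label, gap = max(gaps.items(), key=lambda kv: abs(kv[1]))
        if abs(gap) < threshold:
            return hist
        idx = [i for i, (_, lab) in enumerate(hist) if lab == label]
        if not idx:
            raise ValueError(f"class {label!r} is absent from the historical data")
        pick = rng.choice(idx)
        if gap > 0:
            hist.pop(pick)            # down-sample an over-represented class
        else:
            hist.append(hist[pick])   # over-sample an under-represented class
    raise RuntimeError("resampling did not converge")


def incremental_update(new, hist, threshold, rng) -> List[Sample]:
    """Merge new data with historical data, rebalancing the latter if needed."""
    if not distributions_match(new, hist, threshold):
        hist = rebalance_history(new, hist, threshold, rng)
    return list(hist) + list(new)


def rolling_update(dated: List[Tuple[int, Sample]], current_period: int, l: int) -> List[Sample]:
    """Keep only samples from the most recent l statistical periods."""
    start = current_period - l + 1
    return [s for period, s in dated if start <= period <= current_period]
```

Passing a seeded `random.Random` instance as `rng` keeps the resampling reproducible, which may help when comparing models trained before and after an update.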
PCT/CN2020/124883 2020-09-04 2020-10-29 Self-learning online update method and system for multi-classification model, and apparatus WO2021159749A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010922752.9A CN112036579B (en) 2020-09-04 2020-09-04 Multi-classification model self-learning online updating method, system and device
CN202010922752.9 2020-09-04

Publications (1)

Publication Number Publication Date
WO2021159749A1 true WO2021159749A1 (en) 2021-08-19

Family

ID=73590624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124883 WO2021159749A1 (en) 2020-09-04 2020-10-29 Self-learning online update method and system for multi-classification model, and apparatus

Country Status (2)

Country Link
CN (1) CN112036579B (en)
WO (1) WO2021159749A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673166A (en) * 2021-08-26 2021-11-19 东华大学 Digital twin model working condition self-adaption method and system for machining quality prediction

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361487B (en) * 2021-07-09 2024-09-06 无锡时代天使医疗器械科技有限公司 Foreign matter detection method, device, equipment and computer readable storage medium
CN114444717A (en) * 2022-01-25 2022-05-06 杭州海康威视数字技术股份有限公司 Autonomous learning method, device, electronic equipment and machine-readable storage medium
CN118193001A (en) * 2022-12-13 2024-06-14 展讯通信(上海)有限公司 Model updating method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427698A (en) * 2017-08-29 2018-08-21 平安科技(深圳)有限公司 Updating device, method and the computer readable storage medium of prediction model
US20190012575A1 (en) * 2017-07-04 2019-01-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus and system for updating deep learning model
CN109242135A (en) * 2018-07-16 2019-01-18 阿里巴巴集团控股有限公司 Model operation method, device and service server
CN110288093A (en) * 2019-06-06 2019-09-27 博彦科技股份有限公司 Data processing method, device, storage medium and processor
CN111401563A (en) * 2018-12-28 2020-07-10 杭州海康威视数字技术股份有限公司 Machine learning model updating method and device
CN111428882A (en) * 2020-03-27 2020-07-17 联想(北京)有限公司 Processing method and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673166A (en) * 2021-08-26 2021-11-19 东华大学 Digital twin model working condition self-adaption method and system for machining quality prediction
CN113673166B (en) * 2021-08-26 2023-10-31 东华大学 Digital twin model working condition self-adaption method and system for processing quality prediction

Also Published As

Publication number Publication date
CN112036579B (en) 2024-05-03
CN112036579A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
WO2021159749A1 (en) Self-learning online update method and system for multi-classification model, and apparatus
US10205792B2 (en) Method and apparatus for processing page operation data
US10515315B2 (en) System and method for predicting and managing the risks in a supply chain network
CN110297912A (en) Cheat recognition methods, device, equipment and computer readable storage medium
CN112508118B (en) Target object behavior prediction method aiming at data offset and related equipment thereof
WO2019060010A1 (en) Content pattern based automatic document classification
WO2019062189A1 (en) Electronic device, method and system for conducting data table filing processing, and storage medium
CN109032825A (en) Fault filling method, device and equipment
CN107147724A (en) Information push method, server and computer-readable recording medium
CN109522923A (en) Customer address polymerization, device and computer readable storage medium
CN111258798A (en) Fault positioning method and device for monitoring data, computer equipment and storage medium
CN110162344A (en) Isolated current limiting method, apparatus, computer equipment and readable storage medium
CN113434575A (en) Data attribution processing method and device based on data warehouse and storage medium
CN110807050B (en) Performance analysis method, device, computer equipment and storage medium
US9378230B1 (en) Ensuring availability of data in a set being uncorrelated over time
WO2021174779A1 (en) Data pre-processing system and method, computer device, and readable storage medium
CN115660073B (en) Intrusion detection method and system based on harmony whale optimization algorithm
WO2021120819A1 (en) Pop-up window processing method, apparatus, computer device, and storage medium
CN110443441B (en) Rule efficiency monitoring method, device, computer equipment and storage medium
CN116843395A (en) Alarm classification method, device, equipment and storage medium of service system
CN116701896A (en) Image tag determining method, image tag determining device, computer device, and storage medium
CN115022171B (en) Method and device for optimizing update interface, electronic equipment and readable storage medium
CN108989088A (en) Log uploading method and communication equipment
CN113703993A (en) Service message processing method, device and equipment
CN114238777A (en) Negative feedback flow distribution method, device, equipment and medium based on behavior analysis

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20919305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20919305

Country of ref document: EP

Kind code of ref document: A1