CN104951802A - Classifier updating method - Google Patents
- Publication number
- CN104951802A (application number CN201510336424.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a classifier updating method. The method comprises: first, collecting misclassified training samples and incremental misclassified samples; second, using the basic misclassified sample set to collect all abnormal samples in the incremental misclassified sample set; and finally, updating the classifier using the abnormal sample set and incremental machine learning. Because the incremental misclassified sample set is screened against the basic misclassified sample set, the method avoids the loss of generalization performance that a harmful image classifier suffers when it is updated with misclassified samples that merely repeat the reasonable misclassifications already tolerated on its training set.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a classifier updating method.
Background
Multimedia such as pictures, video and audio has gradually become one of the main channels through which harmful (e.g. pornographic or violent) information spreads on the Internet. Among such harmful information, pictures may do the most actual harm to teenagers, because they are relatively easy to transfer and browse and place low demands on hardware. The negative influence, crime and other social problems caused by harmful pictures on the network are attracting more and more attention. How to automatically recognize harmful pictures on the network in time, so that effective supervisory measures can be taken, has become a very urgent problem.
Harmful image recognition on the network generally first extracts different types of image features that can express harmful semantics, and then constructs a harmful image classifier from these features. In addition, when the resulting harmful image classifier is used to recognize real network images, people continually collect the samples it misclassifies, and then use these incremental misclassified samples and incremental machine learning to update the harmful image classifier.
However, the current practice of updating the harmful image classifier with all incremental misclassified samples may reduce its generalization performance. The main reason is that, when a harmful image classifier is trained, the final classifier is generally allowed to keep a certain error rate on the training set so that it generalizes well. In other words, the training set contains some reasonable misclassified samples. If an incremental misclassified sample is identical or very close to a misclassified sample on the training set, that incremental sample should not be used to update the harmful image classifier. It is therefore necessary to select among the incremental misclassified samples so that the classifier can be updated more reasonably.
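The "identical or very close" criterion above can be made concrete with a nearest-neighbor distance. The sketch below only illustrates that argument; it is not the screening step of the invention, which relies on outlier detection. It assumes samples are numeric feature vectors, Euclidean distance, and a hypothetical tolerance `epsilon` chosen by the user; the function names are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def nearest_distance(sample: Vector, basic_set: Sequence[Vector]) -> float:
    """Euclidean distance from `sample` to the closest tolerated training error."""
    return min(math.dist(sample, b) for b in basic_set)


def resembles_tolerated_error(sample: Vector, basic_set: Sequence[Vector], epsilon: float) -> bool:
    """True if `sample` is identical to, or within `epsilon` of, a basic misclassified
    sample, i.e. the kind of incremental sample that should not drive an update."""
    return nearest_distance(sample, basic_set) <= epsilon


basic = [(0.0, 0.0), (1.0, 1.0)]
print(resembles_tolerated_error((0.05, 0.0), basic, epsilon=0.1))  # True
print(resembles_tolerated_error((5.0, 5.0), basic, epsilon=0.1))   # False
```

A fixed distance threshold is the crudest form of this idea; it ignores how densely the tolerated errors are packed, which is one reason to prefer a density-aware outlier test.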
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide a classifier updating method that avoids degrading the classifier's generalization performance by updating it with misclassified samples that merely repeat the reasonable misclassifications on its training set.
(2) Technical solution
The invention provides a classifier updating method, characterized by comprising:
S1: collecting misclassified training samples and incremental misclassified samples to form a basic misclassified sample set and an incremental misclassified sample set, respectively, wherein the basic misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;
S2: using the basic misclassified sample set, collecting all abnormal samples in the incremental misclassified sample set to form an abnormal sample set;
S3: updating the classifier using the abnormal sample set and incremental machine learning.
Further, the method also comprises S4: merging the abnormal sample set with the basic misclassified training sample set to form a new basic misclassified training sample set.
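Steps S1 to S4 can be sketched as a single update round. This is a minimal sketch, assuming samples are `(features, label)` pairs and that the outlier test and the incremental learner are supplied as callables; the names `update_round`, `is_abnormal` and `incremental_update` are hypothetical and do not come from the source.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Sample = Tuple[Sequence[float], int]  # (feature vector, true label)
M = TypeVar("M")  # whatever object represents the trained classifier


def update_round(
    model: M,
    basic_set: List[Sample],
    incremental_set: List[Sample],
    is_abnormal: Callable[[Sample, List[Sample]], bool],
    incremental_update: Callable[[M, List[Sample]], M],
) -> Tuple[M, List[Sample], List[Sample]]:
    """One update round; S1 (collecting both sets) is assumed done by the caller."""
    # S2: keep only incremental samples that are abnormal relative to the basic set.
    abnormal_set = [s for s in incremental_set if is_abnormal(s, basic_set)]
    # S3: incremental learning on the abnormal samples only
    # (skipping the call when nothing is abnormal is our own choice).
    if abnormal_set:
        model = incremental_update(model, abnormal_set)
    # S4: the abnormal samples join the basic set used by the next round.
    new_basic_set = basic_set + abnormal_set
    return model, new_basic_set, abnormal_set
```

Keeping the detector and learner as parameters mirrors the text, which fixes the workflow but leaves the concrete outlier detection and incremental learning algorithms open.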
Further, step S2 comprises: putting each incremental misclassified sample in the incremental misclassified sample set, in turn, into the basic misclassified sample set; detecting whether the sample put in is an abnormal sample; and collecting all abnormal samples in the incremental misclassified sample set to form the abnormal sample set.
Further, the step of detecting whether an incremental misclassified sample is an abnormal sample comprises: temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the basic misclassified sample set to form a new sample set, and running an outlier detection algorithm on the new sample set to judge whether that incremental misclassified sample is an abnormal sample within the new sample set.
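One way to realize the temporary-merge test is a k-nearest-neighbor distance score computed over the merged set. The text does not fix a detector at this point, so this is a swapped-in, assumed technique: the sample is flagged as abnormal if its mean distance to its `k` nearest neighbors ranks within the top `contamination` fraction of the merged set. `k`, `contamination` and the function names are illustrative choices, and the basic set is assumed to be non-empty.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_score(index: int, points: List[Vector], k: int) -> float:
    """Mean distance from points[index] to its k nearest other points."""
    dists = sorted(math.dist(points[index], p) for j, p in enumerate(points) if j != index)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def is_abnormal(sample: Vector, basic_set: List[Vector], k: int = 3, contamination: float = 0.05) -> bool:
    # Temporarily merge the sample into the basic set (basic_set itself is untouched).
    merged = list(basic_set) + [sample]
    scores = [knn_score(i, merged, k) for i in range(len(merged))]
    own = scores[-1]
    # Rank of the sample: how many merged points score at least as high (itself included).
    rank = sum(1 for s in scores if s >= own)
    # Abnormal if it sits in the top `contamination` fraction (at least the single top point).
    return rank <= max(1, int(contamination * len(merged)))
```

Because scores are recomputed on the merged set, each test is quadratic in the size of the basic set; that is acceptable for a sketch but a real deployment would index the basic set.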
Further, the incremental machine learning algorithm may be a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.
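As a stand-in for an incremental SVM learner, the sketch below keeps a linear SVM and refines it with Pegasos-style stochastic sub-gradient steps on the regularized hinge loss, so new abnormal samples can be folded in without retraining from scratch. This is an assumption for illustration, not necessarily the incremental SVM algorithm the patent intends; the class name, the parameter `lam`, and folding the bias into the weight vector are hypothetical choices. Labels are +1 / -1.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {+1, -1})


class IncrementalLinearSVM:
    """Linear SVM refined by Pegasos-style stochastic sub-gradient steps.

    The bias is folded into the weights via a constant feature of 1.0, so it is
    regularized along with them (a common simplification)."""

    def __init__(self, n_features: int, lam: float = 0.01) -> None:
        self.w: List[float] = [0.0] * (n_features + 1)
        self.lam = lam
        self.t = 0  # total steps taken; the step size keeps decaying across calls

    @staticmethod
    def _augment(x: Sequence[float]) -> List[float]:
        return list(x) + [1.0]

    def decision(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, self._augment(x)))

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, samples: List[Sample], epochs: int = 1) -> "IncrementalLinearSVM":
        for _ in range(epochs):
            for x, y in samples:
                self.t += 1
                eta = 1.0 / (self.lam * self.t)
                margin = y * self.decision(x)  # uses the weights before this step
                # Regularization: shrink the weights towards zero.
                self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
                # Hinge-loss sub-gradient: only samples inside the margin move w.
                if margin < 1.0:
                    xa = self._augment(x)
                    self.w = [wi + eta * y * xi for wi, xi in zip(self.w, xa)]
        return self
```

In use, the model is first fitted on the original training set with `partial_fit(..., epochs=...)` and later updated by calling `partial_fit(abnormal_set)` once per update round, so only the screened samples change the weights.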
Further, the classifier may be a harmful image classifier.
(3) Beneficial effects
In the classifier updating method provided by the invention, the incremental misclassified sample set is screened using the basic misclassified sample set, which avoids degrading the generalization performance of the harmful image classifier by updating it with misclassified samples that merely repeat the reasonable misclassifications on its training set.
Brief description of the drawings
Fig. 1 is a flowchart of the harmful image classifier updating method provided by the invention.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to a specific embodiment and the accompanying drawing.
The classifier updating method provided by the invention first collects misclassified training samples and incremental misclassified samples, then uses the basic misclassified sample set to collect all abnormal samples in the incremental misclassified sample set, and finally updates the classifier using the abnormal sample set and incremental machine learning. Because the incremental misclassified sample set is screened using the basic misclassified sample set, the invention avoids degrading the generalization performance of the harmful image classifier with misclassified samples that merely repeat the reasonable misclassifications on its training set.
Fig. 1 is a flowchart of the harmful image classifier updating method provided by the invention; its steps include:
S1: the computer collects misclassified training samples and incremental misclassified samples, forming a basic misclassified sample set and an incremental misclassified sample set, respectively, wherein the basic misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier. For example, suppose the basic misclassified sample set contains 100 misclassified samples: when the harmful image classifier was first trained, the trained classifier misclassified 100 images of its training set, and these 100 images are exactly the 100 misclassified samples contained in the basic misclassified sample set.
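Both sets in S1 come from the same operation: running a classifier over labeled samples and keeping the ones it gets wrong. A minimal sketch, assuming a `classify` callable that returns a label and, for the incremental set, deployment samples whose true labels were confirmed afterwards (for example by manual review, which the source does not specify); the function name is hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, true label)


def collect_misclassified(
    classify: Callable[[Sequence[float]], int], samples: Iterable[Sample]
) -> List[Sample]:
    """Return the samples whose predicted label differs from their true label."""
    return [(x, y) for x, y in samples if classify(x) != y]
```

Applied to the training set, this yields the basic misclassified sample set (the 100 images in the example above); applied to reviewed deployment samples, it yields the incremental misclassified sample set.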
S2: when the initially trained harmful image classifier is used in a real harmful image detection application, it misclassifies some samples. To use the resulting incremental misclassified sample set to improve the classifier's performance, each incremental misclassified sample is put, in turn, into the basic misclassified sample set, and an online outlier detection algorithm judges whether that sample is abnormal within the new sample set. Conventional online outlier detection algorithms that can be used include SmartSifter and algorithms based on singular value decomposition (SVD) updating. All incremental misclassified samples found to be abnormal are picked out to form the abnormal sample set.
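The online outlier detectors named above are more elaborate than can be shown briefly. The sketch below borrows only the log-loss scoring idea used by SmartSifter: a new datum is scored by its negative log-likelihood under a model of the existing data. It replaces SmartSifter's discounting mixture model with a single diagonal Gaussian fitted to the basic misclassified set, so it is an assumed simplification, not SmartSifter itself. The decision rule (flag the sample if its log-loss exceeds the `threshold_quantile` quantile of the basic samples' own log-losses) and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_logloss_is_abnormal(
    sample: Vector,
    basic_set: List[Vector],
    threshold_quantile: float = 0.95,
    var_floor: float = 1e-6,
) -> bool:
    n, d = len(basic_set), len(sample)
    # Fit an independent Gaussian per feature to the basic misclassified set.
    means = [sum(p[j] for p in basic_set) / n for j in range(d)]
    variances = [
        max(var_floor, sum((p[j] - means[j]) ** 2 for p in basic_set) / n)
        for j in range(d)
    ]

    def log_loss(x: Vector) -> float:
        # Negative log-density of x under the fitted diagonal Gaussian.
        return sum(
            0.5 * math.log(2 * math.pi * v) + (x[j] - m) ** 2 / (2 * v)
            for j, (m, v) in enumerate(zip(means, variances))
        )

    # Reference scores: how "surprising" the basic samples are to their own model.
    reference = sorted(log_loss(p) for p in basic_set)
    cutoff = reference[min(n - 1, int(threshold_quantile * n))]
    return log_loss(sample) > cutoff
```

The variance floor keeps the log-density finite when a feature is constant across the basic set.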
S3: update the harmful image classifier using the abnormal sample set and an incremental learning algorithm. The incremental learning algorithm is chosen according to the learning algorithm originally used to train the harmful image classifier: if that algorithm was a support vector machine, a support vector machine incremental learning algorithm is selected; if it was a random forest, an incremental random forest learning algorithm is selected.
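The rule that the incremental learner must match the original training algorithm can be expressed as a small lookup. The updater functions below are hypothetical placeholders for real incremental SVM and incremental random forest implementations, and the string keys are assumptions.

```python
from typing import Any, Callable, Dict, List

Updater = Callable[[Any, List[Any]], Any]


def update_svm_incrementally(model: Any, abnormal_set: List[Any]) -> Any:
    """Hypothetical hook: plug in an incremental SVM learner here."""
    raise NotImplementedError


def update_forest_incrementally(model: Any, abnormal_set: List[Any]) -> Any:
    """Hypothetical hook: plug in an incremental random forest learner here."""
    raise NotImplementedError


# Keyed by the algorithm used for the *original* training run.
_INCREMENTAL_FOR_ORIGINAL: Dict[str, Updater] = {
    "svm": update_svm_incrementally,
    "random_forest": update_forest_incrementally,
}


def choose_incremental_learner(original_algorithm: str) -> Updater:
    try:
        return _INCREMENTAL_FOR_ORIGINAL[original_algorithm]
    except KeyError:
        raise ValueError(
            f"no incremental counterpart registered for {original_algorithm!r}"
        ) from None
```

Failing loudly on an unknown algorithm avoids silently updating a model with a learner that cannot continue from its existing state.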
S4: merge the abnormal sample set with the basic misclassified training sample set to form a new basic misclassified training sample set for the next update of the harmful image classifier.
The invention was executed on a Pentium 4 computer with a 3.0 GHz CPU and 2 GB of memory, on which the harmful image classifier updating method was programmed in C++ to implement the new method of the invention; other execution environments may also be used and are not described further here.
The specific embodiment above further explains the objectives, technical solution and beneficial effects of the invention. It should be understood that it is only a specific embodiment of the invention and does not limit the invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (6)
1. A classifier updating method, characterized by comprising:
S1: collecting misclassified training samples and incremental misclassified samples to form a basic misclassified sample set and an incremental misclassified sample set, respectively, wherein the basic misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;
S2: using the basic misclassified sample set, collecting all abnormal samples in the incremental misclassified sample set to form an abnormal sample set;
S3: updating the classifier using the abnormal sample set and incremental machine learning.
2. The method according to claim 1, characterized in that the method further comprises:
S4: merging the abnormal sample set with the basic misclassified training sample set to form a new basic misclassified training sample set.
3. The method according to claim 2, characterized in that step S2 comprises:
Putting each incremental misclassified sample in the incremental misclassified sample set, in turn, into the basic misclassified sample set, detecting whether the sample put in is an abnormal sample, and collecting all abnormal samples in the incremental misclassified sample set to form the abnormal sample set.
4. The method according to claim 3, characterized in that the step of detecting whether an incremental misclassified sample is an abnormal sample comprises:
Temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the basic misclassified sample set to form a new sample set, and running an outlier detection algorithm on the new sample set to judge whether that incremental misclassified sample is an abnormal sample within the new sample set.
5. The method according to any one of claims 1-4, characterized in that the incremental machine learning algorithm is a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.
6. The method according to any one of claims 1-4, characterized in that the classifier is a harmful image classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510336424.XA CN104951802A (en) | 2015-06-17 | 2015-06-17 | Classifier updating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510336424.XA CN104951802A (en) | 2015-06-17 | 2015-06-17 | Classifier updating method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104951802A true CN104951802A (en) | 2015-09-30 |
Family
ID=54166442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510336424.XA Pending CN104951802A (en) | 2015-06-17 | 2015-06-17 | Classifier updating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104951802A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101055621A (en) * | 2006-04-10 | 2007-10-17 | 中国科学院自动化研究所 | Content based sensitive web page identification method |
CN101315670A (en) * | 2007-06-01 | 2008-12-03 | 清华大学 | Specific subject detection device, learning device and method thereof |
CN103593672A (en) * | 2013-05-27 | 2014-02-19 | 深圳市智美达科技有限公司 | Adaboost classifier on-line learning method and Adaboost classifier on-line learning system |
CN104391860A (en) * | 2014-10-22 | 2015-03-04 | 安一恒通(北京)科技有限公司 | Content type detection method and device |
Non-Patent Citations (3)
Title |
---|
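Weiming Hu et al.: "Recognition of Adult Images, Videos, and Web Page Bags", 《ACM Transactions on Multimedia Computing, Communications and Applications》 * |
丁昕苗 et al.: "Horror video recognition based on multi-view fusion sparse representation" (基于多视角融合稀疏表示的恐怖视频识别), 《电子学报》 (Acta Electronica Sinica) * |
李文昊 et al.: "An improved AdaBoost face detection algorithm" (一种改进的AdaBoost人脸检测算法), 《电视技术》 (Television Technology) * |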
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019179189A1 (en) * | 2018-03-23 | 2019-09-26 | 北京达佳互联信息技术有限公司 | Image classification model optimization method and device and terminal |
US11544496B2 (en) | 2018-03-23 | 2023-01-03 | Beijing Dajia Internet Information Technology Co., Ltd. | Method for optimizing image classification model, and terminal and storage medium thereof |
CN109753580A (en) * | 2018-12-21 | 2019-05-14 | Oppo广东移动通信有限公司 | Image classification method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109639481A (en) | Network traffic classification method, system and electronic equipment based on deep learning | |
CN113632099A (en) | Distributed product defect analysis system, method and computer readable storage medium | |
CN112182098B (en) | Information push method and information push server based on cloud computing and big data | |
CN113382279B (en) | Live broadcast recommendation method, device, equipment, storage medium and computer program product | |
CN112668586B (en) | Model training method, picture processing device, storage medium, and program product | |
CN107729908B (en) | Method, device and system for establishing machine learning classification model | |
US10162879B2 (en) | Label filters for large scale multi-label classification | |
CN110780965B (en) | Vision-based process automation method, equipment and readable storage medium | |
US20240185096A1 (en) | A Method, Device and Storage Medium for Knowledge Recommendation | |
CN105630662B (en) | Memory detection method and device | |
WO2019242442A1 (en) | Multi-model feature-based malware identification method, system and related apparatus | |
US20130322682A1 (en) | Profiling Activity Through Video Surveillance | |
CN105574030A (en) | Information search method and device | |
CN112187890B (en) | Information distribution method based on cloud computing and big data and block chain financial cloud center | |
CN112861894A (en) | Data stream classification method, device and system | |
CN111783812A (en) | Method and device for identifying forbidden images and computer readable storage medium | |
CN110019827B (en) | Corpus generation method, apparatus, device and computer storage medium | |
CN110532404A (en) | Same-source multimedia determination method, apparatus, equipment and storage medium | |
CN104951802A (en) | Classifier updating method | |
CN110457555A (en) | Docker-based data collection method and device, computer equipment and storage medium | |
CN111739649B (en) | User portrait capturing method, device and system | |
CN109389972B (en) | Quality testing method and device for semantic cloud function, storage medium and equipment | |
CN112364185A (en) | Method and device for determining characteristics of multimedia resource, electronic equipment and storage medium | |
CN106293650A (en) | Folder attribute setting method and device | |
CN104572996A (en) | Processing method and device for video webpage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150930 |