CN110083728A - Method, device and system for optimizing automated picture data cleaning quality - Google Patents

Method, device and system for optimizing automated picture data cleaning quality

Info

Publication number
CN110083728A
Authority
CN
China
Prior art keywords
picture
threshold value
cleaned
confidence
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910267802.1A
Other languages
Chinese (zh)
Other versions
CN110083728B (en)
Inventor
吴英平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai re SR Information Technology Co.,Ltd.
Original Assignee
Shanghai Lianyin Electronic Technology Partnership (limited Partnership)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lianyin Electronic Technology Partnership (limited Partnership)
Priority to CN201910267802.1A
Publication of CN110083728A
Application granted
Publication of CN110083728B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, device and system for optimizing the quality of automated picture data cleaning, comprising: inputting a picture set to be cleaned into a coarse-grained binary classifier and then into a fine-grained binary classifier to obtain the confidence of the class prediction for each picture to be cleaned; screening out the pictures that need manual cleaning based on a set confidence threshold and a first picture quantity threshold corresponding to that confidence threshold; obtaining the model accuracy of the fine-grained binary classifier from the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning; and optimizing the fine-grained binary classifier, with its model accuracy and a model optimization count threshold as the optimization conditions. Building on conventional data cleaning methods, the invention can achieve very high picture cleaning quality with a small number of model iterations of the fine-grained binary classifier, and in some cases can even replace manual cleaning entirely once the model iterations are complete.

Description

Method, device and system for optimizing automated picture data cleaning quality
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a method, device and system for optimizing the quality of automated picture data cleaning.
Background art
With the breakthroughs of deep learning in image recognition, neural networks have become the mainstream algorithm applied in that field. Neural networks of this kind are trained by supervised learning: developers use a known data set and let the network learn from labeled inputs and outputs, continually optimizing its own model parameters so that it becomes progressively "smarter". Good recognition accuracy therefore requires training on massive amounts of accurately labeled picture data. In theory, the more data available for learning, the higher the accuracy of the model. This ideal, however, holds only when all the data used for learning are correct; if wrong data are mixed in, the accuracy obtained by learning is noticeably degraded. The cleaning of massive picture data has thus become a bottleneck restricting the development of neural network technology. At present, the picture data cleaning approach mainly used in industry is still the traditional one based on manual cleaning.
The Chinese invention patent application No. 2018107215159, entitled "A method and device for cleaning data", discloses the following: during data cleaning, the data that are determined with high probability to be correct and the data determined with high probability to be wrong are first selected from the data to be cleaned; the intermediate data that are harder to confirm are then screened again; and positive and negative samples are picked out. Although this can improve the accuracy of the data set, its effect in practical engineering applications is not good, and its cleaning quality is essentially fixed, so better data cleaning quality cannot be obtained.
Summary of the invention
In view of the above problems, the invention proposes a method, device and system for optimizing the quality of automated picture data cleaning. Building on conventional data cleaning methods, it can achieve very high picture cleaning quality with a small number of model iterations of a fine-grained binary classifier, and in some cases can even replace manual cleaning entirely once the model iterations are complete.
To achieve the above technical purpose and technical effect, the invention is realized by the following technical solutions:
In a first aspect, the invention provides a method for optimizing the quality of automated picture data cleaning, comprising the following steps:
Obtaining a picture set to be cleaned and inputting it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
Inputting the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction of each picture to be cleaned;
Screening out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
Obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning;
Optimizing the fine-grained binary classifier, with its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures;
Repeating the above process until a fine-grained binary classifier that meets the requirements is obtained and the cleaning of all pictures is complete.
Preferably, the preset fine-grained binary classifier is trained as follows:
A test picture set is provided and clustered with a clustering algorithm to obtain a positive-sample test picture set and a negative-sample test picture set;
The fine-grained binary classifier is trained from the positive-sample test picture set and the negative-sample test picture set.
Preferably, screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold specifically comprises the following sub-steps:
Comparing the set confidence threshold with the obtained class-prediction confidence of each picture to be cleaned;
When the number of pictures whose predicted confidence is below the set confidence threshold is greater than the preset first picture quantity threshold, identifying those pictures as pictures that need manual cleaning.
Preferably, during the model optimization of the fine-grained binary classifier, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, the method further comprises:
Picking out, according to a set rule, pictures that need manual cleaning from each predicted-confidence distribution interval.
Preferably, obtaining the model accuracy of the fine-grained binary classifier based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning specifically comprises the following sub-steps:
For the pictures that need manual cleaning screened out on the basis of the set confidence threshold and the corresponding first picture quantity threshold: for each picture to be cleaned, if the confidence of its class prediction conflicts with the feedback result of the manual cleaning, the classification is judged wrong; otherwise it is judged correct;
For the pictures that need manual cleaning picked out from each predicted-confidence distribution interval: for each picture to be cleaned, if the confidence of its class prediction conflicts with the feedback result of the manual cleaning, the classification is judged wrong; otherwise it is judged correct;
Calculating the model accuracy of the fine-grained binary classifier from the above classification judgments.
Preferably, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, the method further comprises:
If, among the pictures to be cleaned, the number of pictures whose predicted confidence is above the set confidence threshold is less than a preset second picture quantity threshold, lowering the set confidence threshold, and then providing the picture data above the lowered confidence threshold again for manual cleaning.
Preferably, the model optimization of the fine-grained binary classifier specifically comprises the following sub-steps:
Obtaining error-prone samples, which include error-prone positive samples and error-prone negative samples;
When the model accuracy of the fine-grained binary classifier is below a set accuracy threshold and the number of model optimizations is below the model optimization count threshold, using the obtained error-prone positive samples and error-prone negative samples together with the other positive and negative samples as the training set to retrain the fine-grained binary classifier, thereby optimizing it; at the same time, the model optimization count of the fine-grained binary classifier is incremented by one.
Preferably, denote the class-prediction confidence of a manually cleaned picture by $\mathrm{Confidence}_{predict}$ and the feedback result of the manual cleaning by $\mathrm{Confidence}_{groundtruth}$. An error-prone sample is one that satisfies:
$$|\mathrm{Confidence}_{groundtruth} - \mathrm{Confidence}_{predict}| > \mathrm{threshold}$$
where $\mathrm{Confidence}_{predict}$ takes values in (0, 1), $\mathrm{Confidence}_{groundtruth}$ takes the value 0 or 1, and threshold is a preset threshold.
In a second aspect, the invention provides a device for optimizing the quality of automated picture data cleaning, comprising:
A first screening module, configured to obtain a picture set to be cleaned and input it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
A first computing module, configured to input the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction of each picture to be cleaned;
A second screening module, configured to screen out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
A second computing module, configured to obtain the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning;
An optimization module, configured to optimize the fine-grained binary classifier, with its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures.
In a third aspect, the invention provides a system for optimizing the quality of automated picture data cleaning, comprising:
A processor, adapted to execute instructions; and
A storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the steps described in the first aspect.
Compared with the prior art, the beneficial effects of the invention are as follows:
Building on conventional data cleaning methods, the method, device and system of the invention can achieve very high picture cleaning quality with a small number of model iterations of the fine-grained binary classifier, and in some cases can even replace manual cleaning entirely once the model iterations are complete.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for optimizing automated picture data cleaning quality according to an embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit its scope of protection.
The application principle of the invention is explained in detail below with reference to the accompanying drawing.
For a neural network to learn from labeled inputs and outputs, continually optimizing its own model parameters and becoming progressively "smarter", massive amounts of accurately labeled picture data must be provided for training before good recognition accuracy can be obtained. In theory, the more data available for learning, the higher the accuracy of the model. This ideal, however, holds only when all the data used for learning are correct; if wrong data are mixed in, the accuracy obtained by learning is noticeably degraded. The cleaning of massive picture data has thus become a bottleneck restricting the development of neural network technology. At present, the picture data cleaning approach mainly used in industry is still the traditional one based on manual cleaning. Manual screening is not only labor-intensive; because the similarity of the data is judged and the data are classified by the user, the resulting data are affected by human subjectivity, so the neural network model may learn from wrong data and its performance may suffer. For this reason, the invention provides a method, device and system for optimizing the quality of automated picture data cleaning. Building on conventional data cleaning methods, it can achieve very high picture cleaning quality with a small number of model iterations of a fine-grained binary classifier, and in some cases can even replace manual cleaning entirely once the model iterations are complete.
Embodiment 1
As shown in Fig. 1, an embodiment of the invention provides a method for optimizing the quality of automated picture data cleaning, comprising the following steps:
(1) obtaining a picture set to be cleaned and inputting it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
(2) inputting the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction of each picture to be cleaned; steps (1) and (2) correspond to the automatic cleaning stage in Fig. 1;
(3) screening out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold; step (3) corresponds to the manual cleaning stage in Fig. 1;
(4) obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning;
(5) optimizing the fine-grained binary classifier, with its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures; step (5) corresponds to the model optimization stage in Fig. 1;
(6) repeating the above process until a fine-grained binary classifier that meets the requirements is obtained and the cleaning of all pictures is complete.
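Steps (1) to (6) can be read as a loop around the fine-grained classifier. The following is only a minimal sketch of that control flow, not the patented implementation. The callables `coarse_keep`, `fine_score`, `human_label`, and `retrain` are hypothetical stand-ins for the two classifiers, the manual cleaning feedback, and the retraining step. The 0.5 decision cut-off used to decide whether a confidence "conflicts" with the human label, and the default accuracy threshold and optimization limit, are assumptions that the text does not specify. Sampling from the confidence intervals is omitted here.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def run_cleaning(
    pictures: List[Hashable],
    coarse_keep: Callable[[Hashable], bool],
    fine_score: Callable[[Hashable], float],
    human_label: Callable[[Hashable], int],
    retrain: Callable[[Dict[Hashable, float], Dict[Hashable, int]], None],
    conf_threshold: float = 0.99,       # example value from the text
    first_count_threshold: int = 150,   # example value from the text
    accuracy_threshold: float = 0.95,   # assumed value
    max_optimizations: int = 3,         # assumed value
) -> Tuple[Dict[Hashable, float], int]:
    """Return the final confidences and the number of optimizations performed."""
    # Step (1): the coarse-grained classifier keeps only first-class pictures.
    first_class = [p for p in pictures if coarse_keep(p)]
    optimizations = 0
    while True:
        # Step (2): confidence of the class prediction for every picture.
        scores = {p: fine_score(p) for p in first_class}
        # Step (3): low-confidence pictures go to manual cleaning only if
        # there are more of them than the first picture quantity threshold.
        low = [p for p in first_class if scores[p] < conf_threshold]
        review = low if len(low) > first_count_threshold else []
        if not review:
            return scores, optimizations
        # Step (4): model accuracy from the manual feedback (0 or 1).
        labels = {p: human_label(p) for p in review}
        correct = sum((scores[p] >= 0.5) == (labels[p] == 1) for p in review)
        accuracy = correct / len(review)
        # Step (5): optimize only while both conditions allow it.
        if accuracy >= accuracy_threshold or optimizations >= max_optimizations:
            return scores, optimizations
        retrain({p: scores[p] for p in review}, labels)
        optimizations += 1  # Step (6): repeat with the retrained model.
```

In this sketch the loop ends either when no picture needs manual cleaning, when the measured accuracy reaches the threshold, or when the optimization limit is reached.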
In a specific implementation of the embodiment of the invention, step (1) is specifically:
An initial test picture set is obtained with a web crawler, and the initial test picture set is screened with the preset coarse-grained binary classifier to obtain the test picture set. The most convenient way to obtain the large number of image samples needed to train a neural network model is a web crawler, which can extract from the massive information on the Internet the information that satisfies a set condition. However, the picture information a web crawler obtains is massive, and much of it is not needed. Suppose the crawler is used to obtain picture data of class A: the crawl result usually also contains a large amount of picture data that is not class A. The coarse-grained binary classifier therefore performs a preliminary classification of the massive picture data obtained by the crawler, rejecting the non-class-A picture data and keeping the class-A picture data. For example, when a web crawler is used to obtain dish pictures of Tomato omelette, it often also crawls pictures of dishes that are not Tomato omelette; the coarse-grained binary classifier is used to obtain the dish pictures of Tomato omelette. With this technical solution, the coarse-grained binary classifier performs a preliminary classification of the massive picture data and provides accurate sample picture data for the subsequent training of the fine-grained classifier.
In a specific implementation of the embodiment of the invention, the preset fine-grained binary classifier in step (2) is trained as follows:
A test picture set is provided and clustered with a clustering algorithm to obtain a positive-sample test picture set and a negative-sample test picture set. Preferably, the clustering algorithm is the K-means algorithm, and the clustering proceeds as follows: S201, the picture data set is divided into k classes, and k typical pictures are chosen from the test picture set as the initial cluster centers of the classes; S202, the distance between each picture in the test picture set and the initial cluster center of each class is calculated, each picture is assigned to a class according to the minimum distance, and the cluster center values are formed, completing one iteration; S203, the iteration of step S202 is repeated until the calculated cluster center values equal the previous center values, which gives the cluster center of each class; S204, the distance between each picture in the test picture set and the cluster center of each class is calculated, the nearest pictures form the positive-sample test picture set and the farthest pictures form the negative-sample test picture set, where the number of pictures in the positive-sample test picture set equals the number in the negative-sample test picture set;
The fine-grained binary classifier is trained from the positive-sample test picture set and the negative-sample test picture set.
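The clustering procedure S201 to S204 can be sketched as below. This is an illustrative sketch under several assumptions that the text does not state: each picture is already represented by a numeric feature vector, the first k vectors serve as the "typical" initial centers, and S204 is read as ranking pictures by distance to the center of the class of interest, taking the n nearest as positive samples and the n farthest as negative samples. The function names `kmeans` and `split_by_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[Point]:
    """S201 to S203: iterate until the cluster centers stop changing."""
    # S201 (assumption): use the first k feature vectors as initial centers.
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # S202: assign each picture to its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        # S203: stop when the new centers equal the previous ones.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def split_by_distance(
    points: Sequence[Point], center: Point, n: int
) -> Tuple[List[int], List[int]]:
    """S204 (one reading): n nearest indices are positive, n farthest negative."""
    ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return ranked[:n], ranked[-n:]
```

Taking the same n for both sets keeps the positive and negative sample counts equal, as the procedure requires.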
The first-class picture set (class-A pictures) is input into the preset fine-grained binary classifier to obtain the confidence of the class prediction of each picture to be cleaned, for example the confidence of the prediction of type 1 within the class-A pictures, and the result is sent to the data management system;
In a specific implementation of the embodiment of the invention, screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold specifically comprises the following sub-steps:
The set confidence threshold is compared with the obtained class-prediction confidence of each picture to be cleaned; the confidence threshold is set according to the actual situation and may, for example, be set to 0.99;
When the number of pictures whose predicted confidence is below the set confidence threshold is greater than the preset first picture quantity threshold, those pictures are identified as pictures that need manual cleaning; when the number of pictures whose predicted confidence is below the set confidence threshold is less than the preset first picture quantity threshold, those pictures are identified as pictures that do not need manual cleaning; the preset picture quantity threshold is likewise set according to the actual situation and may, for example, be set to 150;
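The two sub-steps above amount to a simple two-threshold rule, sketched here for illustration. The function name `select_for_manual_cleaning` is hypothetical; the default values 0.99 and 150 are the examples given in the text.

```python
from typing import List, Sequence


def select_for_manual_cleaning(
    confidences: Sequence[float],
    conf_threshold: float = 0.99,
    first_count_threshold: int = 150,
) -> List[int]:
    """Return indices of pictures to send to manual cleaning (possibly none)."""
    # Sub-step 1: compare each confidence with the set confidence threshold.
    low = [i for i, c in enumerate(confidences) if c < conf_threshold]
    # Sub-step 2: the low-confidence pictures need manual cleaning only if
    # there are more of them than the first picture quantity threshold.
    return low if len(low) > first_count_threshold else []
```

For instance, `select_for_manual_cleaning([0.5, 0.995, 0.2, 0.999], 0.99, 1)` selects the pictures at indices 0 and 2, while raising the quantity threshold to 2 selects nothing.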
During the model optimization of the fine-grained binary classifier, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, the method further comprises:
Picking out, according to a set rule, pictures that need manual cleaning from each predicted-confidence distribution interval. Note that this step is executed only during the model optimization of the fine-grained binary classifier. For example, the pictures can be selected according to the rule set in Table 1:
Table 1
Predicted confidence interval    Number selected at random
0-20%                            5
20%-40%                          10
40%-60%                          15
60%-80%                          10
80%-100%                         5
In other implementations of the embodiment of the invention, the pictures that need manual cleaning may also be picked out from each predicted-confidence distribution interval by a rule other than that of Table 1; the specific selection rule is determined according to actual needs.
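The selection rule of Table 1 is a form of stratified random sampling over confidence intervals. The sketch below illustrates it; the function name `sample_by_interval`, the half-open treatment of interval boundaries (with 100% included in the last interval), and the behavior when an interval holds fewer pictures than its quota (take all of them) are assumptions not stated in the text.

```python
import random
from typing import List, Sequence, Tuple

# (lower bound, upper bound, number to select at random), as in Table 1.
TABLE_1: List[Tuple[float, float, int]] = [
    (0.0, 0.2, 5),
    (0.2, 0.4, 10),
    (0.4, 0.6, 15),
    (0.6, 0.8, 10),
    (0.8, 1.0, 5),
]


def sample_by_interval(
    confidences: Sequence[float],
    rule: Sequence[Tuple[float, float, int]] = TABLE_1,
    rng: random.Random = None,
) -> List[int]:
    """Randomly pick picture indices from each predicted-confidence interval."""
    rng = rng or random.Random()
    picked: List[int] = []
    for pos, (low, high, quota) in enumerate(rule):
        is_last = pos == len(rule) - 1
        # Intervals are [low, high); the last one also includes its upper bound.
        members = [
            i for i, c in enumerate(confidences)
            if low <= c < high or (is_last and c == high)
        ]
        # Assumption: if an interval has fewer pictures than its quota, take all.
        picked.extend(rng.sample(members, min(quota, len(members))))
    return picked
```

The rule places the largest quota in the 40%-60% interval, which suggests that the sampling concentrates manual effort where the classifier is least decisive.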
Preferably, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, the method further comprises:
If, among the pictures to be cleaned, the number of pictures whose predicted confidence is above the set confidence threshold is less than a preset second picture quantity threshold (for example, 10% of the total number of pictures crawled this time), the set confidence threshold threshold_clean is lowered, and the picture data above the lowered confidence threshold are then provided again for manual cleaning.
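The threshold adjustment can be sketched as follows. The text does not say by how much the threshold is lowered, so the fixed step of 0.01, the lower bound `floor`, and the choice to keep lowering until enough pictures qualify are all assumptions made for illustration; the function name `lower_threshold_if_needed` is hypothetical.

```python
from typing import List, Sequence, Tuple


def lower_threshold_if_needed(
    confidences: Sequence[float],
    threshold_clean: float,
    second_count_threshold: int,
    step: float = 0.01,   # assumed decrement
    floor: float = 0.5,   # assumed lower bound for the threshold
) -> Tuple[float, List[int]]:
    """Lower the confidence threshold while too few pictures exceed it.

    Returns the (possibly lowered) threshold and the indices of the pictures
    whose confidence is above it, which are then offered for manual cleaning.
    """
    def above(t: float) -> List[int]:
        return [i for i, c in enumerate(confidences) if c > t]

    high = above(threshold_clean)
    while len(high) < second_count_threshold and threshold_clean - step >= floor:
        # Rounding avoids accumulating floating-point error over many steps.
        threshold_clean = round(threshold_clean - step, 10)
        high = above(threshold_clean)
    return threshold_clean, high
```

With confidences `[0.995, 0.975, 0.965, 0.5]`, a starting threshold of 0.99 and a second quantity threshold of 2, this sketch lowers the threshold twice before two pictures exceed it.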
In a specific implementation of the embodiment of the invention, obtaining in step (4) the model accuracy of the fine-grained binary classifier based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning specifically comprises the following sub-steps:
(401) for the pictures that need manual cleaning screened out on the basis of the set confidence threshold and the corresponding first picture quantity threshold: for each picture to be cleaned, if the confidence of its class prediction conflicts with the feedback result of the manual cleaning, the classification is judged wrong; otherwise it is judged correct;
(402) for the pictures that need manual cleaning picked out from each predicted-confidence distribution interval: for each picture to be cleaned, if the confidence of its class prediction conflicts with the feedback result of the manual cleaning, that is, the prediction confidence given by the fine-grained binary classifier does not match the actual situation, the classification is judged wrong; otherwise it is judged correct;
(403) the model accuracy of the fine-grained binary classifier is calculated from the above classification judgments.
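Sub-steps (401) to (403) can be sketched as a single accuracy computation over all manually cleaned pictures. The text does not define precisely when a confidence "conflicts" with the manual feedback; this sketch assumes the predicted class is obtained by comparing the confidence with a decision cut-off of 0.5, which is an assumption, as is the function name `model_accuracy`.

```python
from typing import Sequence


def model_accuracy(
    confidences: Sequence[float],
    feedback: Sequence[int],
    decision_cutoff: float = 0.5,  # assumed cut-off for the predicted class
) -> float:
    """Fraction of manually cleaned pictures whose prediction matches the feedback.

    confidences: class-prediction confidences from the fine-grained classifier.
    feedback: manual cleaning results, 1 (belongs to the class) or 0 (does not).
    """
    if len(confidences) != len(feedback):
        raise ValueError("confidences and feedback must have the same length")
    if not confidences:
        raise ValueError("at least one manually cleaned picture is required")
    correct = 0
    for conf, label in zip(confidences, feedback):
        predicted = 1 if conf >= decision_cutoff else 0
        # (401)/(402): a mismatch is a classification error, a match is correct.
        if predicted == label:
            correct += 1
    # (403): accuracy over all judged pictures.
    return correct / len(confidences)
```

The pictures from sub-steps (401) and (402) can simply be concatenated before the call, since both are judged by the same rule.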
In a specific implementation of the embodiment of the invention, the model optimization of the fine-grained binary classifier in step (5) specifically comprises the following sub-steps:
Error-prone samples are obtained, which include error-prone positive samples and error-prone negative samples. An error-prone positive sample is a picture whose predicted confidence is below the set confidence threshold but which actually belongs to the class; an error-prone negative sample is a picture whose predicted confidence is above the set confidence threshold but which actually does not belong to the class;
When the model accuracy of the fine-grained binary classifier is below a set accuracy threshold and the number of model optimizations is below the model optimization count threshold, the obtained error-prone positive samples and error-prone negative samples, together with the other positive and negative samples, are used as the training set to retrain the fine-grained binary classifier, thereby optimizing it; at the same time, the model optimization count of the fine-grained binary classifier is incremented by one.
Preferably, denote the class-prediction confidence of a manually cleaned picture by $\mathrm{Confidence}_{predict}$ and the feedback result of the manual cleaning by $\mathrm{Confidence}_{groundtruth}$. An error-prone sample is one that satisfies:
$$|\mathrm{Confidence}_{groundtruth} - \mathrm{Confidence}_{predict}| > \mathrm{threshold}$$
where $\mathrm{Confidence}_{predict}$ takes values in (0, 1), $\mathrm{Confidence}_{groundtruth}$ takes the value 0 or 1, and threshold is a preset threshold.
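The error-prone sample criterion and the optimization condition can be sketched as follows. The function names `find_error_prone` and `should_optimize` are hypothetical, and splitting the error-prone samples into positive and negative by their manual feedback value (1 or 0) is this sketch's reading of the definitions above.

```python
from typing import List, Sequence, Tuple


def find_error_prone(
    confidences: Sequence[float],
    groundtruth: Sequence[int],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Apply |Confidence_groundtruth - Confidence_predict| > threshold.

    Returns (error-prone positive indices, error-prone negative indices).
    """
    positives: List[int] = []
    negatives: List[int] = []
    for i, (conf, truth) in enumerate(zip(confidences, groundtruth)):
        if abs(truth - conf) > threshold:
            # Feedback 1: belongs to the class but was scored low.
            # Feedback 0: does not belong to the class but was scored high.
            (positives if truth == 1 else negatives).append(i)
    return positives, negatives


def should_optimize(
    accuracy: float,
    accuracy_threshold: float,
    optimization_count: int,
    optimization_count_threshold: int,
) -> bool:
    """Retrain only if accuracy is too low and the optimization budget remains."""
    return (
        accuracy < accuracy_threshold
        and optimization_count < optimization_count_threshold
    )
```

For example, with confidences `[0.1, 0.95, 0.9, 0.3]`, manual feedback `[1, 1, 0, 0]` and a threshold of 0.5, the picture at index 0 is an error-prone positive sample and the picture at index 2 is an error-prone negative sample.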
Embodiment 2
Based on the same inventive concept as Embodiment 1, an embodiment of the invention provides a device for optimizing the quality of automated picture data cleaning, comprising:
A first screening module, configured to obtain a picture set to be cleaned and input it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
A first computing module, configured to input the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction of each picture to be cleaned;
A second screening module, configured to screen out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
A second computing module, configured to obtain the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback results of the manual cleaning;
An optimization module, configured to optimize the fine-grained binary classifier, with its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures.
The rest is the same as in Embodiment 1.
Embodiment 3
Based on the same inventive concept as Embodiment 1, an embodiment of the invention provides a system for optimizing the quality of automated picture data cleaning, comprising:
A processor, adapted to execute instructions; and
A storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the steps described in Embodiment 1.
The cleaning of pictures of three dishes (phoenix tail prawns, quick-fried squid, Scotch collops) is taken as an example below.
The accuracy of the crawled data set is as follows:
Dish                   Total    Positive samples    Negative samples    Accuracy
Quick-fried squid      1610     1080                530                 67.1%
Phoenix tail prawns    1716     936                 780                 54.5%
Scotch collops         1568     697                 871                 44.5%
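The accuracy column of the table above appears to be the share of positive samples in the total, which can be checked with a short calculation. The function name `accuracy_percent` is hypothetical, and the interpretation of "accuracy" as positive samples divided by total is inferred from the figures rather than stated in the text.

```python
def accuracy_percent(positive: int, total: int) -> float:
    """Share of positive samples in the crawled set, as a percentage (1 decimal)."""
    return round(100.0 * positive / total, 1)


# (dish, total, positive samples) taken from the table above.
crawled = [
    ("Quick-fried squid", 1610, 1080),
    ("Phoenix tail prawns", 1716, 936),
    ("Scotch collops", 1568, 697),
]

for dish, total, positive in crawled:
    print(f"{dish}: {accuracy_percent(positive, total)}%")
```

Under this reading, the three computed values agree with the 67.1%, 54.5% and 44.5% listed in the table.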
The accuracy of the data set after automatic cleaning with the prior-art method is as follows:
The accuracy of the data set after processing with the method of the invention is as follows:
As can be seen from Tables 1-3, after processing with the method of the invention the accuracy of the data set is further improved compared with before optimization.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and the description merely illustrate the principle of the present invention. Various changes and improvements may be made to the present invention without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A method for optimizing the quality of automated picture data cleaning, characterized by comprising the following steps:
obtaining a set of pictures to be cleaned and inputting it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
inputting the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction for each picture to be cleaned;
screening out the pictures that need manual cleaning based on a set confidence threshold and a first picture-count threshold corresponding to the confidence threshold;
obtaining the model accuracy of the fine-grained binary classifier based on the confidence of the class prediction for the pictures that need manual cleaning and the feedback results of the manual cleaning;
taking the model accuracy of the fine-grained binary classifier and a model-optimization-count threshold as optimization conditions, performing model optimization of the fine-grained binary classifier based on the confidence of the class prediction for the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures;
repeating the above process until a fine-grained binary classifier that meets the requirements is obtained, and completing the cleaning of all pictures.
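The steps of claim 1 can be sketched as a single driver loop. This is a minimal illustration under stated assumptions, not the patented implementation: `run_cleaning` and all of its parameters are hypothetical names, the classifiers and the human reviewer are passed in as plain callables, a prediction is assumed to be "positive" when its confidence is at least 0.5, and a round with no pictures sent to manual cleaning is assumed to count as accurate. The sample pictures used for retraining are left to the caller-supplied `retrain` callable.

```python
def run_cleaning(pictures, coarse_keep, fine_score, ask_human, retrain,
                 conf_threshold=0.8, first_count_threshold=1,
                 accuracy_threshold=0.9, max_optimizations=3):
    """Hypothetical driver loop; every callable here is supplied by the caller."""
    # Coarse-grained screening: keep only pictures the coarse classifier accepts.
    first_class = [p for p in pictures if coarse_keep(p)]
    optimizations = 0
    while True:
        # Fine-grained classifier: confidence of the class prediction per picture.
        scores = {p: fine_score(p) for p in first_class}
        # Low-confidence pictures go to manual cleaning, but only if there are
        # more of them than the first picture-count threshold.
        low = [p for p in first_class if scores[p] < conf_threshold]
        manual = low if len(low) > first_count_threshold else []
        labels = {p: ask_human(p) for p in manual}
        # Assumed rule: a prediction is "positive" when its confidence is >= 0.5.
        correct = sum((scores[p] >= 0.5) == bool(labels[p]) for p in manual)
        accuracy = correct / len(manual) if manual else 1.0
        # Stop when the model is accurate enough or the optimization budget is spent.
        if accuracy >= accuracy_threshold or optimizations >= max_optimizations:
            return scores, optimizations
        fine_score = retrain(fine_score, labels)
        optimizations += 1


# Toy run: file names stand in for pictures, lambdas stand in for models.
truth = {"a.jpg": 1, "b.jpg": 0, "c.jpg": 1}
scores, rounds = run_cleaning(
    pictures=["a.jpg", "b.jpg", "c.jpg", "junk.jpg"],
    coarse_keep=lambda p: p != "junk.jpg",
    fine_score=lambda p: 0.6,
    ask_human=lambda p: truth[p],
    retrain=lambda old, labels: (lambda p: 0.95 if labels[p] else 0.05),
)
```

In the toy run the first round sends all three surviving pictures to the reviewer, finds two of three predictions correct, and triggers one retraining; the second round has too few low-confidence pictures to require review, so the loop ends.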
2. The method for optimizing the quality of automated picture data cleaning according to claim 1, characterized in that the training process of the preset fine-grained binary classifier is:
providing a set of test pictures and clustering the test pictures with a clustering algorithm to obtain a positive-sample test picture set and a negative-sample test picture set;
training the fine-grained binary classifier on the positive-sample test picture set and the negative-sample test picture set.
3. The method for optimizing the quality of automated picture data cleaning according to claim 1, characterized in that screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture-count threshold corresponding to the confidence threshold specifically comprises the following sub-steps:
comparing the set confidence threshold with the obtained confidence of the class prediction for each picture to be cleaned;
when the number of pictures whose predicted confidence is less than the set confidence threshold is greater than the preset first picture-count threshold, identifying those pictures as the pictures that need manual cleaning.
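The two sub-steps of claim 3 amount to a filter followed by a count check. A minimal sketch, assuming confidences are held in a dictionary keyed by picture name; the function and parameter names are hypothetical.

```python
def select_for_manual_cleaning(confidences, conf_threshold, first_count_threshold):
    """Return the low-confidence pictures if there are enough of them, else nothing."""
    # Sub-step 1: compare each predicted confidence with the set threshold.
    low = [pic for pic, conf in confidences.items() if conf < conf_threshold]
    # Sub-step 2: hand them to manual cleaning only when their number
    # exceeds the first picture-count threshold.
    return low if len(low) > first_count_threshold else []


confidences = {"a.jpg": 0.95, "b.jpg": 0.40, "c.jpg": 0.62, "d.jpg": 0.88}
needs_review = select_for_manual_cleaning(confidences, 0.8, 1)
nothing_to_review = select_for_manual_cleaning(confidences, 0.8, 2)
```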
4. The method for optimizing the quality of automated picture data cleaning according to claim 3, characterized in that, during the model optimization of the fine-grained binary classifier, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture-count threshold corresponding to the confidence threshold, the method further comprises:
picking out, according to a set rule, pictures that need manual cleaning from each prediction-confidence distribution interval.
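Claim 4 does not say what the "set rule" is. One plausible reading, shown here purely as an assumption, is to split the confidence range into equal-width intervals and take a fixed number of pictures from each non-empty interval. The names `sample_per_interval`, `n_bins`, and `per_bin` are hypothetical.

```python
from collections import defaultdict


def sample_per_interval(confidences, n_bins=5, per_bin=1):
    """Pick up to `per_bin` pictures from each equal-width confidence interval."""
    bins = defaultdict(list)
    # Sorting by picture name keeps the selection deterministic.
    for pic, conf in sorted(confidences.items()):
        # Clamp so that a confidence of exactly 1.0 falls in the last interval.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append(pic)
    return {index: pics[:per_bin] for index, pics in sorted(bins.items())}


picked = sample_per_interval(
    {"a.jpg": 0.05, "b.jpg": 0.15, "c.jpg": 0.55, "d.jpg": 0.95, "e.jpg": 0.99}
)
```

Sampling across all intervals, rather than only below the threshold, lets the accuracy estimate also cover predictions the model is confident about.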
5. The method for optimizing the quality of automated picture data cleaning according to claim 4, characterized in that obtaining the model accuracy of the fine-grained binary classifier based on the confidence of the class prediction for the pictures that need manual cleaning and the feedback results of the manual cleaning specifically comprises the following sub-steps:
for the pictures that need manual cleaning screened out based on the set confidence threshold and the first picture-count threshold corresponding to the confidence threshold: for each picture to be cleaned, when the confidence of its class prediction conflicts with the manual cleaning feedback result, determining that the classification is wrong; otherwise, determining that the classification is correct;
for the pictures that need manual cleaning picked out from each prediction-confidence distribution interval: for each picture to be cleaned, when the confidence of its class prediction conflicts with the manual cleaning feedback result, determining that the classification is wrong; otherwise, determining that the classification is correct;
calculating the model accuracy of the fine-grained binary classifier based on the above classification determinations.
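The accuracy calculation of claim 5 can be sketched as the fraction of reviewed pictures whose prediction agrees with the reviewer. The claim does not define when a confidence "conflicts" with the feedback; the sketch assumes a prediction counts as positive when its confidence reaches a `decision_boundary` of 0.5, and that parameter, like the function name, is hypothetical.

```python
def model_accuracy(confidences, feedback, decision_boundary=0.5):
    """Fraction of manually reviewed pictures whose prediction matches the feedback."""
    # A picture is judged correct when predicted class and human label agree.
    judged = [
        (confidences[pic] >= decision_boundary) == bool(label)
        for pic, label in feedback.items()
    ]
    # No reviewed pictures means no accuracy estimate.
    return sum(judged) / len(judged) if judged else None


confidences = {"a.jpg": 0.9, "b.jpg": 0.7, "c.jpg": 0.2, "d.jpg": 0.4}
feedback = {"a.jpg": 1, "b.jpg": 0, "c.jpg": 0, "d.jpg": 1}  # 1 = keep, 0 = discard
acc = model_accuracy(confidences, feedback)
```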
6. The method for optimizing the quality of automated picture data cleaning according to claim 1, characterized in that, after the step of screening out the pictures that need manual cleaning based on the set confidence threshold and the first picture-count threshold corresponding to the confidence threshold, the method further comprises:
when, among the pictures to be cleaned, the number of pictures whose predicted confidence is greater than the set confidence threshold is less than a preset second picture-count threshold, lowering the set confidence threshold, and then providing again the picture data above the adjusted confidence threshold for manual cleaning.
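The threshold adjustment of claim 6 can be sketched as follows. The claim does not say by how much the threshold is lowered or whether the adjustment repeats, so the fixed `step` and the single adjustment are assumptions; all names are hypothetical.

```python
def lower_threshold_if_needed(confidences, conf_threshold,
                              second_count_threshold, step=0.25):
    """Lower the threshold once if too few pictures clear it.

    Returns the (possibly adjusted) threshold and the pictures to resubmit
    for manual cleaning; the list is empty when no adjustment was needed.
    """
    above = [p for p, c in confidences.items() if c > conf_threshold]
    if len(above) >= second_count_threshold:
        return conf_threshold, []
    # Too few confident pictures: lower the threshold (never below zero)
    # and collect everything above the adjusted value.
    adjusted = max(0.0, conf_threshold - step)
    resubmit = [p for p, c in confidences.items() if c > adjusted]
    return adjusted, resubmit


confidences = {"a.jpg": 0.9, "b.jpg": 0.6, "c.jpg": 0.3}
new_threshold, resubmit = lower_threshold_if_needed(confidences, 0.75, 2)
unchanged = lower_threshold_if_needed(confidences, 0.75, 1)
```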
7. The method for optimizing the quality of automated picture data cleaning according to claim 1, characterized in that the model optimization of the fine-grained binary classifier specifically comprises the following sub-steps:
obtaining error-prone samples, the error-prone samples including error-prone positive samples and error-prone negative samples;
when the model accuracy of the fine-grained binary classifier is less than a set accuracy threshold and the number of model optimizations is less than the model-optimization-count threshold, using the obtained error-prone positive samples and error-prone negative samples together with the other positive samples and negative samples as the training set to retrain the fine-grained binary classifier, thereby optimizing it; at the same time, the model optimization count of the fine-grained binary classifier is incremented by one.
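The optimization condition and training-set assembly of claim 7 can be sketched as below. The actual retraining call depends on the classifier and is not shown; `should_optimize` and `build_training_set` are hypothetical helper names, and samples are represented by file names for illustration.

```python
def should_optimize(model_accuracy, accuracy_threshold,
                    optimization_count, max_optimizations):
    """Both conditions must hold: accuracy too low and optimization budget left."""
    return (model_accuracy < accuracy_threshold
            and optimization_count < max_optimizations)


def build_training_set(error_prone_pos, error_prone_neg, other_pos, other_neg):
    """Merge error-prone samples with the other samples into (sample, label) pairs."""
    positives = list(other_pos) + list(error_prone_pos)
    negatives = list(other_neg) + list(error_prone_neg)
    return [(s, 1) for s in positives] + [(s, 0) for s in negatives]


optimization_count = 0
if should_optimize(0.82, 0.90, optimization_count, 3):
    training_set = build_training_set(["hard_pos.jpg"], ["hard_neg.jpg"],
                                      ["pos1.jpg"], ["neg1.jpg"])
    # A retraining call on `training_set` would go here.
    optimization_count += 1
```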
8. The method for optimizing the quality of automated picture data cleaning according to claim 7, characterized in that:
the confidence of the class prediction for a manually cleaned picture is denoted $\mathrm{Confidence}_{\mathrm{predict}}$ and the feedback result of the manual cleaning is denoted $\mathrm{Confidence}_{\mathrm{groundtruth}}$; the error-prone samples are determined by the formula:
$$\left|\mathrm{Confidence}_{\mathrm{groundtruth}} - \mathrm{Confidence}_{\mathrm{predict}}\right| > \mathrm{threshold}$$
where $\mathrm{Confidence}_{\mathrm{predict}}$ takes values in $(0, 1)$, $\mathrm{Confidence}_{\mathrm{groundtruth}}$ takes the value 0 or 1, and $\mathrm{threshold}$ is a preset threshold.
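The error-prone test of claim 8 translates directly into code. The sketch below applies the formula and splits the result into error-prone positive and negative samples by the ground-truth label; the function names and the tuple layout of `samples` are assumptions for illustration.

```python
def is_error_prone(confidence_predict, confidence_groundtruth, threshold):
    """Claim 8 criterion: |groundtruth - predict| > threshold."""
    return abs(confidence_groundtruth - confidence_predict) > threshold


def split_error_prone(samples, threshold):
    """Split (picture, predicted confidence, ground truth) triples by label."""
    error_prone_pos, error_prone_neg = [], []
    for pic, predict, truth in samples:
        if is_error_prone(predict, truth, threshold):
            (error_prone_pos if truth == 1 else error_prone_neg).append(pic)
    return error_prone_pos, error_prone_neg


samples = [
    ("a.jpg", 0.3, 1),  # |1 - 0.3| = 0.7 > 0.5: error-prone positive
    ("b.jpg", 0.8, 0),  # |0 - 0.8| = 0.8 > 0.5: error-prone negative
    ("c.jpg", 0.9, 1),  # |1 - 0.9| is well below 0.5: not error-prone
    ("d.jpg", 0.3, 0),  # |0 - 0.3| = 0.3: not error-prone
]
pos, neg = split_error_prone(samples, threshold=0.5)
```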
9. A device for optimizing the quality of automated picture data cleaning, characterized by comprising:
a first screening module, configured to obtain a set of pictures to be cleaned and input it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
a first computing module, configured to input the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction for each picture to be cleaned;
a second screening module, configured to screen out the pictures that need manual cleaning based on a set confidence threshold and a first picture-count threshold corresponding to the confidence threshold;
a second computing module, configured to obtain the model accuracy of the fine-grained binary classifier based on the confidence of the class prediction for the pictures that need manual cleaning and the feedback results of the manual cleaning;
an optimization module, configured to take the model accuracy of the fine-grained binary classifier and a model-optimization-count threshold as optimization conditions, and to perform model optimization of the fine-grained binary classifier based on the confidence of the class prediction for the pictures that need manual cleaning, the feedback results of the manual cleaning, and sample pictures.
10. A system for optimizing the quality of automated picture data cleaning, characterized by comprising:
a processor, adapted to execute instructions; and
a storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the steps of any one of claims 1 to 8.
CN201910267802.1A 2019-04-03 2019-04-03 Method, device and system for optimizing automatic picture data cleaning quality Active CN110083728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910267802.1A CN110083728B (en) 2019-04-03 2019-04-03 Method, device and system for optimizing automatic picture data cleaning quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910267802.1A CN110083728B (en) 2019-04-03 2019-04-03 Method, device and system for optimizing automatic picture data cleaning quality

Publications (2)

Publication Number Publication Date
CN110083728A true CN110083728A (en) 2019-08-02
CN110083728B CN110083728B (en) 2021-08-20

Family

ID=67414238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910267802.1A Active CN110083728B (en) 2019-04-03 2019-04-03 Method, device and system for optimizing automatic picture data cleaning quality

Country Status (1)

Country Link
CN (1) CN110083728B (en)

Also Published As

Publication number Publication date
CN110083728B (en) 2021-08-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201012

Address after: 201615 room 1001, building 21, No. 1158, Zhongxin Road, Jiuting Town, Songjiang District, Shanghai

Applicant after: Shanghai re SR Information Technology Co.,Ltd.

Address before: The new town of Pudong New Area Nanhui lake west two road 201306 Shanghai City No. 888 building C

Applicant before: Shanghai Lianyin Electronic Technology Partnership (L.P.)

GR01 Patent grant