CN110083728A - Method, device and system for optimizing the quality of automated image data cleaning
- Publication number
- CN110083728A (CN201910267802.1A)
- Authority
- CN
- China
- Prior art keywords
- picture
- threshold value
- cleaned
- confidence
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a method, device and system for optimizing the quality of automated image data cleaning. The method comprises: feeding the picture set to be cleaned through a coarse-grained binary classifier and then a fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned; screening out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to that confidence threshold; obtaining the model accuracy of the fine-grained binary classifier from the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning; and optimizing the fine-grained binary classifier, using its model accuracy and a model optimization count threshold as the optimization conditions. Building on conventional data cleaning methods, the invention can reach very high image cleaning quality after a small number of iterations of the fine-grained binary classifier model, and in some cases can fully replace manual cleaning once the model iterations are complete.
Description
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a method, device and system for optimizing the quality of automated image data cleaning.
Background
With the breakthroughs that deep learning has achieved in image recognition, neural networks have become the mainstream algorithm applied in that field. A neural network, however, is a supervised learning algorithm: developers take a known data set and let the network learn from labelled inputs and outputs, so that it keeps optimizing its own model parameters and grows steadily "smarter". In other words, good recognition accuracy requires training on massive amounts of accurately labelled picture data. In theory, the more data available for learning, the higher the accuracy of the model. This ideal holds, however, only when all the data used for learning are correct; if erroneous data are mixed in, the accuracy obtained is noticeably degraded. Cleaning massive picture data has therefore become a bottleneck restricting the development of neural network technology. At present, the picture data cleaning approach mainly used in industry is still the traditional one based on manual cleaning.
The Chinese invention patent application No. 2018107215159, entitled "A method and device for cleaning data", discloses the following: during data cleaning, the data determined with high probability to be correct and the data determined to be wrong are first selected from the data to be cleaned; the intermediate data that are harder to confirm are then screened again; and positive and negative samples are then picked out. Although this can improve the accuracy of the data set, its effect in practical engineering applications is not good, and its cleaning quality is essentially fixed, so better data cleaning quality cannot be obtained.
Summary of the invention
In view of the above problems, the present invention proposes a method, device and system for optimizing the quality of automated image data cleaning. Building on conventional data cleaning methods, it can reach very high image cleaning quality after a small number of iterations of the fine-grained binary classifier model, and in some cases can fully replace manual cleaning once the model iterations are complete.
To achieve the above technical purpose and technical effect, the invention is realized through the following technical solutions:
In a first aspect, the present invention provides a method for optimizing the quality of automated image data cleaning, comprising the following steps:
Obtaining a picture set to be cleaned and inputting it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
Inputting the first-class picture set into a preset fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned;
Screening out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
Obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning;
Optimizing the fine-grained binary classifier, using its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback from manual cleaning, and sample pictures;
Repeating the above process until a fine-grained binary classifier that meets the requirements is obtained, thereby completing the cleaning of all pictures.
Preferably, the preset fine-grained binary classifier is trained as follows:
A test picture set is provided and clustered with a clustering algorithm to obtain a positive-sample test picture set and a negative-sample test picture set;
The fine-grained binary classifier is trained on the positive-sample test picture set and the negative-sample test picture set.
Preferably, screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, specifically comprises the following sub-steps:
Comparing the set confidence threshold with the class-prediction confidence obtained for each picture to be cleaned;
When the number of pictures whose predicted confidence is below the set confidence threshold exceeds the preset first picture quantity threshold, treating those pictures as pictures that need manual cleaning.
Preferably, during model optimization of the fine-grained binary classifier, the step of screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, is followed by:
Picking out, according to a set rule, pictures that need manual cleaning from each predicted-confidence interval.
Preferably, obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning, specifically comprises the following sub-steps:
For the pictures screened out as needing manual cleaning on the basis of the set confidence threshold and the corresponding first picture quantity threshold: for each such picture, if its class-prediction confidence conflicts with the manual cleaning feedback, the classification is judged wrong; otherwise it is judged correct;
For the pictures picked out from each predicted-confidence interval as needing manual cleaning: for each such picture, if its class-prediction confidence conflicts with the manual cleaning feedback, the classification is judged wrong; otherwise it is judged correct;
Calculating the model accuracy of the fine-grained binary classifier from the above classification judgments.
Preferably, the step of screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, is further followed by:
When, among the pictures to be cleaned, the number of pictures whose predicted confidence is above the set confidence threshold is less than a preset second picture quantity threshold, lowering the set confidence threshold, and then providing the picture data whose confidence is above the adjusted threshold for manual cleaning.
Preferably, the model optimization of the fine-grained binary classifier specifically comprises the following sub-steps:
Obtaining error-prone samples, which include error-prone positive samples and error-prone negative samples;
When the model accuracy of the fine-grained binary classifier is below the set accuracy threshold and the number of optimizations already performed on the model is below the model optimization count threshold, using the obtained error-prone positive samples and error-prone negative samples, together with the other positive and negative samples, as the training set to retrain the fine-grained binary classifier and thereby optimize it; at the same time, the model optimization count of the fine-grained binary classifier is incremented by one.
Preferably, denoting the class-prediction confidence of a manually cleaned picture as $Confidence_{predict}$ and the manual cleaning feedback as $Confidence_{groundtruth}$, the error-prone samples are those that satisfy:
$|Confidence_{groundtruth} - Confidence_{predict}| > threshold$
where $Confidence_{predict}$ takes values in (0, 1), $Confidence_{groundtruth}$ is 0 or 1, and $threshold$ is a preset threshold.
In a second aspect, the present invention provides a device for optimizing the quality of automated image data cleaning, comprising:
A first screening module, configured to obtain a picture set to be cleaned and input it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
A first computing module, configured to input the first-class picture set into a preset fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned;
A second screening module, configured to screen out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
A second computing module, configured to obtain the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning;
An optimization module, configured to optimize the fine-grained binary classifier, using its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback from manual cleaning, and sample pictures.
In a third aspect, the present invention provides a system for optimizing the quality of automated image data cleaning, comprising:
A processor, adapted to execute instructions; and
A storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the steps described in the first aspect.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Building on conventional data cleaning methods, the method, device and system of the invention can reach very high image cleaning quality after a small number of iterations of the fine-grained binary classifier model, and in some cases can fully replace manual cleaning once the model iterations are complete.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for optimizing the quality of automated image data cleaning according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is further elaborated below with reference to the embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit its scope of protection.
The principle of the invention is explained in detail below with reference to the accompanying drawing.
For a neural network to learn from labelled inputs and outputs, keep optimizing its own model parameters and grow steadily "smarter", it must be trained on massive amounts of accurately labelled picture data before good recognition accuracy can be obtained. In theory, the more data available for learning, the higher the accuracy of the model. This ideal holds, however, only when all the data used for learning are correct; if erroneous data are mixed in, the accuracy obtained is noticeably degraded. Cleaning massive picture data has therefore become a bottleneck restricting the development of neural network technology. At present, the picture data cleaning approach mainly used in industry is still the traditional one based on manual cleaning. Manual screening not only involves a heavy workload; because the similarity of the data is judged and classified by the user, human subjectivity can also cause the neural network model to learn wrong data, which harms its performance. For this reason, the present invention provides a method, device and system for optimizing the quality of automated image data cleaning. Building on conventional data cleaning methods, it can reach very high image cleaning quality after a small number of iterations of the fine-grained binary classifier model, and in some cases can fully replace manual cleaning once the model iterations are complete.
Embodiment 1
As shown in Fig. 1, this embodiment of the invention provides a method for optimizing the quality of automated image data cleaning, comprising the following steps:
(1) Obtaining a picture set to be cleaned and inputting it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
(2) Inputting the first-class picture set into a preset fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned; steps (1) and (2) correspond to the automatic cleaning stage in Fig. 1;
(3) Screening out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold; step (3) corresponds to the manual cleaning stage in Fig. 1;
(4) Obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning;
(5) Optimizing the fine-grained binary classifier, using its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback from manual cleaning, and sample pictures; step (5) corresponds to the model optimization stage in Fig. 1;
(6) Repeating the above process until a fine-grained binary classifier that meets the requirements is obtained, thereby completing the cleaning of all pictures.
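Steps (1) to (6) can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `coarse_accepts`, `fine_confidence`, `human_label` and `retrain` are hypothetical stand-ins for the two classifiers, the manual reviewer and the retraining routine; a prediction is taken to "conflict" with the human label when the confidence falls on the other side of 0.5; and the first picture quantity threshold is left out for brevity.

```python
from typing import Callable, Dict, List


def run_cleaning_loop(
    pictures: List[str],
    coarse_accepts: Callable[[str], bool],     # hypothetical coarse-grained classifier
    fine_confidence: Callable[[str], float],   # hypothetical fine-grained classifier
    human_label: Callable[[str], int],         # hypothetical manual reviewer (0 or 1)
    retrain: Callable[[Dict[str, int]], None],  # hypothetical retraining routine
    conf_threshold: float = 0.99,
    accuracy_threshold: float = 0.9,           # assumed value
    max_optimizations: int = 3,                # assumed value
) -> Dict[str, int]:
    """Return picture -> final label (1 = belongs to the class, 0 = does not)."""
    # Step (1): the coarse-grained classifier keeps only the first-class pictures.
    first_class = [p for p in pictures if coarse_accepts(p)]
    optimizations = 0
    while True:
        # Step (2): class-prediction confidence from the fine-grained classifier.
        conf = {p: fine_confidence(p) for p in first_class}
        # Step (3): low-confidence pictures go to manual cleaning.
        to_review = [p for p in first_class if conf[p] < conf_threshold]
        feedback = {p: human_label(p) for p in to_review}
        # Step (4): accuracy = share of reviewed pictures whose prediction
        # agrees with the human label (0.5 cut-off is an assumption).
        if to_review:
            correct = sum((conf[p] >= 0.5) == bool(feedback[p]) for p in to_review)
            accuracy = correct / len(to_review)
        else:
            accuracy = 1.0
        # Step (5): retrain while accuracy is too low and the budget allows.
        if accuracy < accuracy_threshold and optimizations < max_optimizations:
            retrain(feedback)
            optimizations += 1
            continue  # Step (6): repeat with the optimized classifier.
        # Done: confident pictures are accepted, reviewed ones take the human label.
        labels = {p: 1 for p in first_class if conf[p] >= conf_threshold}
        labels.update(feedback)
        return labels
```

The loop ends either when the measured accuracy reaches the threshold or when the optimization budget is used up, which mirrors the two optimization conditions named in step (5).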
In a specific implementation of this embodiment, step (1) is specifically:
An initial test picture set is obtained with a web crawler and classified with the preset coarse-grained binary classifier to obtain the test picture set. The most convenient way to obtain the large number of image samples needed to train a neural network model is to use a web crawler, which can pull the information that meets a set condition out of the massive information on the internet. The picture information a crawler obtains, however, is massive, and much of it is not needed. Suppose a web crawler is used to obtain picture data of class A: the crawl results often contain a large amount of non-A picture data. The coarse-grained binary classifier therefore performs a preliminary classification of the massive picture data obtained by the crawler, rejecting the non-A pictures and keeping the A pictures. For example, when a web crawler is used to obtain pictures of the dish Tomato omelette, it often also returns pictures of dishes that are not Tomato omelette; the coarse-grained binary classifier is used to obtain the Tomato omelette pictures. With this technical solution, the coarse-grained binary classifier performs a preliminary classification of the massive picture data and provides accurate sample picture data for the subsequent training of the fine-grained classifier.
In a specific implementation of this embodiment, the preset fine-grained binary classifier in step (2) is trained as follows:
A test picture set is provided and clustered with a clustering algorithm to obtain a positive-sample test picture set and a negative-sample test picture set. Preferably, the clustering algorithm is the K-means algorithm, and the clustering proceeds as follows:
S201: the picture data set is divided into k classes, and k typical pictures are chosen from the test picture set as the initial cluster centres of the classes;
S202: the distance between each picture in the test picture set and the cluster centre of each class is calculated, and new cluster centre values are formed according to the minimum distance, completing one iteration;
S203: the iteration of step S202 is repeated until the calculated cluster centre values equal the previous centre values, which gives the cluster centre of each class;
S204: the distance between each picture and the cluster centre of each class is calculated; the nearest pictures constitute the positive-sample test picture set and the farthest pictures constitute the negative-sample test picture set, with the same number of pictures in both sets;
The fine-grained binary classifier is trained on the positive-sample test picture set and the negative-sample test picture set.
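Steps S201 to S204 can be sketched in plain Python as follows. Several things here are assumptions made for illustration: each picture is represented by a numeric feature vector (the source does not say how features are extracted), the first k points serve as the "typical" initial centres, and the positive and negative sets are ranked against a single target-class centre.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    # S201: take the first k points as initial cluster centres (assumption).
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(max_iter):
        # S202: assign every point to its nearest centre ...
        assign = [min(range(k), key=lambda c: distance(p, centres[c])) for p in points]
        # ... and recompute each centre as the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        # S203: stop once the centres no longer change.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign


def split_by_distance(points: List[Vector], centre: Vector, n: int) -> Tuple[List[Vector], List[Vector]]:
    # S204: the n nearest points form the positive set, the n farthest the
    # negative set, so both sets have the same size.
    ranked = sorted(points, key=lambda p: distance(p, centre))
    return ranked[:n], ranked[-n:]
```

Choosing `n` no larger than half the number of points keeps the two sets disjoint.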
The first-class picture set (the class A pictures) is input into the preset fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned; for example, the confidence of the class 1 prediction among the class A pictures is obtained and sent to the data management system;
In a specific implementation of this embodiment, screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, specifically comprises the following sub-steps:
The set confidence threshold is compared with the class-prediction confidence obtained for each picture to be cleaned; the confidence threshold is set according to the actual situation and may, for example, be set to 0.99;
When the number of pictures whose predicted confidence is below the set confidence threshold exceeds the preset first picture quantity threshold, those pictures are treated as pictures that need manual cleaning; when the number of pictures whose predicted confidence is below the set confidence threshold is less than the preset first picture quantity threshold, those pictures are treated as pictures that do not need manual cleaning. The preset picture quantity threshold is likewise set according to the actual situation and may, for example, be set to 150.
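The screening rule above reduces to a filter followed by a count check. A minimal sketch, in which the function name and the dictionary input (picture id mapped to predicted confidence) are assumptions for illustration; the default values 0.99 and 150 are the examples given in the text:

```python
from typing import Dict, List


def select_for_manual_review(
    confidences: Dict[str, float],
    conf_threshold: float = 0.99,
    first_count_threshold: int = 150,
) -> List[str]:
    """Return the ids of pictures that should be sent to manual cleaning."""
    # Pictures the fine-grained classifier is not confident about.
    low_confidence = [pid for pid, c in confidences.items() if c < conf_threshold]
    # They are only sent for manual cleaning when there are enough of them.
    if len(low_confidence) > first_count_threshold:
        return low_confidence
    return []
```

For example, `select_for_manual_review({"a": 0.5, "b": 0.995, "c": 0.2}, first_count_threshold=1)` returns `["a", "c"]`, while the default count threshold of 150 returns an empty list for the same input.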
During model optimization of the fine-grained binary classifier, the step of screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, is followed by:
Picking out, according to a set rule, pictures that need manual cleaning from each predicted-confidence interval. Note that this step is executed only during model optimization of the fine-grained binary classifier. For example, pictures can be selected according to the rule set in Table 1:
Table 1
Predicted confidence interval | Number selected at random |
0-20% | 5 |
20%-40% | 10 |
40%-60% | 15 |
60%-80% | 10 |
80%-100% | 5 |
In other implementations of this embodiment, the pictures that need manual cleaning may also be picked out from each predicted-confidence interval by a rule other than that of Table 1; the specific picking rule is determined by actual needs.
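The Table 1 rule amounts to stratified random sampling over confidence intervals. A minimal sketch using only the standard library; treating the intervals as half-open (with 100% included in the last one) and taking fewer pictures when an interval holds less than its quota are assumptions, since the source does not specify either case:

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# (lower bound, upper bound, number to select), taken from Table 1.
TABLE_1: Sequence[Tuple[float, float, int]] = (
    (0.0, 0.2, 5),
    (0.2, 0.4, 10),
    (0.4, 0.6, 15),
    (0.6, 0.8, 10),
    (0.8, 1.0, 5),
)


def sample_by_confidence_interval(
    confidences: Dict[str, float],
    quotas: Sequence[Tuple[float, float, int]] = TABLE_1,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly pick picture ids from each predicted-confidence interval."""
    rng = random.Random(seed)
    picked: List[str] = []
    for index, (low, high, quota) in enumerate(quotas):
        is_last = index == len(quotas) - 1
        bucket = [
            pid for pid, c in confidences.items()
            if low <= c < high or (is_last and c == high)
        ]
        # Take the whole bucket if it holds fewer pictures than the quota.
        picked.extend(rng.sample(bucket, min(quota, len(bucket))))
    return picked
```

The quotas in Table 1 put the most samples in the 40%-60% interval, where the classifier is least certain.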
Preferably, the step of screening out the pictures that need manual cleaning, based on the set confidence threshold and the first picture quantity threshold corresponding to the confidence threshold, is further followed by:
When, among the pictures to be cleaned, the number of pictures whose predicted confidence is above the set confidence threshold is less than a preset second picture quantity threshold (for example, 10% of the total number of pictures crawled this time), the set confidence threshold $threshold_{clean}$ is lowered, and the picture data whose confidence is above the adjusted threshold are then provided for manual cleaning.
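One way to realize this adjustment is to lower the threshold in small steps until enough pictures clear it. The source does not say by how much the threshold is lowered, so the step size `step` and the lower bound `floor` below are assumptions, as is expressing the second picture quantity threshold as a ratio of the total.

```python
from typing import Dict, List, Tuple


def relax_threshold(
    confidences: Dict[str, float],
    conf_threshold: float = 0.99,
    second_ratio: float = 0.10,  # second quantity threshold as a share of all pictures
    step: float = 0.01,          # assumed decrement per adjustment
    floor: float = 0.5,          # assumed lowest threshold allowed
) -> Tuple[float, List[str]]:
    """Lower the threshold until enough pictures exceed it; return both."""
    values = list(confidences.values())
    minimum_count = second_ratio * len(values)
    threshold = conf_threshold
    # Keep lowering while too few pictures are above the threshold.
    while sum(c > threshold for c in values) < minimum_count and threshold - step >= floor:
        threshold -= step
    # Pictures above the adjusted threshold are handed to manual cleaning.
    selected = [pid for pid, c in confidences.items() if c > threshold]
    return threshold, selected
```

The `floor` guard simply keeps the loop from lowering the threshold indefinitely when the classifier is unsure about almost every picture.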
In a specific implementation of this embodiment, step (4), obtaining the model accuracy of the fine-grained binary classifier based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning, specifically comprises the following sub-steps:
(401) For the pictures screened out as needing manual cleaning on the basis of the set confidence threshold and the corresponding first picture quantity threshold: for each such picture, if its class-prediction confidence conflicts with the manual cleaning feedback, the classification is judged wrong; otherwise it is judged correct;
(402) For the pictures picked out from each predicted-confidence interval as needing manual cleaning: for each such picture, if its class-prediction confidence conflicts with the manual cleaning feedback, that is, if the prediction confidence produced by the fine-grained binary classifier does not match the actual situation, the classification is judged wrong; otherwise it is judged correct;
(403) The model accuracy of the fine-grained binary classifier is calculated from the above classification judgments.
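Sub-steps (401) to (403) can be sketched as a single accuracy computation over all manually reviewed pictures. The source does not define precisely when a confidence "conflicts" with the manual feedback; the sketch assumes a decision cut-off of 0.5, so a confidence of at least 0.5 counts as predicting that the picture belongs to the class.

```python
from typing import List, Tuple


def model_accuracy(
    reviewed: List[Tuple[float, int]],
    decision_cutoff: float = 0.5,  # assumed cut-off for "predicted positive"
) -> float:
    """Share of reviewed pictures whose prediction agrees with the human label.

    `reviewed` holds (predicted confidence, human label) pairs, label 0 or 1,
    pooled from sub-steps (401) and (402).
    """
    if not reviewed:
        raise ValueError("no manually reviewed pictures to score")
    correct = sum(
        (confidence >= decision_cutoff) == (label == 1)
        for confidence, label in reviewed
    )
    return correct / len(reviewed)
```

For instance, `model_accuracy([(0.9, 1), (0.2, 0), (0.8, 0), (0.1, 1)])` gives 0.5: the first two predictions agree with the reviewer and the last two conflict.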
In a specific implementation of this embodiment, the model optimization of the fine-grained binary classifier in step (5) specifically comprises the following sub-steps:
Obtaining error-prone samples, which include error-prone positive samples and error-prone negative samples. An error-prone positive sample is a picture whose predicted confidence is below the set confidence threshold but which actually belongs to the class; an error-prone negative sample is a picture whose predicted confidence is above the set confidence threshold but which actually does not belong to the class;
When the model accuracy of the fine-grained binary classifier is below the set accuracy threshold and the number of optimizations already performed on the model is below the model optimization count threshold, using the obtained error-prone positive samples and error-prone negative samples, together with the other positive and negative samples, as the training set to retrain the fine-grained binary classifier and thereby optimize it; at the same time, the model optimization count of the fine-grained binary classifier is incremented by one.
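The two optimization conditions and the counter update can be sketched as follows. The names `OptimizerState`, `maybe_optimize` and the `train` callback are hypothetical; the source does not describe how retraining itself is carried out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class OptimizerState:
    accuracy_threshold: float    # set accuracy threshold
    max_optimizations: int       # model optimization count threshold
    optimizations_done: int = 0  # model optimization count so far


def maybe_optimize(
    state: OptimizerState,
    accuracy: float,
    hard_positives: Sequence[str],
    hard_negatives: Sequence[str],
    base_positives: Sequence[str],
    base_negatives: Sequence[str],
    train: Callable[[List[Tuple[str, int]]], None],  # hypothetical retraining hook
) -> bool:
    """Retrain once if both optimization conditions hold; return True if it did."""
    if accuracy >= state.accuracy_threshold:
        return False  # the model is already accurate enough
    if state.optimizations_done >= state.max_optimizations:
        return False  # the optimization budget is exhausted
    # Error-prone samples are added to the existing positive and negative samples.
    training_set = [(p, 1) for p in [*base_positives, *hard_positives]]
    training_set += [(n, 0) for n in [*base_negatives, *hard_negatives]]
    train(training_set)
    state.optimizations_done += 1  # the optimization count is incremented by one
    return True
```

Keeping the counter in a small state object makes the stopping rule explicit: retraining stops either when the accuracy target is met or when the count threshold is reached.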
Preferably, denoting the class-prediction confidence of a manually cleaned picture as $Confidence_{predict}$ and the manual cleaning feedback as $Confidence_{groundtruth}$, the error-prone samples are those that satisfy:
$|Confidence_{groundtruth} - Confidence_{predict}| > threshold$
where $Confidence_{predict}$ takes values in (0, 1), $Confidence_{groundtruth}$ is 0 or 1, and $threshold$ is a preset threshold.
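The formula translates directly into code. In this sketch the input layout (picture id mapped to a pair of predicted confidence and manual label) and the default threshold of 0.5 are assumptions; the split into positive and negative error-prone samples follows the manual label.

```python
from typing import Dict, List, Tuple


def error_prone_samples(
    reviewed: Dict[str, Tuple[float, int]],
    threshold: float = 0.5,  # assumed value of the preset threshold
) -> Tuple[List[str], List[str]]:
    """Return (error-prone positives, error-prone negatives).

    `reviewed` maps a picture id to (Confidence_predict, Confidence_groundtruth),
    with the prediction in (0, 1) and the ground truth equal to 0 or 1.
    """
    hard_positives: List[str] = []
    hard_negatives: List[str] = []
    for pid, (predicted, truth) in reviewed.items():
        # |Confidence_groundtruth - Confidence_predict| > threshold
        if abs(truth - predicted) > threshold:
            if truth == 1:
                hard_positives.append(pid)  # belongs to the class, scored low
            else:
                hard_negatives.append(pid)  # not in the class, scored high
    return hard_positives, hard_negatives
```

With `{"a": (0.2, 1), "b": (0.9, 0), "c": (0.95, 1), "d": (0.1, 0)}` and the default threshold, picture `a` is an error-prone positive and `b` an error-prone negative, while `c` and `d` are left out.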
Embodiment 2
Based on the same inventive concept as Embodiment 1, this embodiment of the invention provides a device for optimizing the quality of automated image data cleaning, comprising:
A first screening module, configured to obtain a picture set to be cleaned and input it into a preset coarse-grained binary classifier to screen out a first-class picture set that meets the requirements;
A first computing module, configured to input the first-class picture set into a preset fine-grained binary classifier to obtain the class-prediction confidence of each picture to be cleaned;
A second screening module, configured to screen out the pictures that need manual cleaning, based on a set confidence threshold and a first picture quantity threshold corresponding to the confidence threshold;
A second computing module, configured to obtain the model accuracy of the fine-grained binary classifier, based on the class-prediction confidences of the pictures that need manual cleaning and the feedback from manual cleaning;
An optimization module, configured to optimize the fine-grained binary classifier, using its model accuracy and a model optimization count threshold as the optimization conditions, based on the class-prediction confidences of the pictures that need manual cleaning, the feedback from manual cleaning, and sample pictures.
The remainder is the same as in Embodiment 1.
Embodiment 3
Based on the same inventive concept as Embodiment 1, this embodiment of the invention provides a system for optimizing the quality of automated image data cleaning, comprising:
A processor, adapted to execute instructions; and
A storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the steps described in Embodiment 1.
The cleaning of pictures of three dishes (phoenix tail prawns, quick-fried squid, Scotch collops) is taken as an example below.
The accuracy of the crawled data set is as follows:
Dish | Total | Positive samples | Negative samples | Accuracy |
Quick-fried squid | 1610 | 1080 | 530 | 67.1% |
Phoenix tail prawns | 1716 | 936 | 780 | 54.5% |
Scotch collops | 1568 | 697 | 871 | 44.5% |
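The accuracy column of the table above is consistent with taking the share of positive samples in each crawled set. The short check below recomputes it from the other columns; the function name is an assumption, and the figures are copied from the table.

```python
def accuracy_percent(positive: int, total: int) -> str:
    """Share of positive samples in a data set, formatted to one decimal place."""
    return f"{positive / total:.1%}"


# (dish, total, positive samples, negative samples) from the table above.
crawled = [
    ("Quick-fried squid", 1610, 1080, 530),
    ("Phoenix tail prawns", 1716, 936, 780),
    ("Scotch collops", 1568, 697, 871),
]

for dish, total, positive, negative in crawled:
    assert positive + negative == total  # the counts in each row add up
    print(dish, accuracy_percent(positive, total))
# prints:
# Quick-fried squid 67.1%
# Phoenix tail prawns 54.5%
# Scotch collops 44.5%
```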
The accuracy of the data set after automatic cleaning in the prior art is as follows:
The accuracy of the data set after processing with the method of the present invention is as follows:
As can be seen from Tables 1-3, after processing with the method of the present invention, the accuracy of the data set is further improved compared with before optimization.
It should be understood by those skilled in the art that, embodiments herein can provide as method, system or computer program
Product.Therefore, complete hardware embodiment, complete software embodiment or reality combining software and hardware aspects can be used in the application
Apply the form of example.Moreover, it wherein includes the computer of computer usable program code that the application, which can be used in one or more,
The computer program implemented in usable storage medium (including but not limited to magnetic disk storage, CD-ROM, optical memory etc.) produces
The form of product.
The application is referring to method, the process of equipment (system) and computer program product according to the embodiment of the present application
Figure and/or block diagram describe.It should be understood that every one stream in flowchart and/or the block diagram can be realized by computer program instructions
The combination of process and/or box in journey and/or box and flowchart and/or the block diagram.It can provide these computer programs
Instruct the processor of general purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices to produce
A raw machine, so that being generated by the instruction that computer or the processor of other programmable data processing devices execute for real
The device for the function of being specified in present one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the above embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
1. A method for optimizing automated picture data cleaning quality, characterized by comprising the following steps:
obtaining a picture set to be cleaned, inputting it into a preset coarse-grained binary classifier, and screening out a first-class picture set that meets the requirements;
inputting the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction for each picture to be cleaned;
screening out the pictures that require manual cleaning, based on a set confidence threshold and a first picture-number threshold corresponding to the confidence threshold;
obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidence of the pictures requiring manual cleaning and the feedback results of the manual cleaning;
taking the model accuracy of the fine-grained binary classifier and a model optimization count threshold as the optimization conditions, performing model optimization of the fine-grained binary classifier based on the class-prediction confidence of the pictures requiring manual cleaning, the feedback results of the manual cleaning, and sample pictures;
repeating the above process until a fine-grained binary classifier that meets the requirements is obtained, and completing the cleaning of all pictures.
2. The method for optimizing automated picture data cleaning quality according to claim 1, characterized in that the training process of the preset fine-grained binary classifier is:
providing a test picture set, and clustering the test picture set according to a cluster analysis algorithm to obtain a positive-sample test picture set and a negative-sample test picture set;
training on the positive-sample test picture set and the negative-sample test picture set to obtain the fine-grained binary classifier.
3. The method for optimizing automated picture data cleaning quality according to claim 1, characterized in that screening out the pictures that require manual cleaning, based on the set confidence threshold and the first picture-number threshold corresponding to the confidence threshold, specifically comprises the following sub-steps:
comparing the set confidence threshold with the obtained class-prediction confidence of each picture to be cleaned;
when the number of pictures whose predicted confidence is less than the set confidence threshold is greater than the preset first picture-number threshold, identifying those pictures as the pictures requiring manual cleaning.
4. The method for optimizing automated picture data cleaning quality according to claim 3, characterized in that, during model optimization of the fine-grained binary classifier, after the step of screening out the pictures that require manual cleaning based on the set confidence threshold and the first picture-number threshold corresponding to the confidence threshold, the method further comprises:
selecting, based on a set rule, pictures requiring manual cleaning from each prediction-confidence distribution interval.
5. The method for optimizing automated picture data cleaning quality according to claim 4, characterized in that obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidence of the pictures requiring manual cleaning and the feedback results of the manual cleaning, specifically comprises the following sub-steps:
for the pictures requiring manual cleaning that were screened out based on the set confidence threshold and the first picture-number threshold corresponding to the confidence threshold: for each such picture, if the confidence of its class prediction conflicts with the manual cleaning feedback result, judging the classification to be incorrect; otherwise, judging the classification to be correct;
for the pictures requiring manual cleaning that were selected from each prediction-confidence distribution interval: for each such picture, if the confidence of its class prediction conflicts with the manual cleaning feedback result, judging the classification to be incorrect; otherwise, judging the classification to be correct;
calculating the model accuracy of the fine-grained binary classifier based on the above classification judgment results.
6. The method for optimizing automated picture data cleaning quality according to claim 1, characterized in that, after the step of screening out the pictures that require manual cleaning based on the set confidence threshold and the first picture-number threshold corresponding to the confidence threshold, the method further comprises:
when, among the pictures to be cleaned, the number of pictures whose predicted confidence is greater than the set confidence threshold is less than a preset second picture-number threshold, lowering the set confidence threshold, and then providing the picture data above the adjusted confidence threshold again for manual cleaning.
7. The method for optimizing automated picture data cleaning quality according to claim 1, characterized in that the model optimization of the fine-grained binary classifier specifically comprises the following sub-steps:
obtaining error-prone samples, the error-prone samples including error-prone positive samples and error-prone negative samples;
when the model accuracy of the fine-grained binary classifier is less than a set accuracy threshold and the number of model optimizations is less than the model optimization count threshold, using the obtained error-prone positive samples and error-prone negative samples, together with the other positive samples and negative samples, as the training set to retrain the fine-grained binary classifier and thereby optimize it; at the same time, incrementing the model optimization count of the fine-grained binary classifier by one.
8. The method for optimizing automated picture data cleaning quality according to claim 7, characterized in that:
the class-prediction confidence of a manually cleaned picture is denoted $\mathrm{Confidence}_{predict}$, the feedback result of the manual cleaning is denoted $\mathrm{Confidence}_{groundtruth}$, and the error-prone samples are determined by the formula:
$$|\mathrm{Confidence}_{groundtruth} - \mathrm{Confidence}_{predict}| > \mathrm{threshold}$$
where $\mathrm{Confidence}_{predict}$ takes values in the range (0, 1), $\mathrm{Confidence}_{groundtruth}$ takes the value 0 or 1, and threshold is a preset threshold.
9. A device for optimizing automated picture data cleaning quality, characterized by comprising:
a first screening module, for obtaining a picture set to be cleaned, inputting it into a preset coarse-grained binary classifier, and screening out a first-class picture set that meets the requirements;
a first computing module, for inputting the first-class picture set into a preset fine-grained binary classifier to obtain the confidence of the class prediction for each picture to be cleaned;
a second screening module, for screening out the pictures that require manual cleaning, based on a set confidence threshold and a first picture-number threshold corresponding to the confidence threshold;
a second computing module, for obtaining the model accuracy of the fine-grained binary classifier, based on the class-prediction confidence of the pictures requiring manual cleaning and the feedback results of the manual cleaning;
an optimization module, for taking the model accuracy of the fine-grained binary classifier and a model optimization count threshold as the optimization conditions, and performing model optimization of the fine-grained binary classifier based on the class-prediction confidence of the pictures requiring manual cleaning, the feedback results of the manual cleaning, and sample pictures.
10. A system for optimizing automated picture data cleaning quality, characterized by comprising:
a processor, adapted to execute instructions; and
a storage device, adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to perform the steps of any one of claims 1 to 8.
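The screening, accuracy, error-prone-sample, and optimization-condition steps recited in claims 3, 5, 7 and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Prediction` type, all function names, the sample values, and the `decision_boundary` of 0.5 used to decide whether a confidence "conflicts" with the manual label are assumptions made only for illustration, since the claims do not define them.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    picture_id: str
    confidence: float  # fine-grained classifier output, assumed to lie in (0, 1)


def select_for_manual_cleaning(preds, conf_threshold, first_count_threshold):
    """Claim 3 (sketch): pictures below the confidence threshold are sent to
    manual cleaning once their number exceeds the first picture-number threshold."""
    low = [p for p in preds if p.confidence < conf_threshold]
    return low if len(low) > first_count_threshold else []


def model_accuracy(reviewed, decision_boundary=0.5):
    """Claim 5 (sketch): fraction of reviewed pictures whose prediction does not
    conflict with the manual label. ASSUMPTION: a conflict means the confidence,
    cut at decision_boundary, yields a label different from the manual one."""
    if not reviewed:
        raise ValueError("no manually reviewed pictures")
    correct = sum(
        1 for conf, truth in reviewed
        if (1 if conf >= decision_boundary else 0) == truth
    )
    return correct / len(reviewed)


def is_error_prone(confidence, ground_truth, threshold):
    """Claim 8: |ground truth - predicted confidence| > threshold."""
    return abs(ground_truth - confidence) > threshold


def should_optimize(accuracy, accuracy_threshold, optimization_count, max_optimizations):
    """Claim 7: retrain only while accuracy is too low and the optimization
    count is still below the model optimization count threshold."""
    return accuracy < accuracy_threshold and optimization_count < max_optimizations


# Hypothetical data: four predictions and manual labels for the reviewed ones.
preds = [
    Prediction("a", 0.95),
    Prediction("b", 0.40),
    Prediction("c", 0.55),
    Prediction("d", 0.10),
]
to_review = select_for_manual_cleaning(preds, conf_threshold=0.6, first_count_threshold=2)
feedback = {"b": 0, "c": 0, "d": 0}  # manual cleaning results (0 or 1)

reviewed = [(p.confidence, feedback[p.picture_id]) for p in to_review]
accuracy = model_accuracy(reviewed)
error_prone_ids = [
    p.picture_id for p in to_review
    if is_error_prone(p.confidence, feedback[p.picture_id], threshold=0.5)
]
retrain = should_optimize(accuracy, accuracy_threshold=0.9,
                          optimization_count=0, max_optimizations=3)
```

With these made-up values, pictures b, c and d are sent for manual review, only c is flagged as error-prone, and the retraining condition holds because the accuracy on the reviewed pictures is below the assumed 0.9 accuracy threshold. In a full loop the error-prone samples would be merged with the other positive and negative samples to retrain the classifier, and the optimization count would be incremented.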
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267802.1A CN110083728B (en) | 2019-04-03 | 2019-04-03 | Method, device and system for optimizing automatic picture data cleaning quality |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267802.1A CN110083728B (en) | 2019-04-03 | 2019-04-03 | Method, device and system for optimizing automatic picture data cleaning quality |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110083728A (en) | 2019-08-02 |
CN110083728B (en) | 2021-08-20 |
Family
ID=67414238
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267802.1A Active CN110083728B (en) | 2019-04-03 | 2019-04-03 | Method, device and system for optimizing automatic picture data cleaning quality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083728B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111667003A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Data cleaning method, device, equipment and storage medium |
CN112418169A (en) * | 2020-12-10 | 2021-02-26 | 上海芯翌智能科技有限公司 | Method and equipment for processing human body attribute data |
CN112529851A (en) * | 2020-11-27 | 2021-03-19 | 中冶赛迪重庆信息技术有限公司 | Method, system, terminal and medium for determining state of hydraulic pipe |
CN112633320A (en) * | 2020-11-26 | 2021-04-09 | 西安电子科技大学 | Radar radiation source data cleaning method based on phase image coefficient and DBSCAN |
CN113344098A (en) * | 2021-06-22 | 2021-09-03 | 北京三快在线科技有限公司 | Model training method and device |
CN114495291A (en) * | 2022-04-01 | 2022-05-13 | 杭州魔点科技有限公司 | Method, system, electronic device and storage medium for in vivo detection |
CN118332343A (en) * | 2024-06-13 | 2024-07-12 | 健数(长春)科技有限公司 | Blood routine-based semi-supervised model optimized pulmonary tuberculosis disease classification method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130128981A1 (en) * | 2010-07-15 | 2013-05-23 | Fujitsu Limited | Moving image decoding apparatus, moving image decoding method and moving image encoding apparatus, and moving image encoding method |
CN107977412A (en) * | 2017-11-22 | 2018-05-01 | 上海大学 | It is a kind of based on iterative with interactive perceived age database cleaning method |
CN108664497A (en) * | 2017-03-30 | 2018-10-16 | 大有秦鼎(北京)科技有限公司 | The method and apparatus of Data Matching |
CN108874900A (en) * | 2018-05-24 | 2018-11-23 | 四川斐讯信息技术有限公司 | A kind of acquisition methods and system of samples pictures data acquisition system |
CN108875821A (en) * | 2018-06-08 | 2018-11-23 | Oppo广东移动通信有限公司 | The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing |
CN109165665A (en) * | 2018-07-06 | 2019-01-08 | 上海康斐信息技术有限公司 | A kind of category analysis method and system |
CN109241903A (en) * | 2018-08-30 | 2019-01-18 | 平安科技(深圳)有限公司 | Sample data cleaning method, device, computer equipment and storage medium |
CN109241397A (en) * | 2018-07-06 | 2019-01-18 | 四川斐讯信息技术有限公司 | A kind of method and apparatus for cleaning data |
-
2019
- 2019-04-03 CN CN201910267802.1A patent/CN110083728B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130128981A1 (en) * | 2010-07-15 | 2013-05-23 | Fujitsu Limited | Moving image decoding apparatus, moving image decoding method and moving image encoding apparatus, and moving image encoding method |
CN108664497A (en) * | 2017-03-30 | 2018-10-16 | 大有秦鼎(北京)科技有限公司 | The method and apparatus of Data Matching |
CN107977412A (en) * | 2017-11-22 | 2018-05-01 | 上海大学 | It is a kind of based on iterative with interactive perceived age database cleaning method |
CN108874900A (en) * | 2018-05-24 | 2018-11-23 | 四川斐讯信息技术有限公司 | A kind of acquisition methods and system of samples pictures data acquisition system |
CN108875821A (en) * | 2018-06-08 | 2018-11-23 | Oppo广东移动通信有限公司 | The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing |
CN109165665A (en) * | 2018-07-06 | 2019-01-08 | 上海康斐信息技术有限公司 | A kind of category analysis method and system |
CN109241397A (en) * | 2018-07-06 | 2019-01-18 | 四川斐讯信息技术有限公司 | A kind of method and apparatus for cleaning data |
CN109241903A (en) * | 2018-08-30 | 2019-01-18 | 平安科技(深圳)有限公司 | Sample data cleaning method, device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
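Chen Rui (陈锐): "Research on Image Classification Methods Based on Neural Networks" (基于神经网络的图像分类方法研究), China Master's Theses Full-text Database, Information Science and Technology Series (《中国优秀硕士学位论文全文数据库 信息科技辑》) * |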
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111667003A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Data cleaning method, device, equipment and storage medium |
CN111667003B (en) * | 2020-06-05 | 2023-11-03 | 北京百度网讯科技有限公司 | Data cleaning method, device, equipment and storage medium |
CN112633320A (en) * | 2020-11-26 | 2021-04-09 | 西安电子科技大学 | Radar radiation source data cleaning method based on phase image coefficient and DBSCAN |
CN112529851A (en) * | 2020-11-27 | 2021-03-19 | 中冶赛迪重庆信息技术有限公司 | Method, system, terminal and medium for determining state of hydraulic pipe |
CN112418169A (en) * | 2020-12-10 | 2021-02-26 | 上海芯翌智能科技有限公司 | Method and equipment for processing human body attribute data |
CN113344098A (en) * | 2021-06-22 | 2021-09-03 | 北京三快在线科技有限公司 | Model training method and device |
CN114495291A (en) * | 2022-04-01 | 2022-05-13 | 杭州魔点科技有限公司 | Method, system, electronic device and storage medium for in vivo detection |
CN118332343A (en) * | 2024-06-13 | 2024-07-12 | 健数(长春)科技有限公司 | Blood routine-based semi-supervised model optimized pulmonary tuberculosis disease classification method and system |
CN118332343B (en) * | 2024-06-13 | 2024-08-16 | 健数(长春)科技有限公司 | Blood routine-based semi-supervised model optimized pulmonary tuberculosis disease classification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110083728B (en) | 2021-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083728A (en) | A kind of methods, devices and systems of optimization automation image data cleaning quality | |
Chen et al. | A light-weighted CNN model for wafer structural defect detection | |
Yuan-Fu | A deep learning model for identification of defect patterns in semiconductor wafer map | |
CN109948647A (en) | A kind of electrocardiogram classification method and system based on depth residual error network | |
CN108090508A (en) | A kind of classification based training method, apparatus and storage medium | |
CN109271374A (en) | A kind of database health scoring method and scoring system based on machine learning | |
CN109598307A (en) | Data screening method, apparatus, server and storage medium | |
CN106446931A (en) | Feature extraction and classification method and system based on support vector data description | |
Liu et al. | A classification method of glass defect based on multiresolution and information fusion | |
CN111145145B (en) | Image surface defect detection method based on MobileNet | |
CN110008853A (en) | Pedestrian detection network and model training method, detection method, medium, equipment | |
CN116521908B (en) | Multimedia content personalized recommendation method based on artificial intelligence | |
CN109543693A (en) | Weak labeling data noise reduction method based on regularization label propagation | |
CN106203103A (en) | The method for detecting virus of file and device | |
Lin et al. | Parameter determination and feature selection for back-propagation network by particle swarm optimization | |
CN110019563A (en) | A kind of portrait modeling method and device based on multidimensional data | |
CN110262887A (en) | CPU-FPGA method for scheduling task and device based on feature identification | |
CN110458189A (en) | Compressed sensing and depth convolutional neural networks Power Quality Disturbance Classification Method | |
Xu et al. | Comparison of shape features for the classification of wear particles | |
CN110009005A (en) | A kind of net flow assorted method based on feature strong correlation | |
CN112420125A (en) | Molecular attribute prediction method and device, intelligent equipment and terminal | |
CN116380438A (en) | Fault diagnosis method and device, electronic equipment and storage medium | |
WO2019032413A1 (en) | Inspection-guided critical site selection for critical dimension measurement | |
Lallich et al. | Improving classification by removing or relabeling mislabeled instances | |
CN114064459A (en) | Software defect prediction method based on generation countermeasure network and ensemble learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20201012. Address after: 201615 room 1001, building 21, No. 1158, Zhongxin Road, Jiuting Town, Songjiang District, Shanghai. Applicant after: Shanghai re SR Information Technology Co.,Ltd. Address before: The new town of Pudong New Area Nanhui lake west two road 201306 Shanghai City No. 888 building C. Applicant before: Shanghai Lianyin Electronic Technology Partnership (L.P.) |
GR01 | Patent grant | ||