CN108874900A - Method and system for acquiring a sample picture data set - Google Patents

Info

Publication number
CN108874900A
Authority
CN
China
Prior art keywords
image data
pictures
data
confidence level
positive sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810506155.0A
Other languages
Chinese (zh)
Inventor
罗培元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Feixun Information Technology Co Ltd
Original Assignee
Sichuan Feixun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Feixun Information Technology Co Ltd filed Critical Sichuan Feixun Information Technology Co Ltd
Priority to CN201810506155.0A priority Critical patent/CN108874900A/en
Publication of CN108874900A publication Critical patent/CN108874900A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a method and system for acquiring a sample picture data set. The method includes a cleaning process for picture data to be cleaned and an acquisition process for the sample picture data set. The cleaning process includes: obtaining a positive sample picture set and a negative sample picture set, where the feature information of the picture data in the positive sample picture set is the same as that of a target picture and the feature information of the picture data in the negative sample picture set differs from that of the target picture; training a neural network sorter on the positive and negative sample picture sets; and classifying the picture data to be cleaned with the neural network sorter to obtain several confidence sets. The acquisition process includes: taking the picture data in the confidence sets whose confidence reaches a preset level to obtain the sample picture data set. The invention screens and classifies automatically to obtain the sample picture data set, improving the efficiency and accuracy of screening and classification.

Description

Method and system for acquiring a sample picture data set
Technical field
The present invention relates to the field of data processing, and in particular to a method and system for acquiring a sample picture data set.
Background
It is well known that training a deep learning convolutional neural network requires massive amounts of data. The data volume of a mature neural network can easily reach the TB level. Taking convolutional neural networks as an example, the input is generally pictures: a larger picture is about several MB and even a smaller one is generally several hundred KB, so at a TB-level data volume the workload is very large.
The current industry practice is to crawl pictures in bulk with web crawlers and then screen and classify all of them manually to obtain the sample picture data set. The problems with this approach are that the workload is enormous, the screening results are highly subjective, and the screening results are error-prone. Moreover, training a neural network later on an erroneous sample picture data set produces wrong classification results.
Summary of the invention
The object of the present invention is to provide a method and system for acquiring a sample picture data set that screen and classify automatically to obtain the sample picture data set, improving the efficiency and accuracy of screening and classification.
The technical solution provided by the present invention is as follows:
The present invention provides a method for acquiring a sample picture data set, including the steps:
The cleaning process for picture data to be cleaned specifically includes:
Obtaining a positive sample picture set and a negative sample picture set; the feature information of the picture data in the positive sample picture set is the same as the feature information of a target picture; the feature information of the picture data in the negative sample picture set differs from the feature information of the target picture;
Training a neural network sorter on the positive sample picture set and the negative sample picture set;
Classifying the picture data to be cleaned with the neural network sorter to obtain several confidence sets;
The acquisition process for the sample picture data set specifically includes:
Taking the picture data in the confidence sets whose confidence reaches a preset level to obtain the sample picture data set; the feature information of the picture data in the sample picture data set is the same as the feature information of the target picture, and the sample picture data set contains more picture data than the positive sample picture set.
Further, obtaining the positive sample picture set and the negative sample picture set includes the steps:
Obtaining a first preset number of positive sample picture data from a source picture data set as the positive sample picture set; the positive sample picture data are picture data with the same feature information as the target picture;
Obtaining a second preset number of negative sample picture data from the source picture data set as the negative sample picture set; the negative sample picture data are picture data whose feature information differs from that of the target picture.
Further, training the neural network sorter on the positive sample picture set and the negative sample picture set includes the steps:
Deleting the last fully connected layer of a pre-trained neural network model;
Adding one fully connected layer and one activation layer, in that order, in place of the deleted last fully connected layer;
Training the newly added fully connected layer and activation layer on the positive sample picture set and the negative sample picture set to obtain the neural network sorter.
Further, classifying the picture data to be cleaned with the neural network sorter to obtain several confidence sets includes the steps:
Inputting all picture data to be cleaned into the neural network sorter to obtain the confidence of each picture data item to be cleaned;
Assigning each picture data item to be cleaned to the corresponding confidence set according to its confidence and the preset confidence interval ranges.
Further, after assigning each picture data item to be cleaned to the corresponding confidence set according to its confidence and the preset confidence interval ranges, the method includes the step:
Adding the picture data in the confidence sets whose confidence reaches the preset level to the positive sample picture set as positive sample picture data.
Further, after taking the picture data in the confidence sets whose confidence reaches the preset level to obtain the sample picture data set, the method includes the steps:
Counting the number of pictures in the confidence sets whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, stopping the cleaning process for picture data to be cleaned;
When the number of pictures does not reach the target required quantity, continuing the cleaning process for picture data to be cleaned.
The present invention also provides a system for acquiring a sample picture data set, including a cleaning module and an acquisition module; the cleaning module is connected to the acquisition module;
The cleaning module obtains a positive sample picture set and a negative sample picture set, where the feature information of the picture data in the positive sample picture set is the same as the feature information of a target picture and the feature information of the picture data in the negative sample picture set differs from the feature information of the target picture; trains a neural network sorter on the positive sample picture set and the negative sample picture set; and classifies the picture data to be cleaned with the neural network sorter to obtain several confidence sets;
The acquisition module takes the picture data in the confidence sets whose confidence reaches a preset level to obtain the sample picture data set; the feature information of the picture data in the sample picture data set is the same as the feature information of the target picture, and the sample picture data set contains more picture data than the positive sample picture set.
Further, the cleaning module includes a positive sample data acquisition unit, a negative sample data acquisition unit, a sorter training unit, and a classification storage unit; the sorter training unit is connected to the positive sample data acquisition unit, the negative sample data acquisition unit, and the classification storage unit respectively;
The positive sample data acquisition unit obtains a first preset number of positive sample picture data from a source picture data set as the positive sample picture set; the positive sample picture data are picture data with the same feature information as the target picture;
The negative sample data acquisition unit obtains a second preset number of negative sample picture data from the source picture data set as the negative sample picture set; the negative sample picture data are picture data whose feature information differs from that of the target picture;
The sorter training unit deletes the last fully connected layer of a pre-trained neural network model; adds one fully connected layer and one activation layer, in that order, in place of the deleted last fully connected layer; and trains the newly added fully connected layer and activation layer on the positive sample picture set and the negative sample picture set to obtain the neural network sorter;
The classification storage unit inputs all picture data to be cleaned into the neural network sorter to obtain the confidence of each picture data item to be cleaned, and assigns each picture data item to be cleaned to the corresponding confidence set according to its confidence and the preset confidence interval ranges.
Further, the positive sample data acquisition unit is connected to the classification storage unit;
The positive sample data acquisition unit also takes the picture data in the confidence sets whose confidence reaches the preset level and adds them to the positive sample picture set as positive sample picture data.
Further, the system also includes a statistics module and a control module; the statistics module is connected to the acquisition module and the control module respectively, and the control module is connected to the cleaning module;
The statistics module counts the number of pictures in the confidence sets whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, the control module controls the cleaning module to stop the cleaning process for picture data to be cleaned;
When the number of pictures does not reach the target required quantity, the control module controls the cleaning module to continue the cleaning process for picture data to be cleaned.
The method and system for acquiring a sample picture data set provided by the present invention can bring at least one of the following beneficial effects:
1) The invention classifies pictures to be cleaned with the neural network sorter, which reduces the labor and subjectivity of manual screening and classification and improves its efficiency, so that when the amount of picture data for some picture feature information is insufficient, the corresponding picture data can be screened and expanded efficiently and with high quality to serve as samples for a neural network model.
2) In the process of obtaining the sample picture data set, the invention provides an automatic cleaning strategy for picture data to be cleaned. It serves as an auxiliary data cleaning strategy that improves the efficiency of manual cleaning, or even enables fully automated cleaning; it resolves the problems that manual cleaning brings, reduces the subjectivity of manual cleaning, screening, and classification, and improves the accuracy and efficiency of screening and classification.
3) While sorting picture data to be cleaned with the neural network sorter, the invention adds the high-confidence picture data that the sorter has screened and classified to the positive samples and trains the sorter on them, which can effectively improve model accuracy and speed up the training of the neural network sorter.
Brief description of the drawings
The preferred embodiments are described below in a clear and easily understood manner with reference to the drawings, to further explain the above characteristics, technical features, advantages, and implementations of the method and system for acquiring a sample picture data set.
Fig. 1 is a flow chart of one embodiment of the method for acquiring a sample picture data set of the present invention;
Fig. 2 is a flow chart of an example of the method for acquiring a sample picture data set of the present invention;
Fig. 3 is a structural schematic diagram of one embodiment of the system for acquiring a sample picture data set of the present invention;
Fig. 4 is a structural schematic diagram of one embodiment of the system for acquiring a sample picture data set of the present invention.
Detailed description
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.
To keep the drawings simple, each figure shows only schematically the parts related to the invention; they do not represent the actual structure of a product. In addition, to keep the drawings simple and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. Herein, "one" means not only "only one" but can also mean "more than one".
Transfer learning, as the name implies, transfers the parameters of an already trained model to a new model to help train the new model. Since most data or tasks are correlated, transfer learning lets us share the learned model parameters (which can also be understood as the knowledge the model has learned) with the new model in some way, accelerating and optimizing the learning of the model instead of learning from scratch as most networks do.
Data cleaning means picking out the needed data from a massive data set. Take a classification problem as an example: given a massive collection of pictures to be screened for tomato omelette, the cleaning task is to pick out the pictures that show tomato omelette.
First embodiment of the invention, as shown in Fig. 1:
A method for acquiring a sample picture data set, including:
The cleaning process for picture data to be cleaned specifically includes:
Obtaining a positive sample picture set and a negative sample picture set; the feature information of the picture data in the positive sample picture set is the same as the feature information of a target picture; the feature information of the picture data in the negative sample picture set differs from the feature information of the target picture;
Specifically, the feature information includes, but is not limited to, picture features such as picture content and picture category. From pictures crawled from the network, a preset number of picture data items with the same feature information as the target picture can be manually screened as the positive sample picture set, and a preset number of picture data items whose feature information differs from that of the target picture can be manually screened as the negative sample picture set. Alternatively, a single picture with the same feature information as the target picture can be manually selected from the crawled pictures and used as a template picture for data augmentation: any one or more of pixel-transform data augmentation and geometric-transform data augmentation are applied to the template picture to obtain the positive sample picture set. Similarly, the negative sample picture set can also be obtained by data augmentation. The ways of obtaining the positive and negative sample picture sets are not limited to these, and all fall within the scope of protection of the present invention.
Pixel transforms include: 1. adding noise and filtering, where the approaches include but are not limited to salt-and-pepper noise, Gaussian noise, and median filtering; 2. channel conversion, i.e. reordering the three RGB channels; 3. adjusting contrast, brightness, and saturation (color jitter).
Geometric transforms include: 1. flipping, e.g. horizontal or vertical flipping, applied according to the actual situation; for example, flipping a face vertically yields an upside-down face, so that flip has no practical meaning; 2. translation, which simulates real-life pictures in which the subject is not centered by changing its position; 3. rotation; 4. blacking out, which simulates partially occluded data samples; 5. cropping; 6. scaling.
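The augmentation of a single template picture described above can be illustrated with a minimal sketch. It is not the patent's implementation: the picture layout (rows of RGB tuples), the function names, and the noise probability are all assumptions chosen for illustration, and only four of the listed transforms are shown.

```python
import random

# A picture is modelled as rows of (R, G, B) tuples. This layout is an
# assumption made for the sketch; the source does not prescribe one.

def salt_and_pepper(img, prob, rng):
    """Pixel transform: turn each pixel black or white with probability prob."""
    out = []
    for row in img:
        new_row = []
        for px in row:
            if rng.random() < prob:
                new_row.append(rng.choice([(0, 0, 0), (255, 255, 255)]))
            else:
                new_row.append(px)
        out.append(new_row)
    return out

def swap_channels(img, order=(2, 1, 0)):
    """Pixel transform: reorder the colour channels (default RGB -> BGR)."""
    return [[tuple(px[i] for i in order) for px in row] for row in img]

def flip_horizontal(img):
    """Geometric transform: mirror each row left to right."""
    return [list(reversed(row)) for row in img]

def black_out(img, top, left, height, width):
    """Geometric transform: zero a rectangle to simulate partial occlusion."""
    return [[(0, 0, 0)
             if top <= r < top + height and left <= c < left + width else px
             for c, px in enumerate(row)]
            for r, row in enumerate(img)]

def augment(template, copies, seed=0):
    """Build `copies` variants of one template picture, one transform each."""
    rng = random.Random(seed)
    transforms = [
        lambda im: salt_and_pepper(im, 0.05, rng),  # 0.05 is an arbitrary choice
        swap_channels,
        flip_horizontal,
        lambda im: black_out(im, 0, 0, 1, 1),
    ]
    return [rng.choice(transforms)(template) for _ in range(copies)]

template = [[(10, 20, 30), (40, 50, 60)],
            [(70, 80, 90), (100, 110, 120)]]
positive_set = augment(template, copies=8)
```

Whether a given transform is meaningful depends on the target picture, as the text notes for vertically flipped faces, so the list of transforms would normally be chosen per target.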
Training a neural network sorter on the positive sample picture set and the negative sample picture set;
Specifically, the picture data in the positive and negative sample picture sets can be input into a pre-trained neural network model and all layers of the model trained, fine-tuning the parameters of every layer, to obtain the neural network sorter. Alternatively, the picture data in the positive and negative sample picture sets can be input into the pre-trained neural network model and only the last fully connected layer trained, fine-tuning the parameters of that layer, to obtain the neural network sorter.
Classifying the picture data to be cleaned with the neural network sorter to obtain several confidence sets;
Specifically, the trained neural network sorter screens and classifies the acquired picture data to be cleaned. For example, 100,000 picture data items to be cleaned are crawled from the network and input into the neural network sorter; the sorter computes a predicted value for each picture data item to be cleaned, the category of each picture to be cleaned is identified from its predicted value, and all picture data to be cleaned are thereby classified.
The acquisition process for the sample picture data set specifically includes:
Taking the picture data in the confidence sets whose confidence reaches a preset level to obtain the sample picture data set; the feature information of the picture data in the sample picture data set is the same as the feature information of the target picture, and the sample picture data set contains more picture data than the positive sample picture set.
Specifically, in this embodiment, the picture data in the confidence sets whose confidence reaches the preset level are the pictures whose similarity to the target picture, in terms of the target picture's feature information, reaches a preset similarity threshold; these are the pictures the user needs. The invention classifies pictures to be cleaned with the neural network sorter, which reduces the labor and subjectivity of manual screening and classification and improves its efficiency, so that when the amount of picture data for some picture feature information is insufficient, the corresponding picture data can be screened and expanded efficiently and with high quality to serve as samples for a neural network model.
Second embodiment of the invention. This embodiment is a preferred embodiment of the first embodiment; compared with the first embodiment, it is further optimized in that obtaining the positive sample picture set and the negative sample picture set includes:
Obtaining a first preset number of positive sample picture data from a source picture data set as the positive sample picture set; the positive sample picture data are picture data with the same feature information as the target picture;
Obtaining a second preset number of negative sample picture data from the source picture data set as the negative sample picture set; the negative sample picture data are picture data whose feature information differs from that of the target picture.
Specifically, in many cases there is no need to train the entire neural network model from scratch (with randomly initialized parameters), because a sufficiently rich data set is not available and training is a time-consuming and resource-intensive process. Therefore, a certain number, i.e. the first preset number, of positive sample picture data are obtained as the positive sample picture set directly from a source picture data set that has already been classified by a previously trained (pre-trained) neural network model, and a certain number, i.e. the second preset number, of negative sample picture data are obtained from the same source picture data set as the negative sample picture set. The first preset number and the second preset number may or may not be equal. For example, suppose a pre-trained neural network model for dish pictures has been obtained by training on pictures of various dishes. To obtain a sub-neural-network model for the tomato omelette dish, positive sample picture data showing tomato omelette are filtered directly from the source picture data set classified by the classifier of the pre-trained model until their number reaches the first preset number, and these form the positive sample picture set of the sub-model. Negative sample picture data that do not show tomato omelette are filtered from the same source picture data set until their number reaches the second preset number, and these form the negative sample picture set of the sub-model.
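As a rough illustration of drawing the first and second preset numbers of samples from an already classified source set, the following sketch assumes the source set is available as (picture identifier, label) pairs. The function name, the file names, and the labels are hypothetical.

```python
def select_samples(source, target_label, n_pos, n_neg):
    """Pick n_pos positives and n_neg negatives from (picture_id, label) pairs.

    `source` stands for a source picture data set that a pre-trained
    classifier has already labelled; this representation is an assumption.
    """
    positives, negatives = [], []
    for picture_id, label in source:
        if label == target_label and len(positives) < n_pos:
            positives.append(picture_id)
        elif label != target_label and len(negatives) < n_neg:
            negatives.append(picture_id)
        if len(positives) == n_pos and len(negatives) == n_neg:
            break  # both preset numbers reached
    return positives, negatives

source = [("a.jpg", "tomato omelette"), ("b.jpg", "noodles"),
          ("c.jpg", "tomato omelette"), ("d.jpg", "rice"),
          ("e.jpg", "tomato omelette"), ("f.jpg", "soup")]
pos, neg = select_samples(source, "tomato omelette", n_pos=2, n_neg=3)
```

The two preset numbers are independent parameters here, matching the statement that they may or may not be equal.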
Third embodiment of the invention. This embodiment is a preferred embodiment of the first embodiment; compared with the first embodiment, it is further optimized in that training the neural network sorter on the positive sample picture set and the negative sample picture set includes the steps:
Deleting the last fully connected layer of a pre-trained neural network model;
Adding one fully connected layer and one activation layer, in that order, in place of the deleted last fully connected layer;
Training the newly added fully connected layer and activation layer on the positive sample picture set and the negative sample picture set to obtain the neural network sorter.
Specifically, training the sorter by transfer learning actually consists of deleting the last fully connected layer of the pre-trained neural network model (e.g. MobileNetV1) and then adding one fully connected layer and one activation layer, in that order, in its place. The newly added fully connected layer and activation layer are trained on the positive and negative sample picture sets to obtain the neural network sorter. The neural network sorter is thus the existing pre-trained model, after the deletion and addition above, with its network dimensions and model parameters fine-tuned by training so that the network fits the correct results. Suppose picture data for 400 dishes need to be cleaned: the newly added fully connected layer is a 1 × 1001 × 401 filter, its parameters are randomly initialized (e.g. Gaussian random initialization), and they are fixed after training with backpropagation. The newly added activation layer can use SoftMax, which facilitates computation and mathematical regression fitting during forward and backward propagation. Because this transfer learning model with one added fully connected layer and one activation layer does not modify the parameters of the entire model, only the parameters of the two new layers need to be determined during forward and backward propagation, while the parameters of the other layers are reused. This greatly shortens training time while maintaining accuracy.
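The idea of reusing the earlier layers and training only a new fully connected layer followed by SoftMax can be sketched without any deep learning library. In the sketch below, `frozen_features` is a hypothetical stand-in for the reused pre-trained layers (it is not MobileNetV1), and the toy data, learning rate, and epoch count are assumptions chosen only so the example runs quickly.

```python
import math
import random

def frozen_features(x):
    """Stand-in for the reused pre-trained layers; never updated here."""
    return [x[0], x[1], x[0] * x[1], 1.0]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_head(samples, n_classes, epochs=200, lr=0.5, seed=0):
    """Train only the new fully connected layer W; SoftMax has no parameters."""
    rng = random.Random(seed)
    dim = len(frozen_features(samples[0][0]))
    # Gaussian random initialisation of the new layer, as the text suggests.
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in samples:
            f = frozen_features(x)
            p = softmax([sum(w * v for w, v in zip(row, f)) for row in W])
            for k in range(n_classes):
                # Cross-entropy gradient w.r.t. logit k is p[k] - 1[k == y].
                g = p[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * g * f[j]
    return W

def predict(W, x):
    """Forward pass: frozen features, new linear layer, SoftMax."""
    f = frozen_features(x)
    return softmax([sum(w * v for w, v in zip(row, f)) for row in W])

# Toy data: class 1 (positive) when both inputs are high, otherwise class 0.
samples = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
W = train_head(samples, n_classes=2)
```

Only `W` changes during training, which mirrors the point that the parameters of the other layers are reused. In a real setting the frozen part would be the pre-trained network up to the deleted layer.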
Fourth embodiment of the invention. This embodiment is a preferred embodiment of the first embodiment; compared with the first embodiment, it is further optimized in that classifying the picture data to be cleaned with the neural network sorter to obtain several confidence sets includes the steps:
Inputting all picture data to be cleaned into the neural network sorter to obtain the confidence of each picture data item to be cleaned;
Assigning each picture data item to be cleaned to the corresponding confidence set according to its confidence and the preset confidence interval ranges.
Specifically, after the neural network sorter has been trained, all picture data to be cleaned are input into it, and the sorter predicts a confidence for each picture data item to be cleaned. Each confidence represents the probability that the item has the target picture's feature information: the higher the confidence, the more likely the item's feature information matches the target picture's, i.e. the more similar the item is to the target picture. The invention computes the confidence of acquired picture data to be cleaned in real time, so that each item can be sorted into the corresponding confidence set by its confidence, which makes it convenient to subsequently screen the picture data in each confidence set to obtain the required sample picture data set.
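Assigning pictures to confidence sets by preset intervals can be sketched as follows. The three-way split and the set names are assumptions; the thresholds 0.3 and 0.95 are borrowed from the example values the description gives for low-confidence and high-confidence sets.

```python
def bucket_by_confidence(scored, low=0.3, high=0.95):
    """Split (picture_id, confidence) pairs into three confidence sets.

    high: confidence >= high; low: confidence <= low; mid: everything else.
    """
    sets = {"low": [], "mid": [], "high": []}
    for picture_id, conf in scored:
        if conf >= high:
            sets["high"].append(picture_id)
        elif conf <= low:
            sets["low"].append(picture_id)
        else:
            sets["mid"].append(picture_id)
    return sets

scored = [("a", 0.99), ("b", 0.95), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
confidence_sets = bucket_by_confidence(scored)
sample_set = confidence_sets["high"]  # pictures that reach the preset level
```

More intervals could be added in the same way if a finer division of the confidence range is wanted.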
Fifth embodiment of the invention. This embodiment is a preferred embodiment of the fourth embodiment; compared with the fourth embodiment, it is further optimized in that, after assigning each picture data item to be cleaned to the corresponding confidence set according to its confidence and the preset confidence interval ranges, the method includes the step:
Adding the picture data in the confidence sets whose confidence reaches the preset level to the positive sample picture set as positive sample picture data.
Specifically, when a model is built by transfer learning, the number of positive sample pictures is small; in fact, after training the model is in a state close to underfitting. At this point, the picture data in the sorted confidence sets whose confidence reaches the preset level are added to the positive sample picture set as positive sample picture data, and the neural network sorter is trained again. Thus, after the positive sample picture set has been obtained by the first manual screening, the picture data in the confidence sets whose confidence reaches the preset level are used directly as positive sample picture data for iterative training while the sorter continues to be trained. This iterative training effectively improves model accuracy. Besides improving the precision of the neural network sorter, it also reduces the manual screening of positive and negative sample picture sets, greatly lowers manual workload, reduces data classification errors caused by the subjectivity of manual screening, and improves the robustness of the neural network model.
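The loop of adding high-confidence pictures to the positive set and retraining can be sketched as below. The `train` and `score` callables are hypothetical placeholders for training the sorter and predicting a confidence; the toy versions used here (class means over numbers) exist only to make the sketch runnable and are not the patent's model.

```python
def iterative_cleaning(positives, negatives, unlabeled, train, score,
                       level=0.95, rounds=3):
    """Retrain after adding pictures whose confidence reaches `level`."""
    positives = list(positives)
    remaining = list(unlabeled)
    for _ in range(rounds):
        model = train(positives, negatives)
        confident = [x for x in remaining if score(model, x) >= level]
        if not confident:
            break  # nothing new reaches the preset level
        positives.extend(confident)
        remaining = [x for x in remaining if x not in confident]
    return positives, remaining

# Toy stand-ins: a "picture" is a number, the "model" is the two class means.
def train(pos, neg):
    return (sum(pos) / len(pos), sum(neg) / len(neg))

def score(model, x):
    mean_pos, mean_neg = model
    d_pos, d_neg = abs(x - mean_pos), abs(x - mean_neg)
    return d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.5

positives, remaining = iterative_cleaning(
    [10.0], [0.0], [9.5, 8.0, 5.0, 1.0], train, score, level=0.75)
```

One design caution that follows from the text: misclassified pictures admitted at this step are fed back into training, so the preset level would normally be set high.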
In a sixth embodiment of the present invention, which is a preferred embodiment of the first to fifth embodiments and further optimizes them, the following steps are included after acquiring the image data in the confidence level set whose confidence level reaches the preset level and obtaining the sample picture data set:
Count the number of pictures of the image data in the confidence level set whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, stop the cleaning process of the image data to be cleaned;
When the number of pictures does not reach the target required quantity, continue the cleaning process of the image data to be cleaned.
Specifically, and preferably, after the confidence level of each item of image data to be cleaned has been calculated, a data acquisition automation script classifies the image data to be cleaned according to the preset confidence interval division ranges and places it into different confidence level sets. The number of pictures in the confidence level set within the preset level range is then counted: when the number reaches the target required quantity, the cleaning process is stopped; when it does not, the cleaning process continues. For example, the high confidence level set (e.g., confidence of 0.95 or above) consists almost entirely of required pictures, so cleaning is simple: it is only necessary to skim through the picture set and remove the obviously wrong pictures. The low confidence level set (e.g., confidence of 0.3 or below) consists almost entirely of unneeded pictures, so cleaning is equally simple: it is only necessary to skim through the picture set and pick out the obviously correct pictures. In this way, the image data that the neural network sorter misclassified into the high confidence level set is screened out and deleted, and the remaining image data in the high confidence level set is counted. If the number of pictures reaches the target required quantity, the cleaning process is stopped; otherwise, it continues. The low confidence level set can be screened in the same way. This improves the cleaning efficiency of the image data to be cleaned, reduces the workload of subjective visual screening, and improves working efficiency.
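The stop-or-continue decision after a quick manual pass over the high confidence level set might look like the following sketch. The function name, the picture identifiers, and the idea of passing the reviewer's rejections as a set are all assumptions made for illustration.

```python
def review_and_decide(high_confidence_set, flagged_as_wrong, target_count):
    """Drop pictures a reviewer flagged as wrong, then decide whether to stop.

    Returns (kept_pictures, stop_cleaning).
    """
    kept = [p for p in high_confidence_set if p not in flagged_as_wrong]
    return kept, len(kept) >= target_count

# Hypothetical example: four high-confidence pictures, one rejected on review.
high_set = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
kept, stop = review_and_decide(high_set, {"c.jpg"}, target_count=3)
_, stop_with_larger_target = review_and_decide(high_set, {"c.jpg"}, target_count=4)
```

With a target of 3 the cleaning process can stop; with a target of 4 it would continue, because only three pictures survive the review.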
In addition, after the image data to be cleaned has been assigned to the corresponding confidence level sets according to its confidence level and the preset confidence interval division ranges, and under the assumption that the supply of pictures on the Internet is effectively unlimited, the image data in the confidence level sets outside the preset level range can simply be discarded. The web crawler keeps crawling image data, and the high-confidence pictures keep accumulating, until the number of pictures in the high confidence level set reaches the target required quantity. For example, when the image data in the high confidence level set is what is needed, the image data in the medium and low confidence level sets can be discarded directly and only the image data in the high confidence level set retained, while the crawler continues crawling and the high-confidence pictures continue to accumulate until the number of pictures reaches the target required quantity. At the same time, while the neural network sorter sorts the data, the image data in the confidence level sets obtained by its classification is used to train the sorter further, which improves its precision.
Based on the above embodiments, a practical example is given, as shown in Fig. 2, including:
S1: Obtain a small number of positive sample pictures;
S2: Train the neural network sorter by transfer learning;
S3: The data acquisition automation script sorts the image data to be cleaned, yielding image data at three confidence levels; proceed to step S4 or S5;
S4: Manually screen the high-confidence image data;
S5: Determine whether the image data is high-confidence; if so, proceed to steps S6 and S7; otherwise, proceed to step S8;
S6: Discard;
S7: Store the high-confidence image data in the target data set;
S8: Use the high-confidence image data as positive sample pictures, and proceed to step S2;
S9: Determine whether the number of pictures of high-confidence image data in the target data set reaches the target required quantity; if so, proceed to step S10; otherwise, return to step S3;
S10: Obtain the sample picture data set.
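The overall loop of this example can be sketched in simplified form: train, sort each newly collected batch, keep the high-confidence pictures both as output and as extra positive samples, discard the rest, and stop once the target count is reached. This is an assumption-laden illustration, not the exact flow of Fig. 2: manual screening is omitted, pictures are single numeric features, and `train` and `predict` are toy stand-ins for the sorter.

```python
def train(positives, negatives):
    """Toy 'training' (assumption): remember the mean feature of each class."""
    return (sum(positives) / len(positives), sum(negatives) / len(negatives))

def predict(model, picture):
    """Toy confidence in [0, 1]: closer to the positive mean scores higher."""
    mean_pos, mean_neg = model
    d_pos, d_neg = abs(picture - mean_pos), abs(picture - mean_neg)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total

def build_sample_set(seed_positives, seed_negatives, batches,
                     target_count, high_threshold=0.95):
    """Accumulate high-confidence pictures until target_count is reached."""
    positives = list(seed_positives)
    target_set = []
    for batch in batches:                      # each batch: newly crawled pictures
        model = train(positives, seed_negatives)   # (re)train the sorter
        for picture in batch:
            if predict(model, picture) >= high_threshold:
                target_set.append(picture)     # store in the target data set
                positives.append(picture)      # reuse as a positive sample
            # all other pictures are discarded in this simplified variant
        if len(target_set) >= target_count:    # target required quantity reached
            break
    return target_set

samples = build_sample_set([1.0, 1.2], [5.0],
                           batches=[[1.1, 3.0], [1.0, 4.8], [1.2]],
                           target_count=2)
```

The third batch is never processed here because the target count is already met after the second one.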
Specifically, in this embodiment, it is assumed that after processing by the neural network sorter the image data to be cleaned is divided into three classes. The first class is the high-confidence data set. The second class is the medium-confidence data set; its image data must be checked manually one by one, as in a conventional cleaning task, but the amount of image data in the medium-confidence data set is generally not large. The third class is the low-confidence data set. Pictures crawled from the network often include a great many interfering pictures, even completely unrelated ones. After a massive data set to be cleaned has been obtained, a small number of positive and negative sample pictures are screened out; generally a few dozen are enough, which is a negligible amount of work compared with a volume of tens or hundreds of thousands of pictures. Transfer learning is then performed with the positive and negative sample pictures to train a neural network sorter: the last layer of a pre-trained neural network model is deleted, a new fully connected layer and a new activation layer are added in sequence, and the newly added fully connected layer and activation layer are trained with the positive and negative sample pictures. This requires little training time and can achieve relatively high accuracy. The trained neural network sorter then predicts a confidence level for each item of image data to be cleaned. This confidence level represents the probability that the picture belongs to the required target category.
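The idea of training only a newly added fully connected layer and activation layer on top of a frozen pre-trained model can be illustrated with a small, dependency-free sketch. Everything here is an assumption for illustration: `frozen_features` is a hand-written stand-in for the pre-trained network with its last layer removed, pictures are reduced to two made-up numbers, and the new head is a single fully connected layer followed by a sigmoid activation trained by plain gradient descent.

```python
import math

def frozen_features(picture):
    """Stand-in for the pre-trained network minus its last layer (never updated)."""
    brightness, edge_density = picture
    return [brightness, edge_density, brightness * edge_density]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_new_head(positives, negatives, epochs=500, lr=0.5):
    """Train only the new fully connected layer (weights, bias) + sigmoid."""
    data = [(frozen_features(p), 1.0) for p in positives] + \
           [(frozen_features(n), 0.0) for n in negatives]
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y                      # gradient of log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def confidence(head, picture):
    """Confidence that the picture belongs to the target category."""
    weights, bias = head
    x = frozen_features(picture)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# A handful of labeled samples is enough for this toy head.
head = train_new_head(positives=[(0.9, 0.8), (0.8, 0.9)],
                      negatives=[(0.1, 0.2), (0.2, 0.1)])
```

Because the feature extractor is never updated, only a few parameters are learned, which is why a few dozen labeled pictures and a short training run can suffice in this kind of setup.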
After the confidence level of each item of image data to be cleaned has been obtained, the data acquisition automation script can store the image data to be cleaned by class: it divides the image data into low, medium, and high intervals and places it into three different confidence level sets. The image data in the high confidence level set consists almost entirely of required pictures, so cleaning is simple: it is only necessary to skim through the image data in the high confidence level set and remove the image data that is clearly not the target picture. The image data in the low confidence level set consists almost entirely of unneeded pictures, so cleaning is equally simple: it is only necessary to skim through the image data in the low confidence level set and pick out the image data that is clearly the target picture. Both approaches greatly improve working efficiency.
Alternatively, after the neural network sorter has calculated the confidence level of each item of image data to be cleaned and the data has been classified according to the confidence level and the preset confidence interval division ranges, and under the assumption that the supply of pictures on the Internet is effectively unlimited, the medium-confidence and low-confidence image data can be discarded directly and only the high-confidence image data retained. The web crawler keeps crawling image data to be cleaned, and the high-confidence image data keeps accumulating, until the number of pictures of high-confidence image data reaches the target required quantity. While the neural network sorter is sorting the image data to be cleaned, the high-confidence image data obtained by its screening and classification is added to the positive samples and used to train the sorter, which can effectively improve model precision.
The present invention works especially well when picture quality is very good or very poor, and can even approach the ideal state in which essentially no manual cleaning is needed. In general, however, to be conservative, a final manual inspection is still required. Even so, the invention greatly improves the efficiency of the data cleaning task and avoids the accuracy problems caused by subjectivity and fatigue during data cleaning. The present invention provides an auxiliary data cleaning strategy that automatically cleans the image data to be cleaned during acquisition of the sample picture data set. It improves the efficiency of manual cleaning, can even enable fully automated cleaning, solves the various problems of manual cleaning, reduces the subjectivity of manual screening and classification, and improves the accuracy and efficiency of screening and classification.
In a seventh embodiment of the present invention, as shown in Fig. 3, a system for acquiring a sample picture data set includes:
A data cleansing module 100 and a sample acquisition module 200; the data cleansing module 100 is connected to the sample acquisition module 200;
The data cleansing module 100 obtains a positive sample picture set and a negative sample picture set; the characteristic information of the image data in the positive sample picture set is identical to the characteristic information of the target picture; the characteristic information of the image data in the negative sample picture set is not identical to the characteristic information of the target picture; trains a neural network sorter according to the positive sample picture set and the negative sample picture set; and classifies the image data to be cleaned according to the neural network sorter to obtain several confidence level sets;
The sample acquisition module 200 acquires the image data in the confidence level set whose confidence level reaches the preset level to obtain the sample picture data set; the characteristic information of the image data in the sample picture data set is identical to the characteristic information of the target picture, and the quantity of image data in the sample picture data set is greater than the quantity of image data in the positive sample picture set.
Specifically, this embodiment is the system embodiment corresponding to the above method embodiment; for its specific effects, refer to the first embodiment above, which are not repeated here.
In an eighth embodiment of the present invention, as shown in Fig. 4, which is a preferred embodiment of the seventh embodiment, the data cleansing module 100 includes: a positive sample data acquisition unit, a negative sample data acquisition unit, a sorter training unit, and a classified storage unit; the sorter training unit is connected to the positive sample data acquisition unit, the negative sample data acquisition unit, and the classified storage unit respectively;
The positive sample data acquisition unit obtains a first preset number of items of positive sample image data from a source picture data set as the positive sample picture set; the positive sample image data is image data whose characteristic information is identical to that of the target picture;
The negative sample data acquisition unit obtains a second preset number of items of negative sample image data from the source picture data set as the negative sample picture set; the negative sample image data is image data whose characteristic information is not identical to that of the target picture;
The sorter training unit deletes the last fully connected layer of a pre-trained neural network model; adds a fully connected layer and an activation layer in sequence at the position of the deleted last fully connected layer; and trains the newly added fully connected layer and activation layer according to the positive sample picture set and the negative sample picture set to obtain the neural network sorter;
The classified storage unit inputs all image data to be cleaned into the neural network sorter to obtain the confidence level of each item of image data to be cleaned, and assigns the image data to be cleaned to the corresponding confidence level set according to its confidence level and the preset confidence interval division ranges.
Specifically, this embodiment is the system embodiment corresponding to the above method embodiments; for its specific effects, refer to the second, third, and fourth embodiments above, which are not repeated here.
In a ninth embodiment of the present invention, which is a preferred embodiment of the eighth embodiment, the positive sample data acquisition unit is connected to the classified storage unit;
The positive sample data acquisition unit also acquires the image data in the confidence level set whose confidence level reaches the preset level and adds it to the positive sample picture set as positive sample image data.
Specifically, this embodiment is the system embodiment corresponding to the above method embodiment; for its specific effects, refer to the fifth embodiment above, which are not repeated here.
In a tenth embodiment of the present invention, which is a preferred embodiment of the seventh to eighth embodiments, the system further includes: a statistical module and a control module; the statistical module is connected to the sample acquisition module 200 and the control module respectively, and the control module is connected to the data cleansing module 100;
The statistical module counts the number of pictures of the image data in the confidence level set whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, the control module controls the data cleansing module 100 to stop the cleaning process of the image data to be cleaned;
When the number of pictures does not reach the target required quantity, the control module controls the data cleansing module 100 to continue the cleaning process of the image data to be cleaned.
Specifically, this embodiment is the system embodiment corresponding to the above method embodiment; for its specific effects, refer to the sixth embodiment above, which are not repeated here.
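How the four modules of the system embodiments could cooperate is sketched below. The class and method names are hypothetical, the scoring callable stands in for the trained neural network sorter, and the 0.95 threshold is an assumed preset level; the sketch only mirrors the connections described above (statistics to acquisition and control, control to cleansing).

```python
class DataCleansingModule:
    def __init__(self, score):
        self.score = score          # hypothetical callable: picture -> confidence
        self.running = True

    def clean(self, pictures, high_threshold=0.95):
        """Return the pictures whose confidence reaches the preset level."""
        return [p for p in pictures if self.score(p) >= high_threshold]

    def stop(self):
        self.running = False

class SampleAcquisitionModule:
    def __init__(self):
        self.samples = []

    def collect(self, pictures):
        self.samples.extend(pictures)

class StatisticsModule:
    def __init__(self, acquisition):
        self.acquisition = acquisition

    def picture_count(self):
        return len(self.acquisition.samples)

class ControlModule:
    def __init__(self, cleansing, statistics, target_count):
        self.cleansing = cleansing
        self.statistics = statistics
        self.target_count = target_count

    def update(self):
        """Stop cleansing once the target required quantity is reached."""
        if self.statistics.picture_count() >= self.target_count:
            self.cleansing.stop()

# Hypothetical confidences standing in for sorter predictions.
scores = {"a": 0.99, "b": 0.20, "c": 0.97, "d": 0.96, "e": 0.99}
cleansing = DataCleansingModule(scores.get)
acquisition = SampleAcquisitionModule()
control = ControlModule(cleansing, StatisticsModule(acquisition), target_count=2)

for batch in [["a", "b"], ["c", "d"], ["e"]]:
    if not cleansing.running:
        break                       # control module has stopped the cleaning
    acquisition.collect(cleansing.clean(batch))
    control.update()
```

Separating counting from control keeps the stop condition in one place, so the target required quantity can change without touching the cleansing logic.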
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A method for acquiring a sample picture data set, characterized by comprising the steps of:
A cleaning process for image data to be cleaned, specifically comprising:
Obtaining a positive sample picture set and a negative sample picture set; the characteristic information of the image data in the positive sample picture set is identical to the characteristic information of the target picture; the characteristic information of the image data in the negative sample picture set is not identical to the characteristic information of the target picture;
Training a neural network sorter according to the positive sample picture set and the negative sample picture set;
Classifying the image data to be cleaned according to the neural network sorter to obtain several confidence level sets;
An acquisition process for the sample picture data set, specifically comprising:
Acquiring the image data in the confidence level set whose confidence level reaches the preset level to obtain the sample picture data set; the characteristic information of the image data in the sample picture data set is identical to the characteristic information of the target picture, and the quantity of image data in the sample picture data set is greater than the quantity of image data in the positive sample picture set.
2. The method for acquiring a sample picture data set according to claim 1, characterized in that obtaining the positive sample picture set and the negative sample picture set comprises the steps of:
Obtaining a first preset number of items of positive sample image data from a source picture data set as the positive sample picture set; the positive sample image data is image data whose characteristic information is identical to that of the target picture;
Obtaining a second preset number of items of negative sample image data from the source picture data set as the negative sample picture set; the negative sample image data is image data whose characteristic information is not identical to that of the target picture.
3. The method for acquiring a sample picture data set according to claim 1, characterized in that training the neural network sorter according to the positive sample picture set and the negative sample picture set comprises the steps of:
Deleting the last fully connected layer of a pre-trained neural network model;
Adding a fully connected layer and an activation layer in sequence at the position of the deleted last fully connected layer;
Training the newly added fully connected layer and activation layer according to the positive sample picture set and the negative sample picture set to obtain the neural network sorter.
4. The method for acquiring a sample picture data set according to claim 1, characterized in that classifying the image data to be cleaned according to the neural network sorter to obtain several confidence level sets comprises the steps of:
Inputting all image data to be cleaned into the neural network sorter to obtain the confidence level of each item of image data to be cleaned;
Assigning the image data to be cleaned to the corresponding confidence level set according to the confidence level of the image data to be cleaned and the preset confidence interval division ranges.
5. The method for acquiring a sample picture data set according to claim 4, characterized in that, after assigning the image data to be cleaned to the corresponding confidence level set according to the confidence level of the image data to be cleaned and the preset confidence interval division ranges, the method comprises the step of:
Acquiring the image data in the confidence level set whose confidence level reaches the preset level and adding it to the positive sample picture set as positive sample image data.
6. The method for acquiring a sample picture data set according to any one of claims 1-5, characterized in that, after acquiring the image data in the confidence level set whose confidence level reaches the preset level and obtaining the sample picture data set, the method comprises the steps of:
Counting the number of pictures of the image data in the confidence level set whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, stopping the cleaning process of the image data to be cleaned;
When the number of pictures does not reach the target required quantity, continuing the cleaning process of the image data to be cleaned.
7. A system for acquiring a sample picture data set, characterized by comprising: a cleaning module and an acquisition module; the cleaning module is connected to the acquisition module;
The cleaning module obtains a positive sample picture set and a negative sample picture set; the characteristic information of the image data in the positive sample picture set is identical to the characteristic information of the target picture; the characteristic information of the image data in the negative sample picture set is not identical to the characteristic information of the target picture; trains a neural network sorter according to the positive sample picture set and the negative sample picture set; and classifies the image data to be cleaned according to the neural network sorter to obtain several confidence level sets;
The acquisition module acquires the image data in the confidence level set whose confidence level reaches the preset level to obtain the sample picture data set; the characteristic information of the image data in the sample picture data set is identical to the characteristic information of the target picture, and the quantity of image data in the sample picture data set is greater than the quantity of image data in the positive sample picture set.
8. The system for acquiring a sample picture data set according to claim 7, characterized in that the cleaning module comprises: a positive sample data acquisition unit, a negative sample data acquisition unit, a sorter training unit, and a classified storage unit; the sorter training unit is connected to the positive sample data acquisition unit, the negative sample data acquisition unit, and the classified storage unit respectively;
The positive sample data acquisition unit obtains a first preset number of items of positive sample image data from a source picture data set as the positive sample picture set; the positive sample image data is image data whose characteristic information is identical to that of the target picture;
The negative sample data acquisition unit obtains a second preset number of items of negative sample image data from the source picture data set as the negative sample picture set; the negative sample image data is image data whose characteristic information is not identical to that of the target picture;
The sorter training unit deletes the last fully connected layer of a pre-trained neural network model; adds a fully connected layer and an activation layer in sequence at the position of the deleted last fully connected layer; and trains the newly added fully connected layer and activation layer according to the positive sample picture set and the negative sample picture set to obtain the neural network sorter;
The classified storage unit inputs all image data to be cleaned into the neural network sorter to obtain the confidence level of each item of image data to be cleaned, and assigns the image data to be cleaned to the corresponding confidence level set according to its confidence level and the preset confidence interval division ranges.
9. The system for acquiring a sample picture data set according to claim 8, characterized in that the positive sample data acquisition unit is connected to the classified storage unit;
The positive sample data acquisition unit also acquires the image data in the confidence level set whose confidence level reaches the preset level and adds it to the positive sample picture set as positive sample image data.
10. The system for acquiring a sample picture data set according to any one of claims 7-9, characterized by further comprising: a statistical module and a control module; the statistical module is connected to the acquisition module and the control module respectively, and the control module is connected to the cleaning module;
The statistical module counts the number of pictures of the image data in the confidence level set whose confidence level is within the preset level range;
When the number of pictures reaches the target required quantity, the control module controls the cleaning module to stop the cleaning process of the image data to be cleaned;
When the number of pictures does not reach the target required quantity, the control module controls the cleaning module to continue the cleaning process of the image data to be cleaned.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810506155.0A CN108874900A (en) 2018-05-24 2018-05-24 A kind of acquisition methods and system of samples pictures data acquisition system

Publications (1)

Publication Number Publication Date
CN108874900A true CN108874900A (en) 2018-11-23

Family

ID=64334141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810506155.0A Pending CN108874900A (en) 2018-05-24 2018-05-24 A kind of acquisition methods and system of samples pictures data acquisition system

Country Status (1)

Country Link
CN (1) CN108874900A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181123