CN107977412A - A perceived-age database cleaning method based on iteration and interaction - Google Patents
- Publication number: CN107977412A
- Application number: CN201711170178.0A
- Authority
- CN
- China
- Prior art keywords
- age
- sample
- classifier
- database
- perceived
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses a method for cleaning a perceived-age database based on iteration and human-computer interaction. First, a support vector machine (SVM) is trained on a physiological-age database to obtain classifier A, and classifier A is used to recognize the samples of the perceived-age database. The correctly recognized samples are added to the physiological-age database to form a new training set, which is used to train classifier B with the SVM; classifier B is then used to recognize the samples that classifier A misrecognized. This cycle is repeated, and the iteration stops when the fluctuation of the age recognition accuracy falls within 0.1%. After the iteration stops, the age labels of the remaining misrecognized samples are corrected through human-computer interaction, and the corrected samples are passed through the preceding steps again. The loop ends when all samples are recognized correctly, and the physiological-age database is finally subtracted from the resulting sample set. The method can effectively clean the dirty data in an age database, making the perceived-age database more accurate.
Description
Technical field
The present invention relates to a method for cleaning a perceived-age database, and in particular to a perceived-age database cleaning method based on iteration and interaction.
Background art
In the traditional view, perceived age may be associated with experience, responsibility, and maturity, and different people may perceive the age of the same person differently. A person's physiological age, by contrast, does not change under the influence of external factors, so a person's perceived age and physiological age can differ. When an age database is annotated, labelling a face database with ages purely by subjective human perception introduces error, and this error brings a certain amount of dirty data into the age database. Here, dirty data means samples whose physiological age and perceived age differ substantially. If such an age database is used directly without cleaning, its precision cannot be relied on, and it introduces error into the user's experimental data.
The database field has many mature data cleaning techniques, but most of them target one specific data quality problem (such as data duplication), and the interactive functions of these systems are often limited. Dirty data is also ubiquitous, and no general, effective method currently removes it completely. Many works involving data preprocessing perform only simple manual cleaning, and some even assume that the raw data is clean and ignore its quality problems. Results drawn from experiments on such data are therefore often incorrect or one-sided.
Summary of the invention
The object of the invention is to improve the precision of a perceived-age database. To that end it proposes a perceived-age database cleaning method based on iteration and interaction. The method cleans the perceived-age database iteratively and interactively and can effectively remove the dirty data it contains, so that the age database becomes more precise.
To achieve the above object, the invention adopts the following technical solution:
A perceived-age database cleaning method based on iteration and interaction, comprising the following steps:
(1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
(2) Add the correctly recognized samples to the physiological-age database to form a new training set, train classifier B on the new training set with the SVM, then use classifier B to recognize the samples that classifier A misrecognized;
(3) Repeat step (2);
(4) Stop iterating when the fluctuation of the age recognition accuracy is within 0.1%; otherwise return to step (3);
(5) After the iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label is, from the standpoint of subjective human visual perception, closer to the sample's physiological age;
(6) Repeat steps (1), (2), (3), (4), and (5) with the corrected samples;
(7) Stop the loop when all samples are recognized correctly; otherwise return to step (6);
(8) Subtract the physiological-age database that was added in step (2) from the set of correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
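Steps (1) to (4) above can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the patented implementation: a toy nearest-centroid classifier on one-dimensional features stands in for the SVM so that the example runs with the Python standard library alone, "recognition accuracy" is taken to be the fraction of perceived-age samples accepted so far, and every name (`train_centroids`, `predict`, `iterative_clean`, `max_iter`) is hypothetical.

```python
from statistics import mean


def train_centroids(samples):
    """Toy stand-in for SVM training: one centroid per age label."""
    by_label = {}
    for feature, label in samples:
        by_label.setdefault(label, []).append(feature)
    return {label: mean(feats) for label, feats in by_label.items()}


def predict(model, feature):
    """Return the label whose centroid lies nearest to the feature."""
    return min(model, key=lambda label: abs(model[label] - feature))


def iterative_clean(physio, perceived, tol=0.001, max_iter=50):
    """Grow the training set with agreed samples until the accepted
    fraction changes by no more than tol (0.1% by default).

    Returns (accepted, pending); pending holds the samples that would go
    to human correction in step (5). max_iter is an assumed safeguard.
    """
    if not perceived:
        return [], []
    training = list(physio)
    pending = list(perceived)
    accepted = []
    prev_rate = None
    for _ in range(max_iter):
        model = train_centroids(training)           # classifier A, then B, ...
        agree, still_wrong = [], []
        for feature, label in pending:
            target = agree if predict(model, feature) == label else still_wrong
            target.append((feature, label))
        accepted.extend(agree)
        training.extend(agree)                      # step (2): enlarge training set
        pending = still_wrong
        rate = len(accepted) / len(perceived)
        if prev_rate is not None and abs(rate - prev_rate) <= tol:
            break                                   # step (4): accuracy has settled
        prev_rate = rate
    return accepted, pending


physio = [(20.0, 20), (22.0, 20), (40.0, 40), (42.0, 40)]
perceived = [(21.0, 20), (41.0, 40), (39.0, 20)]   # last label looks wrong
accepted, pending = iterative_clean(physio, perceived)
```

In this toy run the sample `(39.0, 20)` never agrees with the classifier, so it ends up in `pending`, which is the set the method hands to a human annotator.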
The SVM training in step (1) above computes over all feature vectors of the sample set according to the selected kernel function and constructs a feature space in which the samples are separable. The specific steps are as follows:
(1-1) Select the kernel function. The kernel function used is the Gaussian function:
$$K(x, z) = \exp\left(-\frac{\|x - z\|^{2}}{2\delta^{2}}\right)$$
(1-2) Using the selected kernel function, compute the feature correlation values of each feature vector in each classifier;
(1-3) Compute the covariance matrix space from these feature correlation values;
(1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
(1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
(1-6) Obtain the model parameters.
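Claim 2 gives the Gaussian kernel of step (1-1) as K(x, z) = exp(-||x - z||² / (2δ²)). A minimal sketch of that formula and of the pairwise kernel values it produces is shown below. The function names and the default `delta=1.0` are assumptions made for illustration; the patent does not state a value for δ, and steps (1-3) to (1-6) are not reproduced here.

```python
import math


def gaussian_kernel(x, z, delta=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * delta^2)) for two equal-length vectors."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same dimension")
    if delta <= 0:
        raise ValueError("delta must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * delta ** 2))


def gram_matrix(vectors, delta=1.0):
    """Pairwise kernel values between all feature vectors (hypothetical helper)."""
    return [[gaussian_kernel(u, v, delta) for v in vectors] for u in vectors]


gram = gram_matrix([[0.0], [2.0]], delta=1.0)
```

A smaller δ makes the kernel value fall off faster with distance, so only very similar feature vectors are treated as related.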
In step (5) above, the age label of a sample is corrected through human-computer interaction as follows: taking the age recognition result given by the classifier as a reference, the annotator relabels the sample's age according to human visual perception of age; the relabelled sample is the corrected sample.
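The correction step can be pictured as a small function that shows the annotator the classifier's estimate and records the annotator's decision. This is a hedged sketch only: the patent does not describe an interface, so `correct_labels`, `classifier_guess`, and `ask_annotator` are hypothetical names, and the annotator is simulated by a plain function.

```python
def correct_labels(misrecognised, classifier_guess, ask_annotator):
    """Relabel each misrecognised sample with a human decision.

    misrecognised: list of (sample_id, old_label) pairs.
    classifier_guess: callable returning the classifier's age estimate for an id.
    ask_annotator: callable (sample_id, old_label, suggestion) -> new label;
                   in practice this would be a person looking at the face image.
    """
    corrected = []
    for sample_id, old_label in misrecognised:
        suggestion = classifier_guess(sample_id)    # reference value only
        new_label = ask_annotator(sample_id, old_label, suggestion)
        corrected.append((sample_id, new_label))
    return corrected


# Simulated session: the annotator judges the face one year younger
# than the classifier's estimate.
result = correct_labels(
    [("img_1", 20)],
    classifier_guess=lambda sample_id: 38,
    ask_annotator=lambda sample_id, old, suggestion: suggestion - 1,
)
```

Keeping the annotator behind a callable makes it clear that the classifier output is advisory and that the final label is the human's.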
Compared with the prior art, the method of the invention has the following advantage:
The method cleans the perceived-age database iteratively and interactively and can effectively remove the dirty data it contains, so that the age database becomes more precise.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Fig. 2 shows five samples from the physiological-age database; the physiological age of each person is given directly below the sample.
Fig. 3 shows five samples from the perceived-age database; the perceived age of each person is given directly below the sample.
Fig. 4 shows how the age recognition accuracy of the invention changes as the number of iterations increases.
Detailed description of the embodiments
The embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The simulation experiments of the invention were programmed and run on a PC test platform with a 3.4 GHz CPU and 8 GB of memory.
As shown in Fig. 1, the invention is a perceived-age database cleaning method based on iteration and interaction, with the following specific steps:
(1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
(1-1) Select the kernel function. The kernel function used is the Gaussian function:
$$K(x, z) = \exp\left(-\frac{\|x - z\|^{2}}{2\delta^{2}}\right)$$
(1-2) Using the selected kernel function, compute the feature correlation values of each feature vector in each classifier;
(1-3) Compute the covariance matrix space from these feature correlation values;
(1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
(1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
(1-6) Obtain the model parameters.
(2) Add the correctly recognized samples to the physiological-age database to form a new training set, train classifier B on the new training set with the SVM, then use classifier B to recognize the samples that classifier A misrecognized;
(3) Repeat step (2);
(4) Stop iterating when the fluctuation of the age recognition accuracy is within 0.1%; otherwise return to step (3);
(5) After the iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label is, from the standpoint of subjective human visual perception, closer to the sample's physiological age;
(6) Repeat steps (1), (2), (3), (4), and (5) with the corrected samples;
(7) Stop the loop when all samples are recognized correctly; otherwise return to step (6);
(8) Subtract the physiological-age database that was added in step (2) from the set of correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
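Two details of the steps above lend themselves to a short illustration: the stopping rule of step (4) and the subtraction of step (8). The sketch below is one possible reading, not the patent's definition. It assumes that "fluctuation within 0.1%" means the spread (maximum minus minimum) of the most recent accuracy values is at most 0.001, and that samples carry an `id` field so that physiological-age samples can be removed by identifier; `has_converged`, `subtract_physiological`, and the `window` parameter are hypothetical.

```python
def has_converged(accuracy_history, tol=0.001, window=2):
    """True when the last `window` accuracies differ by at most tol (0.1%)."""
    if len(accuracy_history) < window:
        return False
    recent = accuracy_history[-window:]
    return max(recent) - min(recent) <= tol


def subtract_physiological(final_samples, physiological_ids):
    """Step (8): drop the physiological-age samples that were mixed in for
    training, leaving only the cleaned perceived-age samples."""
    return [s for s in final_samples if s["id"] not in physiological_ids]


final_samples = [
    {"id": "p1", "age": 30},   # came from the physiological-age database
    {"id": "q1", "age": 25},   # cleaned perceived-age sample
]
cleaned = subtract_physiological(final_samples, {"p1"})
```

A larger `window` makes the stopping rule more conservative, since the accuracy must stay flat over more consecutive iterations before the loop ends.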
Fig. 2 shows five samples from the physiological-age database, with each person's physiological age given directly below the sample. From the five images it can be seen that the given physiological ages differ somewhat from the ages we ourselves perceive. This shows that a person's perceived age and physiological age can differ.
Fig. 3 shows five samples from the perceived-age database, with each person's perceived age given directly below the sample. From the five images it can be seen that the given perceived ages differ somewhat from the ages we ourselves perceive. This shows that different people perceive the age of the same person differently.
As shown in Fig. 4, after each iteration the perceived-age database is used as the training set and an age recognition model is trained with the SVM; the model trained after each iteration is then tested on a fixed test set to obtain the age recognition rate. The test set is a perceived-age database that contains some dirty data, but it is kept unchanged across all tests, which ensures that the experimental data are genuine and reliable.
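The evaluation protocol described for Fig. 4 amounts to scoring each iteration's model on one unchanging test set. A minimal sketch follows, assuming each model can be treated as a function from a feature to an age label; the helper names and the toy data are hypothetical and do not reproduce the patent's experiment.

```python
def accuracy(predict, test_set):
    """Fraction of test samples whose predicted age label matches the label."""
    correct = sum(1 for feature, label in test_set if predict(feature) == label)
    return correct / len(test_set)


def accuracy_curve(models, test_set):
    """Score the model from each iteration on the same fixed test set."""
    return [accuracy(model, test_set) for model in models]


# Toy stand-ins for the models produced after two successive iterations.
fixed_test_set = [(21.0, 20), (41.0, 40), (39.0, 40), (25.0, 20)]
models = [
    lambda feature: 20,                            # always predicts 20
    lambda feature: 20 if feature < 30 else 40,    # simple threshold rule
]
curve = accuracy_curve(models, fixed_test_set)
```

Because the test set never changes, any movement in the curve reflects the change in the training data rather than a change in what is being measured.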
The experimental results in Fig. 4 show that the method of the invention, a perceived-age database cleaning method based on iteration and interaction, can effectively clean the dirty data in the perceived-age database, so that the age database becomes more accurate.
Claims (3)
- 1. A perceived-age database cleaning method based on iteration and interaction, characterized by comprising the following steps:
(1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
(2) Add the correctly recognized samples to the physiological-age database to form a new training set, train classifier B on the new training set with the SVM, then use classifier B to recognize the samples that classifier A misrecognized;
(3) Repeat step (2);
(4) Stop iterating when the fluctuation of the age recognition accuracy is within 0.1%; otherwise return to step (3);
(5) After the iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label is, from the standpoint of subjective human visual perception, closer to the sample's physiological age;
(6) Repeat steps (1), (2), (3), (4), and (5) with the corrected samples;
(7) Stop the loop when all samples are recognized correctly; otherwise return to step (6);
(8) Subtract the physiological-age database that was added in step (2) from the set of correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
- 2. The perceived-age database cleaning method based on iteration and interaction according to claim 1, characterized in that the SVM training in step (1) computes over all feature vectors of the sample set according to the selected kernel function and constructs a feature space in which the samples are separable, with the following specific steps:
(1-1) Select the kernel function. The kernel function used is the Gaussian function:
$$K(x, z) = \exp\left(-\frac{\|x - z\|^{2}}{2\delta^{2}}\right)$$
(1-2) Using the selected kernel function, compute the feature correlation values of each feature vector in each classifier;
(1-3) Compute the covariance matrix space from these feature correlation values;
(1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
(1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
(1-6) Obtain the model parameters.
- 3. The perceived-age database cleaning method based on iteration and interaction according to claim 1, characterized in that in step (5) the age label of a sample is corrected through human-computer interaction as follows: taking the age recognition result given by the classifier as a reference, the sample's age label is relabelled according to human visual perception of age; the relabelled sample is the corrected sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711170178.0A CN107977412A (en) | 2017-11-22 | 2017-11-22 | A perceived-age database cleaning method based on iteration and interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711170178.0A CN107977412A (en) | 2017-11-22 | 2017-11-22 | A perceived-age database cleaning method based on iteration and interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107977412A (en) | 2018-05-01 |
Family
ID=62010761
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711170178.0A Pending CN107977412A (en) | A perceived-age database cleaning method based on iteration and interaction | 2017-11-22 | 2017-11-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107977412A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985173A (en) * | 2018-06-19 | 2018-12-11 | 奕通信息科技(上海)股份有限公司 | Towards the depth network migration learning method for having the label apparent age data library of noise |
CN109034188A (en) * | 2018-06-15 | 2018-12-18 | 北京金山云网络技术有限公司 | Acquisition methods, acquisition device, equipment and the storage medium of machine learning model |
CN110083728A (en) * | 2019-04-03 | 2019-08-02 | 上海联隐电子科技合伙企业(有限合伙) | A kind of methods, devices and systems of optimization automation image data cleaning quality |
CN110688471A (en) * | 2019-09-30 | 2020-01-14 | 支付宝(杭州)信息技术有限公司 | Training sample obtaining method, device and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150578A (en) * | 2013-04-09 | 2013-06-12 | 山东师范大学 | Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning |
CN105045807A (en) * | 2015-06-04 | 2015-11-11 | 浙江力石科技股份有限公司 | Data cleaning algorithm based on Internet trading information |
CN106778851A (en) * | 2016-12-05 | 2017-05-31 | 公安部第三研究所 | Social networks forecasting system and its method based on Mobile Phone Forensics data |
CN106844636A (en) * | 2017-01-21 | 2017-06-13 | 亚信蓝涛(江苏)数据科技有限公司 | A kind of unstructured data processing method based on deep learning |
-
2017
- 2017-11-22 CN CN201711170178.0A patent/CN107977412A/en active Pending
Non-Patent Citations (2)
Title |
---|
赖德河: "人脸年龄估计方法的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
陈小柏: "基于视觉的连续手语识别系统的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034188A (en) * | 2018-06-15 | 2018-12-18 | 北京金山云网络技术有限公司 | Acquisition methods, acquisition device, equipment and the storage medium of machine learning model |
CN109034188B (en) * | 2018-06-15 | 2021-11-05 | 北京金山云网络技术有限公司 | Method and device for acquiring machine learning model, equipment and storage medium |
CN108985173A (en) * | 2018-06-19 | 2018-12-11 | 奕通信息科技(上海)股份有限公司 | Towards the depth network migration learning method for having the label apparent age data library of noise |
CN110083728A (en) * | 2019-04-03 | 2019-08-02 | 上海联隐电子科技合伙企业(有限合伙) | A kind of methods, devices and systems of optimization automation image data cleaning quality |
CN110083728B (en) * | 2019-04-03 | 2021-08-20 | 上海铼锶信息技术有限公司 | Method, device and system for optimizing automatic picture data cleaning quality |
CN110688471A (en) * | 2019-09-30 | 2020-01-14 | 支付宝(杭州)信息技术有限公司 | Training sample obtaining method, device and equipment |
CN110688471B (en) * | 2019-09-30 | 2022-09-09 | 支付宝(杭州)信息技术有限公司 | Training sample obtaining method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977412A (en) | A perceived-age database cleaning method based on iteration and interaction | |
Batchelor et al. | Intelligent vision systems for industry | |
US20190220657A1 (en) | Motion recognition device and motion recognition method | |
CN107392125A (en) | Training method/system, computer-readable recording medium and the terminal of model of mind | |
CN107808143A (en) | Dynamic gesture identification method based on computer vision | |
CN104463101A (en) | Answer recognition method and system for textual test question | |
CN105320945A (en) | Image classification method and apparatus | |
CN110580466A (en) | infant quilt kicking behavior recognition method and device, computer equipment and storage medium | |
CN109284779A (en) | Object detecting method based on the full convolutional network of depth | |
CN109272003A (en) | A kind of method and apparatus for eliminating unknown error in deep learning model | |
CN109858476A (en) | The extending method and electronic equipment of label | |
CN103093237B (en) | A kind of method for detecting human face of structure based model | |
CN104919492A (en) | Device for detecting feature-point position, method for detecting feature-point position, and program for detecting feature-point position | |
US20200380292A1 (en) | Method and device for identifying object and computer readable storage medium | |
CN109829354B (en) | Face recognition method based on deep learning | |
CN113763348A (en) | Image quality determination method and device, electronic equipment and storage medium | |
CN110717385A (en) | Dynamic gesture recognition method | |
CN112420125A (en) | Molecular attribute prediction method and device, intelligent equipment and terminal | |
CN106372652A (en) | Hair style identification method and hair style identification apparatus | |
CN109101922A (en) | Operating personnel device, assay, device and electronic equipment | |
CN104978569A (en) | Sparse representation based incremental face recognition method | |
CN113283445A (en) | Image processing method and device and computer equipment | |
CN110070120B (en) | Depth measurement learning method and system based on discrimination sampling strategy | |
CN111414930B (en) | Deep learning model training method and device, electronic equipment and storage medium | |
CN113592906B (en) | Long video target tracking method and system based on annotation frame feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180501 |