CN107977412A - Iterative and interactive method for cleaning a perceived-age database - Google Patents

Iterative and interactive method for cleaning a perceived-age database

Info

Publication number
CN107977412A
Authority
CN
China
Prior art keywords
age
sample
classifier
database
perceived
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711170178.0A
Other languages
Chinese (zh)
Inventor
范伟琦
孙广玲
张天
邓小宝
陆小锋
钟宝燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201711170178.0A
Publication of CN107977412A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses an iterative and interactive method for cleaning a perceived-age database. First, a support vector machine (SVM) is trained on a physiological-age database to obtain classifier A, and classifier A is used to recognize the samples of the perceived-age database. The correctly recognized samples are added to the physiological-age database to form a new training set, an SVM is trained on the new training set to obtain classifier B, and classifier B is used to recognize the samples that classifier A misrecognized. This cycle is repeated, and iteration stops when the age recognition accuracy fluctuates by no more than 0.1%. After iteration stops, the age labels of the remaining misrecognized samples are corrected through human-computer interaction, and the preceding steps are repeated on the corrected samples. The loop ends once all samples are correctly recognized, and the physiological-age database is finally subtracted from the resulting sample set. The invention can effectively clean the dirty data in an age database, so that the perceived-age database becomes more accurate.

Description

Iterative and interactive method for cleaning a perceived-age database
Technical field
The present invention relates to a method for cleaning a perceived-age database, and more particularly to an iterative and interactive method for cleaning a perceived-age database.
Background art
Traditionally, perceived age may be associated with experience, responsibility, and maturity, and different people may perceive the age of the same person somewhat differently. A person's physiological age, by contrast, does not change under the influence of external factors, so a person's perceived age and physiological age can differ to some degree. When an age database is annotated, labeling a face database with ages purely on the basis of human subjective perception introduces a certain error, and this error brings a certain amount of dirty data into the age database. Here, dirty data means samples whose physiological age and perceived age differ substantially. If such an age database is used directly without cleaning, it offers no real accuracy and introduces errors into the user's experimental data.
The database field has many mature data cleaning techniques, but most of them target one specific data quality problem (for example, duplicate data), and the interactive capabilities of these systems are often limited. Moreover, dirty data is ubiquitous, and there is currently no general, effective way to remove it completely. Much work involving data preprocessing performs only some simple manual data cleaning, and some even assumes that the raw data is clean and ignores quality problems in it. Results obtained from experiments on such data are therefore often incorrect or one-sided.
Summary of the invention
The object of the present invention is to improve the accuracy of a perceived-age database. To that end it proposes an iterative and interactive method for cleaning a perceived-age database. By cleaning the database both iteratively and interactively, the method can effectively remove the dirty data in the perceived-age database and so make the age database more accurate.
To achieve the above object, the present invention adopts the following technical solution:
An iterative and interactive method for cleaning a perceived-age database, comprising the following steps:
(1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
(2) Add the correctly recognized samples to the physiological-age database to form a new training set, train an SVM on the new training set to obtain classifier B, then use classifier B to recognize the samples that classifier A misrecognized;
(3) Repeat step (2);
(4) Stop iterating when the age recognition accuracy fluctuates by no more than 0.1%; otherwise return to step (3);
(5) After iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label, as judged by human subjective visual perception, is closer to the sample's physiological age;
(6) Repeat steps (1), (2), (3), (4), and (5) on the corrected samples;
(7) Stop the loop when all samples are correctly recognized; otherwise return to step (6);
(8) Subtract the physiological-age database added in step (2) from the set of all correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
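The iterative part of the method, steps (1) to (4), can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `iterative_clean`, `train_nearest_centroid`, `tol`, and `max_rounds` are hypothetical, a toy nearest-centroid trainer stands in for SVM training so the sketch needs no external library, and "age recognition accuracy" is assumed here to be the fraction of perceived-age samples recognized correctly so far.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]           # (feature vector, age label)
Classifier = Callable[[Sequence[float]], int]  # features -> predicted age label
Trainer = Callable[[List[Sample]], Classifier]


def train_nearest_centroid(samples: List[Sample]) -> Classifier:
    """Toy stand-in for SVM training: keeps one centroid per age label."""
    sums: dict = {}
    for features, label in samples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    centroids = {lab: [t / n for t in total] for lab, (total, n) in sums.items()}

    def predict(features: Sequence[float]) -> int:
        # Return the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((f - c) ** 2 for f, c in zip(features, centroids[lab])),
        )

    return predict


def iterative_clean(
    physiological: List[Sample],
    perceived: List[Sample],
    train: Trainer,
    tol: float = 0.001,    # 0.1% fluctuation threshold from step (4)
    max_rounds: int = 50,  # safety cap; an assumption, not in the source
) -> Tuple[List[Sample], List[Sample]]:
    """Return (accepted, pending) perceived-age samples after steps (1)-(4)."""
    accepted: List[Sample] = []
    pending = list(perceived)
    prev_acc: Optional[float] = None
    for _ in range(max_rounds):
        # Steps (1)/(2): train on physiological data plus all accepted samples.
        clf = train(physiological + accepted)
        still_wrong: List[Sample] = []
        for sample in pending:
            features, label = sample
            (accepted if clf(features) == label else still_wrong).append(sample)
        pending = still_wrong
        acc = len(accepted) / len(perceived) if perceived else 1.0
        # Step (4): stop once the accuracy changes by no more than tol.
        if prev_acc is not None and abs(acc - prev_acc) <= tol:
            break
        prev_acc = acc
    return accepted, pending
```

Because `accepted` only ever holds perceived-age samples, the subtraction in step (8) is implicit in this sketch; the `pending` list is what would be handed to the human annotator in step (5).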
The SVM training in step (1) computes over all feature vectors of the sample set according to the selected kernel function and constructs a feature space in which the samples are separable. Its specific steps are as follows:
(1-1) Select the kernel function. The kernel function used is the Gaussian function:
$$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\delta^2}\right)$$
(1-2) Using the selected kernel function, compute the feature correlation value of each feature vector in each classifier;
(1-3) Compute the covariance matrix space from these feature correlation values;
(1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
(1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
(1-6) Obtain the model parameters.
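As a small illustration of sub-steps (1-1) and (1-2), the Gaussian kernel given in claim 2, K(x, z) = exp(-||x - z||² / (2δ²)), can be evaluated as below. This is only a sketch: the function names `gaussian_kernel` and `gram_matrix` and the default `delta=1.0` are assumptions for illustration, and the later sub-steps (covariance space, mirror transformation, scaling) are not covered because the source does not specify them in enough detail.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], delta: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 * delta^2)); delta is the width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * delta ** 2))


def gram_matrix(samples: List[Sequence[float]], delta: float = 1.0) -> List[List[float]]:
    """Kernel value for every pair of feature vectors in the sample set."""
    return [[gaussian_kernel(a, b, delta) for b in samples] for a in samples]
```

The kernel equals 1 for identical vectors and decays toward 0 as the vectors move apart; a larger `delta` makes the decay slower.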
In step (5), the age labels of samples are corrected through human-computer interaction as follows: referring to the age recognition result given by the classifier, an annotator relabels the age of the sample according to the age perceived by the human eye; the relabeled sample is the corrected sample.
Compared with the prior art, the method of the present invention has the following advantage:
The method cleans the perceived-age database both iteratively and interactively, so it can effectively remove the dirty data in the perceived-age database and make the age database more accurate.
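The interactive correction of step (5) might be organized along the following lines. This is a hedged sketch: `correct_labels` and the `ask_annotator` callback are hypothetical names, and the callback merely stands in for whatever interface presents the face image and the classifier's suggestion to a human annotator.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, age label)


def correct_labels(
    misrecognized: List[Sample],
    classifier: Callable[[Sequence[float]], int],
    ask_annotator: Callable[[Sequence[float], int, int], int],
) -> List[Sample]:
    """Relabel each misrecognized sample with the annotator's decision.

    ask_annotator receives (features, current_label, classifier_suggestion)
    and returns the age label the human considers correct.
    """
    corrected: List[Sample] = []
    for features, current_label in misrecognized:
        suggestion = classifier(features)  # shown to the annotator as a reference
        new_label = ask_annotator(features, current_label, suggestion)
        corrected.append((features, new_label))
    return corrected
```

The classifier output is passed along only as a reference; the annotator's perception decides the final label, as the description states.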
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 shows five samples from the physiological-age database; the physiological age of each person is given directly below the sample.
Fig. 3 shows five samples from the perceived-age database; the perceived age of each person is given directly below the sample.
Fig. 4 shows how the age recognition accuracy of the present invention changes as the number of iterations increases.
Detailed description of the embodiments
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The simulation experiments for the present invention were programmed and run on a PC test platform with a 3.4 GHz CPU and 8 GB of memory.
As shown in Fig. 1, the present invention is an iterative and interactive method for cleaning a perceived-age database, with the following specific steps:
(1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
(1-1) Select the kernel function. The kernel function used is the Gaussian function:
$$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\delta^2}\right)$$
(1-2) Using the selected kernel function, compute the feature correlation value of each feature vector in each classifier;
(1-3) Compute the covariance matrix space from these feature correlation values;
(1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
(1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
(1-6) Obtain the model parameters.
(2) Add the correctly recognized samples to the physiological-age database to form a new training set, train an SVM on the new training set to obtain classifier B, then use classifier B to recognize the samples that classifier A misrecognized;
(3) Repeat step (2);
(4) Stop iterating when the age recognition accuracy fluctuates by no more than 0.1%; otherwise return to step (3);
(5) After iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label, as judged by human subjective visual perception, is closer to the sample's physiological age;
(6) Repeat steps (1), (2), (3), (4), and (5) on the corrected samples;
(7) Stop the loop when all samples are correctly recognized; otherwise return to step (6);
(8) Subtract the physiological-age database added in step (2) from the set of all correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
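Two of the bookkeeping steps above, the stopping rule of step (4) and the subtraction of step (8), are simple enough to sketch directly. The sketch rests on assumptions the source leaves open: the 0.1% fluctuation is read as an absolute difference of 0.001 between the accuracies of two consecutive iterations, and samples are assumed to carry unique identifiers so that the physiological-age samples can be removed from the final set. The names `has_converged` and `subtract_physiological` are hypothetical.

```python
from typing import Hashable, Iterable, List, Sequence


def has_converged(accuracy_history: Sequence[float], tol: float = 0.001) -> bool:
    """Step (4): True when the last two accuracies differ by at most tol.

    Accuracies are fractions in [0, 1], so tol=0.001 corresponds to 0.1%.
    """
    if len(accuracy_history) < 2:
        return False
    return abs(accuracy_history[-1] - accuracy_history[-2]) <= tol


def subtract_physiological(
    recognized_ids: Iterable[Hashable],
    physiological_ids: Iterable[Hashable],
) -> List[Hashable]:
    """Step (8): drop physiological-age samples, keeping the original order."""
    physiological = set(physiological_ids)
    return [sample_id for sample_id in recognized_ids if sample_id not in physiological]
```

For example, accuracies of 75.00% and 75.05% in consecutive iterations differ by 0.05 percentage points, which is inside the 0.1% band, so iteration would stop under this reading.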
Fig. 2 shows five samples from the physiological-age database, with the physiological age of each person directly below the sample. The five images show that the given physiological ages differ somewhat from the ages we ourselves perceive. This indicates that a person's perceived age and physiological age can differ to some degree.
Fig. 3 shows five samples from the perceived-age database, with the perceived age of each person directly below the sample. The five images show that the given perceived ages differ somewhat from the ages we ourselves perceive. This indicates that different people perceive the age of the same person somewhat differently.
As shown in Fig. 4, the perceived-age database obtained after each iteration is used as the training set, and an SVM is trained on it to obtain an age recognition model; the model trained after each iteration is then tested on a fixed test set to obtain the age recognition rate for that iteration. The test set is a perceived-age database containing a certain amount of dirty data, but it is kept unchanged across all tests, which ensures that the experimental data are genuine and reliable.
The experimental results in Fig. 4 show that the method of the present invention, an iterative and interactive method for cleaning a perceived-age database, can effectively remove the dirty data in the perceived-age database and make the age database more accurate.
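The evaluation protocol described for Fig. 4, retraining after every iteration and scoring each model on one unchanging test set, could be expressed as in the sketch below. The names `accuracy` and `accuracy_curve` are hypothetical, and `train` is any function that turns a training set into a classifier; the source uses an SVM there, which this sketch deliberately leaves pluggable.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]           # (feature vector, age label)
Classifier = Callable[[Sequence[float]], int]  # features -> predicted age label


def accuracy(classifier: Classifier, test_set: List[Sample]) -> float:
    """Fraction of test samples whose predicted age label matches the label."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for features, label in test_set if classifier(features) == label)
    return correct / len(test_set)


def accuracy_curve(
    training_sets: List[List[Sample]],
    train: Callable[[List[Sample]], Classifier],
    test_set: List[Sample],
) -> List[float]:
    """One accuracy per iteration, always measured on the same fixed test set."""
    return [accuracy(train(training_set), test_set) for training_set in training_sets]
```

Holding `test_set` constant is what makes the values comparable across iterations: any change in the curve then comes from the training data, not from a shifting benchmark.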

Claims (3)

  1. An iterative and interactive method for cleaning a perceived-age database, characterized by comprising the following steps:
    (1) Train a support vector machine (SVM) on the physiological-age database to obtain classifier A, then use classifier A to recognize the perceived-age database;
    (2) Add the correctly recognized samples to the physiological-age database to form a new training set, train an SVM on the new training set to obtain classifier B, then use classifier B to recognize the samples that classifier A misrecognized;
    (3) Repeat step (2);
    (4) Stop iterating when the age recognition accuracy fluctuates by no more than 0.1%; otherwise return to step (3);
    (5) After iteration stops, correct the age labels of the remaining misrecognized samples through human-computer interaction, so that after correction each label, as judged by human subjective visual perception, is closer to the sample's physiological age;
    (6) Repeat steps (1), (2), (3), (4), and (5) on the corrected samples;
    (7) Stop the loop when all samples are correctly recognized; otherwise return to step (6);
    (8) Subtract the physiological-age database added in step (2) from the set of all correctly recognized samples; what remains is the perceived-age database after iterative and interactive cleaning and correction.
  2. The iterative and interactive method for cleaning a perceived-age database according to claim 1, characterized in that the SVM training in step (1) computes over all feature vectors of the sample set according to the selected kernel function and constructs a feature space in which the samples are separable, with the following specific steps:
    (1-1) Select the kernel function. The kernel function used is the Gaussian function:
    $$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\delta^2}\right)$$
    (1-2) Using the selected kernel function, compute the feature correlation value of each feature vector in each classifier;
    (1-3) Compute the covariance matrix space from these feature correlation values;
    (1-4) Apply a mirror transformation to this covariance matrix space, that is, transform a vector into its mirror image reflected by a hyperplane;
    (1-5) Obtain the covariance matrix and its corresponding hyperplane matrix, compute the feature coefficient of each feature from these two matrices, and scale the covariance matrix by the feature coefficients;
    (1-6) Obtain the model parameters.
  3. The iterative and interactive method for cleaning a perceived-age database according to claim 1, characterized in that the age labels of samples are corrected through human-computer interaction in step (5) as follows: referring to the age recognition result given by the classifier, the age label of the sample is relabeled according to the age perceived by the human eye, and the relabeled sample is the corrected sample.
CN201711170178.0A 2017-11-22 2017-11-22 Iterative and interactive method for cleaning a perceived-age database Pending CN107977412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711170178.0A CN107977412A (en) 2017-11-22 2017-11-22 Iterative and interactive method for cleaning a perceived-age database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711170178.0A CN107977412A (en) 2017-11-22 2017-11-22 Iterative and interactive method for cleaning a perceived-age database

Publications (1)

Publication Number Publication Date
CN107977412A (en) 2018-05-01

Family

ID=62010761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711170178.0A Pending CN107977412A (en) 2017-11-22 2017-11-22 Iterative and interactive method for cleaning a perceived-age database

Country Status (1)

Country Link
CN (1) CN107977412A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985173A (en) * 2018-06-19 2018-12-11 奕通信息科技(上海)股份有限公司 Deep network transfer learning method for apparent-age databases with noisy labels
CN109034188A (en) * 2018-06-15 2018-12-18 北京金山云网络技术有限公司 Method and device for acquiring machine learning model, equipment and storage medium
CN110083728A (en) * 2019-04-03 2019-08-02 上海联隐电子科技合伙企业(有限合伙) Method, device and system for optimizing automatic picture data cleaning quality
CN110688471A (en) * 2019-09-30 2020-01-14 支付宝(杭州)信息技术有限公司 Training sample obtaining method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150578A (en) * 2013-04-09 2013-06-12 山东师范大学 Training method of SVM (Support Vector Machine) classifier based on semi-supervised learning
CN105045807A (en) * 2015-06-04 2015-11-11 浙江力石科技股份有限公司 Data cleaning algorithm based on Internet trading information
CN106778851A (en) * 2016-12-05 2017-05-31 公安部第三研究所 Social network forecasting system and method based on mobile phone forensics data
CN106844636A (en) * 2017-01-21 2017-06-13 亚信蓝涛(江苏)数据科技有限公司 Unstructured data processing method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
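赖德河: "Research on face age estimation methods" (人脸年龄估计方法的研究), China Master's Theses Full-text Database, Information Science and Technology *
陈小柏: "Research on a vision-based continuous sign language recognition system" (基于视觉的连续手语识别系统的研究), China Master's Theses Full-text Database, Information Science and Technology *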

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034188A (en) * 2018-06-15 2018-12-18 北京金山云网络技术有限公司 Method and device for acquiring machine learning model, equipment and storage medium
CN109034188B (en) * 2018-06-15 2021-11-05 北京金山云网络技术有限公司 Method and device for acquiring machine learning model, equipment and storage medium
CN108985173A (en) * 2018-06-19 2018-12-11 奕通信息科技(上海)股份有限公司 Deep network transfer learning method for apparent-age databases with noisy labels
CN110083728A (en) * 2019-04-03 2019-08-02 上海联隐电子科技合伙企业(有限合伙) Method, device and system for optimizing automatic picture data cleaning quality
CN110083728B (en) * 2019-04-03 2021-08-20 上海铼锶信息技术有限公司 Method, device and system for optimizing automatic picture data cleaning quality
CN110688471A (en) * 2019-09-30 2020-01-14 支付宝(杭州)信息技术有限公司 Training sample obtaining method, device and equipment
CN110688471B (en) * 2019-09-30 2022-09-09 支付宝(杭州)信息技术有限公司 Training sample obtaining method, device and equipment

Similar Documents

Publication Publication Date Title
CN107977412A (en) Iterative and interactive method for cleaning a perceived-age database
Batchelor et al. Intelligent vision systems for industry
US20190220657A1 (en) Motion recognition device and motion recognition method
CN107392125A (en) Training method/system, computer-readable recording medium and the terminal of model of mind
CN107808143A (en) Dynamic gesture identification method based on computer vision
CN104463101A (en) Answer recognition method and system for textual test question
CN105320945A (en) Image classification method and apparatus
CN110580466A (en) infant quilt kicking behavior recognition method and device, computer equipment and storage medium
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN109272003A (en) A kind of method and apparatus for eliminating unknown error in deep learning model
CN109858476A (en) The extending method and electronic equipment of label
CN103093237B (en) A kind of method for detecting human face of structure based model
CN104919492A (en) Device for detecting feature-point position, method for detecting feature-point position, and program for detecting feature-point position
US20200380292A1 (en) Method and device for identifying object and computer readable storage medium
CN109829354B (en) Face recognition method based on deep learning
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
CN110717385A (en) Dynamic gesture recognition method
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN106372652A (en) Hair style identification method and hair style identification apparatus
CN109101922A (en) Operating personnel device, assay, device and electronic equipment
CN104978569A (en) Sparse representation based incremental face recognition method
CN113283445A (en) Image processing method and device and computer equipment
CN110070120B (en) Depth measurement learning method and system based on discrimination sampling strategy
CN111414930B (en) Deep learning model training method and device, electronic equipment and storage medium
CN113592906B (en) Long video target tracking method and system based on annotation frame feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180501)