CN109189771A - Vehicle model database cleaning method based on offline and online clustering - Google Patents

Info

Publication number
CN109189771A
Authority
CN
China
Prior art keywords
class
offline
vehicle
model data
clustered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810941282.3A
Other languages
Chinese (zh)
Inventor
尚凌辉
张兆生
王弘玥
余天明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHEJIANG ICARE VISION TECHNOLOGY Co Ltd
Original Assignee
ZHEJIANG ICARE VISION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHEJIANG ICARE VISION TECHNOLOGY Co Ltd
Priority to CN201810941282.3A
Publication of CN109189771A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vehicle model database cleaning method based on offline and online clustering. First, vehicle samples of each class are labeled to obtain an offline vehicle model library, a deep learning network is trained on it, and the output of the penultimate fully connected layer is taken as the vehicle model feature. Second, all vehicle model features in each class are extracted and clustered offline, yielding n cluster centers and their corresponding thresholds. Then, all vehicle model features in each class of the online vehicle model library are periodically extracted and clustered: the initial cluster centers are the n centers obtained by offline clustering, one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters. Finally, the data belonging to the first n clusters is judged against the thresholds obtained by offline clustering and cleaned accordingly, and the data in the last cluster is cleaned out. The invention can effectively delete wrongly stored samples while keeping the intra-class characteristics of each class in the online vehicle model library unchanged, thereby maintaining the performance and stability of the system in long-term operation.

Description

Vehicle model database cleaning method based on offline and online clustering
Technical field
The present invention relates to a vehicle model database cleaning method based on offline and online clustering.
Background art
As the number of vehicles in use increases sharply, vehicle-related offenses are rising year by year: hit-and-run, forged vehicle badges, cloned license plates, speeding, and similar violations occur frequently. With the development of technology, intelligent vehicle model recognition has become a mature and effective tool that can be widely used in checkpoint vehicle detection, cloned-plate vehicle detection, vehicle retrieval, and similar applications.
Many of these applications require an online vehicle model library. Vehicle model recognition based on deep learning can reach an accuracy of 98% or more, but over long-term operation erroneous samples are still continually written to the library and eventually accumulate to the point where system performance and stability are hard to maintain. A vehicle model database cleaning method based on offline and online clustering is therefore needed to clean the library periodically and preserve system performance and stability.
Existing database cleaning methods are mostly general-purpose or target some other specific domain, and none addresses vehicle model databases. Examples include "A data cleaning method" (201710704678.1) and "A simplified big data cleaning method" (201711182073.7).
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a vehicle model database cleaning method based on offline and online clustering. The method targets the case where an online-updated vehicle model library contains a certain amount of wrongly stored data, and uses offline and online clustering to clean the library periodically so as to preserve system performance and stability.
The technical solution adopted by the invention to solve this technical problem is as follows:
Step 1: Label vehicle samples of each class to obtain an offline vehicle model library, train a deep learning network on it, and take the output of the penultimate fully connected layer as the vehicle model feature.
Step 2: Extract all vehicle model features in each class and cluster them offline to obtain n cluster centers and their corresponding thresholds.
Step 3: Periodically extract all vehicle model features in each class of the online vehicle model library and cluster them. The initial cluster centers are the n centers obtained by offline clustering; one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters.
Step 4: Judge the data belonging to the first n clusters against the thresholds obtained by offline clustering and clean it accordingly; clean out the data in the last cluster.
Beneficial effects of the invention: the invention can quickly and periodically clean the online vehicle model database built in intelligent vehicle model recognition applications. It effectively deletes wrongly stored samples while keeping the intra-class characteristics of each class in the online library unchanged, thereby maintaining the performance and stability of the system in long-term operation.
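The second step above (offline clustering and threshold derivation) can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation. The cosine distance and the use of the standard deviation follow the embodiment and claim 3; the function names, the fixed iteration count, and the threshold formula (mean distance plus `num_std` standard deviations) are assumptions, since the text says only that the threshold is derived from the standard deviation. The number of clusters `k` is passed in directly because the criterion for choosing n is not specified.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity, so 0 means the vectors point the same way."""
    return 1.0 - sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest(f, centers):
    """Index of the center with the smallest cosine distance to f."""
    return min(range(len(centers)), key=lambda j: cosine_distance(f, centers[j]))


def kmeans_cosine(features, k, iters=20, seed=0):
    """Plain k-means on unit-length features with cosine distance."""
    rng = random.Random(seed)
    feats = [normalize(f) for f in features]
    centers = rng.sample(feats, k)  # initial centers: k random samples
    for _ in range(iters):
        labels = [nearest(f, centers) for f in feats]
        for j in range(k):
            members = [f for f, l in zip(feats, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[j] = normalize(mean)
    return centers, [nearest(f, centers) for f in feats]


def cluster_thresholds(features, centers, labels, num_std=3.0):
    """Per-cluster threshold: mean distance + num_std * std (assumed formula)."""
    thresholds = []
    for j, center in enumerate(centers):
        d = [cosine_distance(f, center) for f, l in zip(features, labels) if l == j]
        if not d:
            thresholds.append(0.0)
            continue
        mean = sum(d) / len(d)
        std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
        thresholds.append(mean + num_std * std)
    return thresholds
```

The centers and thresholds returned here would be stored per vehicle model class and reused by the periodic online cleaning pass.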
Brief description of the drawings
Fig. 1 shows the network structure used for offline deep learning training.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
The offline part of the invention comprises vehicle model feature training and offline clustering of vehicle model features; the online part comprises vehicle model feature extraction and online clustering of vehicle model features.
Step 1: Label vehicle samples of each class to obtain an offline vehicle model library, train a deep learning network on it, and take the output of the penultimate fully connected layer as the vehicle model feature.
Step 2: Extract all vehicle model features in each class and cluster them to obtain n cluster centers and their corresponding thresholds.
Step 3: Periodically extract all vehicle model features in each class of the online vehicle model library and cluster them. The initial cluster centers are the n centers obtained by offline clustering; one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters.
Step 4: Judge the data belonging to the first n clusters against the thresholds obtained by offline clustering and clean it accordingly; clean out the data in the last cluster.
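The third step (constrained online clustering) can be illustrated with the sketch below. It assumes cosine similarity as the feature measure and a 30-degree bound on the offset of the first n centers, both taken from the embodiment and claim 4. How the constraint is enforced is not stated in the text; rejecting any update that would move one of the first n centers 30 degrees or more away from its offline center is an assumption made here, as are the function names and the optional `extra_center` parameter (added only so the example can be run deterministically).

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def angle_deg(a, b):
    """Angle between two vectors in degrees."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sim(a, b)))))


def assign(feats, centers):
    """Label each feature with the index of its most similar center."""
    return [max(range(len(centers)), key=lambda j: cos_sim(f, centers[j]))
            for f in feats]


def constrained_kmeans(features, offline_centers, max_angle_deg=30.0,
                       iters=20, seed=0, extra_center=None):
    """k-means with n offline centers plus one extra, randomly placed center."""
    rng = random.Random(seed)
    feats = [normalize(f) for f in features]
    anchors = [normalize(c) for c in offline_centers]
    n = len(anchors)
    if extra_center is None:  # random direction for the (n+1)-th cluster
        extra_center = [rng.gauss(0.0, 1.0) for _ in anchors[0]]
    centers = anchors + [normalize(extra_center)]
    for _ in range(iters):
        labels = assign(feats, centers)
        for j in range(n + 1):
            members = [f for f, l in zip(feats, labels) if l == j]
            if not members:
                continue
            candidate = normalize([sum(col) / len(members) for col in zip(*members)])
            # Assumed enforcement: drop updates that drift too far from the
            # offline center; the extra cluster (index n) is unconstrained.
            if j < n and angle_deg(candidate, anchors[j]) >= max_angle_deg:
                continue
            centers[j] = candidate
    return centers, assign(feats, centers)
```

Under this design the first n clusters stay close to what was learned offline, so samples that do not fit any of them tend to collect in the extra cluster.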
Embodiment:
Step 1: Label vehicle samples of each class to obtain an offline vehicle model library, train a deep learning network on it (see Fig. 1), and take the 512-dimensional feature output by the penultimate fully connected layer as the vehicle model feature.
Step 2: Extract all vehicle model features in each class and cluster them offline, using cosine similarity as the feature distance. Run k-means repeatedly to obtain clustering results for 1 to 5 clusters, select the result with n clusters according to the intra-cluster and inter-cluster differences, and compute the standard deviation of the distances between all features in each cluster and the cluster center to obtain the threshold.
Step 3: Periodically extract all vehicle model features in each class of the online vehicle model library and cluster them online, again using k-means. The initial cluster centers are the n centers obtained by offline clustering plus one cluster with a randomly initialized center. Clustering is constrained so that the offset of each of the first n cluster centers stays below 30 degrees, yielding n+1 clusters.
Step 4: Judge the data belonging to the first n clusters against the thresholds obtained by offline clustering (by comparing each computed distance with the threshold) and clean it accordingly; clean out the data in the last cluster.
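The cleaning decision in the fourth step reduces to a per-sample comparison, sketched below in plain Python. The cluster labels, centers, and thresholds are assumed to come from the preceding clustering steps. The text does not say whether distances are measured to the offline or the online centers, so the centers are simply passed in; the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_samples_to_clean(features, labels, centers, thresholds):
    """Return the indices of samples that should be removed from the library.

    thresholds has one entry per offline cluster (n entries); any label >= n
    refers to the extra cluster, whose samples are removed wholesale.
    """
    n = len(thresholds)
    to_delete = []
    for idx, (feature, label) in enumerate(zip(features, labels)):
        if label >= n:
            to_delete.append(idx)  # last cluster: always cleaned out
        elif cosine_distance(feature, centers[label]) > thresholds[label]:
            to_delete.append(idx)  # too far from its cluster center
    return to_delete


# Example: two offline clusters plus the extra cluster (label 2).
features = [[1, 0], [1, 0.1], [1, 1], [0, 1], [-1, -1]]
labels = [0, 0, 0, 1, 2]
centers = [[1, 0], [0, 1], [-1, -1]]
thresholds = [0.05, 0.05]
print(select_samples_to_clean(features, labels, centers, thresholds))  # [2, 4]
```

Sample 2 is removed because its distance to the first center (about 0.29) exceeds the threshold, and sample 4 is removed because it falls in the extra cluster.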
The foregoing is only a preferred embodiment of the invention and is not intended to limit its scope. It should be understood that the invention is not limited to the implementations described herein; these implementations are described to help those skilled in the art practice the invention.

Claims (4)

1. A vehicle model database cleaning method based on offline and online clustering, characterized in that the method comprises the following steps:
Step 1: label vehicle samples of each class to obtain an offline vehicle model library, train a deep learning network on it, and take the output of the penultimate fully connected layer as the vehicle model feature;
Step 2: extract all vehicle model features in each class and cluster them offline to obtain n cluster centers and their corresponding thresholds;
Step 3: periodically extract all vehicle model features in each class of the online vehicle model library and cluster them, wherein the initial cluster centers are the n centers obtained by offline clustering, one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters;
Step 4: judge the data belonging to the first n clusters against the thresholds obtained by offline clustering and clean it accordingly, and clean out the data in the last cluster.
2. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that the fully connected layer outputs a 512-dimensional feature.
3. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that the feature distance in the offline clustering process uses cosine similarity, and the threshold is obtained by computing the standard deviation of the distances between all features in a cluster and the cluster center.
4. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that constrained clustering means clustering under the constraint that the offset of each of the first n cluster centers is less than 30 degrees.
CN201810941282.3A 2018-08-17 2018-08-17 Vehicle model database cleaning method based on offline and online clustering Pending CN109189771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810941282.3A CN109189771A (en) 2018-08-17 2018-08-17 Vehicle model database cleaning method based on offline and online clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810941282.3A CN109189771A (en) 2018-08-17 2018-08-17 Vehicle model database cleaning method based on offline and online clustering

Publications (1)

Publication Number Publication Date
CN109189771A true CN109189771A (en) 2019-01-11

Family

ID=64918265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810941282.3A Pending CN109189771A (en) 2018-08-17 2018-08-17 Vehicle model database cleaning method based on offline and online clustering

Country Status (1)

Country Link
CN (1) CN109189771A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040181526A1 (en) * 2003-03-11 2004-09-16 Lockheed Martin Corporation Robust system for interactively learning a record similarity measurement
CN102932738A (en) * 2012-10-31 2013-02-13 北京交通大学 Improved positioning method of indoor fingerprint based on clustering neural network
CN106204335A (en) * 2016-07-21 2016-12-07 广东工业大学 A kind of electricity price performs abnormality judgment method, Apparatus and system
CN106202335A (en) * 2016-06-28 2016-12-07 银江股份有限公司 A kind of big Data Cleaning Method of traffic based on cloud computing framework
CN106740829A (en) * 2017-03-23 2017-05-31 吉林大学 Based on the double semi-dragging truck riding stability automatic identifications of cluster analysis and early warning system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114490596A (en) * 2021-12-08 2022-05-13 大唐水电科学技术研究院有限公司 Method for cleaning transformer oil chromatographic data based on machine learning and neural network
CN114490596B (en) * 2021-12-08 2024-05-10 大唐水电科学技术研究院有限公司 Method for cleaning transformer oil chromatographic data based on machine learning and neural network

Similar Documents

Publication Publication Date Title
CN105468677B (en) A kind of Log Clustering method based on graph structure
CN109165294A (en) Short text classification method based on Bayesian classification
CN108985380B (en) Point switch fault identification method based on cluster integration
US9967321B2 (en) Meme discovery system
CN110210660B (en) Ultra-short-term wind speed prediction method
CN109145180B (en) Enterprise hot event mining method based on incremental clustering
CN103617233A (en) Method and device for detecting repeated video based on semantic content multilayer expression
CN105488211A (en) Method for determining user group based on feature analysis
CN104182460A (en) Time sequence similarity query method based on inverted indexes
CN104008106A (en) Method and apparatus for obtaining hot topic
CN111556016B (en) Network flow abnormal behavior identification method based on automatic encoder
CN104516962A (en) Monitoring method and system for microblogging public opinion
CN104598632A (en) Hot event detection method and device
CN104679738A (en) Method and device for mining Internet hot words
CN101980210A (en) Marked word classifying and grading method and system
CN104156403A (en) Clustering-based big data normal-mode extracting method and system
CN109657063A (en) A kind of processing method and storage medium of magnanimity environment-protection artificial reported event data
CN105512301A (en) User grouping method based on social content
CN108683658B (en) Industrial control network flow abnormity identification method based on multi-RBM network construction reference model
CN104951553A (en) Content collecting and data mining platform accurate in data processing and implementation method thereof
CN105678244B (en) A kind of near video search method based on improved edit-distance
CN109597901B (en) Data analysis method based on biological data
CN109214445A (en) A kind of multi-tag classification method based on artificial intelligence
CN109189771A (en) Vehicle model database cleaning method based on offline and online clustering
CN103279581A (en) Method for performing video retrieval by compact video theme descriptors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190111