CN109189771A - Vehicle model database cleaning method based on offline and online clustering - Google Patents
- Publication number
- CN109189771A (application CN201810941282.3A)
- Authority
- CN
- China
- Prior art keywords
- class
- offline
- vehicle
- model data
- clustered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a vehicle model database cleaning method based on offline and online clustering. First, samples of each vehicle model class are labeled to obtain an offline vehicle model library, a deep learning network is trained on it, and the output of the penultimate fully connected layer of the trained network is taken as the vehicle model feature. Second, all vehicle model features in each class are extracted and clustered offline, yielding n cluster centers and their corresponding thresholds. Then, all vehicle model features in each class of the online vehicle model library are periodically extracted and clustered: the initial cluster centers are the n centers obtained offline, one further cluster with a randomly initialized center is added, and constrained clustering yields n+1 clusters. Finally, the vehicle model data belonging to the first n clusters is judged and cleaned in turn against the thresholds obtained offline, and the vehicle model data of the last cluster is cleaned. The invention can effectively delete wrongly stored samples while keeping the intra-class characteristics of the online vehicle model library unchanged, thereby maintaining the long-term performance and stability of the system.
Description
Technical field
The present invention relates to a vehicle model database cleaning method based on offline and online clustering.
Background art
As vehicle ownership grows rapidly, vehicle-related offenses such as hit-and-run, fake vehicle logos, cloned license plates, and speeding are increasing year by year. With advances in technology, intelligent vehicle model recognition has become a mature and effective tool that is widely used in checkpoint vehicle detection, cloned-plate vehicle detection, vehicle retrieval, and similar applications.
Many of these applications require an online vehicle model library. Vehicle model recognition based on deep learning can reach an accuracy above 98%, but over long-term operation the erroneous samples that are continually stored still accumulate to a level at which system performance and stability are hard to maintain. A vehicle model database cleaning method based on offline and online clustering is therefore needed to clean the library periodically and preserve system performance and stability.
Most existing database cleaning methods are either general-purpose or aimed at some other specific domain, and none targets vehicle model databases. Examples include "A data cleaning method" (201710704678.1) and "A simplified big data cleaning mode" (201711182073.7).
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a vehicle model database cleaning method based on offline and online clustering. The method targets online-updated vehicle model libraries that contain a certain amount of wrongly stored data, and uses offline and online clustering to clean the library periodically so that system performance and stability are maintained.
The technical solution adopted by the present invention to solve the technical problem is as follows:
Step one: label samples of each vehicle model class to obtain an offline vehicle model library, train on it with deep learning, and take the output of the penultimate fully connected layer of the trained network as the vehicle model feature.
Step two: extract all vehicle model features in each class and cluster them offline to obtain n cluster centers and their corresponding thresholds.
Step three: periodically extract all vehicle model features in each class of the online vehicle model library and cluster them. The initial cluster centers are the n centers obtained by offline clustering; one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters.
Step four: using the thresholds obtained by offline clustering, judge and clean in turn the vehicle model data belonging to the first n clusters, and clean the vehicle model data of the last cluster.
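As a rough illustration of step one, the sketch below shows what "taking the output of the penultimate fully connected layer as the feature" means in code. It is not the network of Fig. 1: `TinyClassifier` and `extract_features` are hypothetical names, the weights are random stand-ins for trained parameters, and the L2 normalization is an added assumption that makes later cosine comparisons convenient. The 512-dimensional feature size follows the embodiment.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class TinyClassifier:
    """Toy stand-in for a trained network with two fully connected layers."""

    def __init__(self, in_dim, feat_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for trained parameters (assumption).
        self.w1 = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        self.b1 = np.zeros(feat_dim)
        self.w2 = rng.normal(scale=0.1, size=(feat_dim, num_classes))
        self.b2 = np.zeros(num_classes)

    def penultimate(self, x):
        """Output of the second-to-last fully connected layer."""
        return relu(x @ self.w1 + self.b1)

    def logits(self, x):
        """Output of the last fully connected layer (class scores)."""
        return self.penultimate(x) @ self.w2 + self.b2

def extract_features(model, batch):
    """Return L2-normalized penultimate-layer features for a batch."""
    feats = model.penultimate(batch)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)

model = TinyClassifier(in_dim=64, feat_dim=512, num_classes=10)
batch = np.random.default_rng(1).normal(size=(4, 64))
features = extract_features(model, batch)
print(features.shape)  # one 512-dimensional feature per input sample
```

The classifier head (`logits`) is only needed during training; at cleaning time only the penultimate activations are stored and clustered.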
Beneficial effects of the present invention: the invention enables regular, fast cleaning of the online vehicle model database built up in intelligent vehicle model recognition applications. It effectively deletes wrongly stored samples while keeping the intra-class characteristics of the online vehicle model library unchanged, thereby maintaining the long-term performance and stability of the system.
Brief description of the drawings
Fig. 1 shows the network structure used for offline deep learning training.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The offline part of the invention comprises vehicle model feature training and offline clustering of vehicle model features; the online part comprises vehicle model feature extraction and online clustering of vehicle model features.
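The offline clustering stage might be sketched as follows, using the details the embodiment gives (k-means, cosine-based distance, results for 1 to 5 clusters, a threshold derived from the standard deviation of member-to-center distances). Several points are assumptions because the text does not fix them: the function names `cosine_kmeans` and `cluster_thresholds` are hypothetical, the threshold is taken as mean plus `num_std` standard deviations, and the rule for choosing n from the five candidate results is not reproduced, so n is picked by hand in the demo.

```python
import numpy as np

def normalize(x):
    """Scale rows to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def cosine_kmeans(feats, k, iters=50, seed=0):
    """k-means on unit vectors; assignment uses the highest cosine similarity."""
    rng = np.random.default_rng(seed)
    feats = normalize(feats)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(feats @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = feats[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = normalize(members.mean(axis=0, keepdims=True))[0]
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmax(feats @ centers.T, axis=1)
    return centers, labels

def cluster_thresholds(feats, centers, labels, num_std=3.0):
    """Per-cluster cosine-distance threshold.

    Assumption: threshold = mean + num_std * std of member-to-center
    distances; the source only says the threshold comes from the
    standard deviation of these distances.
    """
    feats = normalize(feats)
    thresholds = np.zeros(len(centers))
    for j in range(len(centers)):
        d = 1.0 - feats[labels == j] @ centers[j]
        if len(d):
            thresholds[j] = d.mean() + num_std * d.std()
    return thresholds

# Toy features for one vehicle model class: two appearance modes.
rng = np.random.default_rng(0)
mode_a, mode_b = rng.normal(size=16), rng.normal(size=16)
feats = np.vstack([mode_a + 0.1 * rng.normal(size=(50, 16)),
                   mode_b + 0.1 * rng.normal(size=(50, 16))])

# Run k-means for 1 to 5 clusters, as in the embodiment.
candidates = {k: cosine_kmeans(feats, k) for k in range(1, 6)}
n = 2  # chosen by hand here; the selection criterion is not specified in detail
centers, labels = candidates[n]
thresholds = cluster_thresholds(feats, centers, labels)
print(centers.shape, thresholds)
```

The stored outputs of this stage are the n unit-length centers and one threshold per center; the online stage reuses both.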
Step one: label samples of each vehicle model class to obtain an offline vehicle model library, train on it with deep learning, and take the output of the penultimate fully connected layer of the trained network as the vehicle model feature.
Step two: extract all vehicle model features in each class and cluster them to obtain n cluster centers and their corresponding thresholds.
Step three: periodically extract all vehicle model features in each class of the online vehicle model library and cluster them. The initial cluster centers are the n centers obtained by offline clustering; one cluster with a randomly initialized center is added, and constrained clustering is performed to obtain n+1 clusters.
Step four: using the thresholds obtained by offline clustering, judge and clean in turn the vehicle model data belonging to the first n clusters, and clean the vehicle model data of the last cluster.
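Step three can be sketched as a k-means loop that starts from the n offline centers plus one random center and limits how far the first n centers may move. The source states the constraint (in the embodiment, an offset of less than 30 degrees) but not how it is enforced, so the approach below is an assumption: a center that would drift past the limit is rotated back onto it. The names `limit_drift` and `constrained_online_kmeans` are hypothetical.

```python
import numpy as np

def normalize(x):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)

def limit_drift(center, anchor, max_deg=30.0):
    """Keep the angle between `center` and `anchor` within max_deg.

    Assumption: an over-limit center is rotated back onto the limit in
    the plane spanned by the two vectors.
    """
    cos_angle = float(np.clip(center @ anchor, -1.0, 1.0))
    limit = np.radians(max_deg)
    if np.arccos(cos_angle) <= limit:
        return center
    ortho = center - cos_angle * anchor      # part of center orthogonal to anchor
    ortho_norm = np.linalg.norm(ortho)
    if ortho_norm < 1e-12:
        return anchor.copy()                 # opposite direction: fall back to anchor
    ortho /= ortho_norm
    return np.cos(limit) * anchor + np.sin(limit) * ortho

def constrained_online_kmeans(feats, offline_centers, iters=50, max_deg=30.0, seed=0):
    """Cluster into n + 1 groups; the first n centers stay near the offline ones."""
    rng = np.random.default_rng(seed)
    feats = normalize(feats)
    anchors = normalize(offline_centers)
    n = len(anchors)
    extra = normalize(rng.normal(size=feats.shape[1]))  # randomly initialized center
    centers = np.vstack([anchors, extra])
    for _ in range(iters):
        labels = np.argmax(feats @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(n + 1):
            members = feats[labels == j]
            if len(members) == 0:
                continue
            c = normalize(members.mean(axis=0))
            new_centers[j] = limit_drift(c, anchors[j], max_deg) if j < n else c
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmax(feats @ centers.T, axis=1)
    return centers, labels

# Toy online library: samples near two offline centers plus a stray group.
rng = np.random.default_rng(0)
offline_centers = np.eye(8)[:2]
good = np.vstack([c + 0.1 * rng.normal(size=(40, 8)) for c in offline_centers])
stray = np.eye(8)[5] + 0.1 * rng.normal(size=(10, 8))
online_feats = np.vstack([good, stray])

centers, labels = constrained_online_kmeans(online_feats, offline_centers)
cosines = np.clip(np.sum(centers[:2] * offline_centers, axis=1), -1.0, 1.0)
angles = np.degrees(np.arccos(cosines))
print(angles)  # drift of the first n centers, in degrees
```

Anchoring the first n centers is what keeps the intra-class characteristics of the library stable: wrongly stored samples cannot pull an established center away, so they tend to end up either far from it or in the extra cluster.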
Embodiment:
Step one: label samples of each vehicle model class to obtain an offline vehicle model library, train on it with deep learning (see Fig. 1), and take the 512-dimensional output of the penultimate fully connected layer of the trained network as the vehicle model feature.
Step two: extract all vehicle model features in each class and cluster them offline, using cosine similarity as the feature distance. k-means is called in a loop to obtain clustering results for 1 to 5 clusters, the result with n clusters is selected according to the intra-class and inter-class differences, and the threshold is obtained from the standard deviation of the distances between all features in a cluster and the cluster center.
Step three: periodically extract all vehicle model features in each class of the online vehicle model library and cluster them online. k-means is used here as well: the initial cluster centers are the n centers obtained by offline clustering, one cluster with a randomly initialized center is added, and clustering is performed under the constraint that the offset of each of the first n cluster centers is less than 30 degrees, yielding n+1 clusters.
Step four: using the thresholds obtained by offline clustering, judge in turn (by comparing the computed distance with the threshold) and clean the vehicle model data belonging to the first n clusters, and clean the vehicle model data of the last cluster.
The above is only a preferred embodiment of the present invention and is not intended to limit its scope. It should be understood that the present invention is not limited to the implementations described herein; these implementations are described to help those skilled in the art practice the invention.
Claims (4)
1. A vehicle model database cleaning method based on offline and online clustering, characterized in that the method comprises the following steps:
Step one: label samples of each vehicle model class to obtain an offline vehicle model library, train on it with deep learning, and take the output of the penultimate fully connected layer of the trained network as the vehicle model feature;
Step two: extract all vehicle model features in each class and cluster them offline to obtain n cluster centers and their corresponding thresholds;
Step three: periodically extract all vehicle model features in each class of the online vehicle model library and cluster them, the initial cluster centers being the n centers obtained by offline clustering, with one cluster having a randomly initialized center added before constrained clustering is performed to obtain n+1 clusters;
Step four: using the thresholds obtained by offline clustering, judge and clean in turn the vehicle model data belonging to the first n clusters, and clean the vehicle model data of the last cluster.
2. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that the fully connected layer outputs a feature of 512 dimensions in total.
3. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that the feature distance in the offline clustering process uses cosine similarity, and the threshold is obtained from the standard deviation of the distances between all features in a cluster and the cluster center.
4. The vehicle model database cleaning method based on offline and online clustering according to claim 1, characterized in that constrained clustering means clustering under the constraint that the offset of each of the first n cluster centers is less than 30 degrees.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810941282.3A CN109189771A (en) | 2018-08-17 | 2018-08-17 | Vehicle model database cleaning method based on offline and online clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810941282.3A CN109189771A (en) | 2018-08-17 | 2018-08-17 | Vehicle model database cleaning method based on offline and online clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109189771A (en) | 2019-01-11 |
Family
ID=64918265
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810941282.3A | Pending | CN109189771A (en) | 2018-08-17 | 2018-08-17 | Vehicle model database cleaning method based on offline and online clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109189771A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040181526A1 (en) * | 2003-03-11 | 2004-09-16 | Lockheed Martin Corporation | Robust system for interactively learning a record similarity measurement |
CN102932738A (en) * | 2012-10-31 | 2013-02-13 | 北京交通大学 | Improved positioning method of indoor fingerprint based on clustering neural network |
CN106202335A (en) * | 2016-06-28 | 2016-12-07 | 银江股份有限公司 | A kind of big Data Cleaning Method of traffic based on cloud computing framework |
CN106204335A (en) * | 2016-07-21 | 2016-12-07 | 广东工业大学 | A kind of electricity price performs abnormality judgment method, Apparatus and system |
CN106740829A (en) * | 2017-03-23 | 2017-05-31 | 吉林大学 | Based on the double semi-dragging truck riding stability automatic identifications of cluster analysis and early warning system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114490596A (en) * | 2021-12-08 | 2022-05-13 | 大唐水电科学技术研究院有限公司 | Method for cleaning transformer oil chromatographic data based on machine learning and neural network |
CN114490596B (en) * | 2021-12-08 | 2024-05-10 | 大唐水电科学技术研究院有限公司 | Method for cleaning transformer oil chromatographic data based on machine learning and neural network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190111 |