CN110457370A - Outlier Detection system and method for cleaning in data mining based on artificial intelligence - Google Patents
- Publication number
- CN110457370A (application number CN201910740294.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- isolated point
- module
- curve
- rule
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of data mining technology and discloses an artificial-intelligence-based outlier detection system and cleaning method for data mining. Using the outlier-information data collected by detector nodes at a given moment, the detector nodes are clustered, the data are linearly reduced in dimension and fitted to a data curve, and a detection curve is obtained in the same way; the trend and similarity of the test curve and the detection curve are compared to detect whether anomalies exist, and any abnormal data are then preprocessed, detected, identified, and cleaned. The invention addresses two problems of outlier detection in existing data mining: the inability to choose accurately between the Pearson correlation coefficient and the Spearman correlation coefficient, and the inability to clean detected outlier data effectively. By detecting outliers, the invention finds and removes the corresponding erroneous data, thereby improving the quality of the source data and offering a new approach for data mining.
Description
Technical field
The invention belongs to the field of data mining technology, and in particular relates to an artificial-intelligence-based outlier detection system and cleaning method for data mining.
Background art
An outlier is a data item in a data set that is inconsistent with the characteristics of most of the other data; in other words, an outlier is a data item that does not conform to the data model. When mining normal-class knowledge, outliers are usually treated as noise. When these data are found to provide useful information for certain applications (e.g., credit fraud or intrusion detection), they open a new research topic for data mining, namely outlier mining. Methods for discovering and detecting outliers have been widely discussed and mainly fall into statistical (probability-based), distance-based, and deviation-based detection techniques. Barnett et al. established the concept of statistics-based outlier detection. Distance-based outlier detection is described in detail by Knorr and Ng et al. in a series of articles. For deviation-based outlier detection, see the research of Arning and Agrawal et al. At present, outlier mining has become a research branch of great practical value as a security detection means against credit card fraud, illegal intrusion into data networks, and the like. However, when data correlation analysis is performed during outlier detection in existing data mining and it is not known what kind of association the analyzed data have, one cannot accurately choose between the Pearson correlation coefficient and the Spearman correlation coefficient. Meanwhile, because data errors often manifest as outliers, detecting and removing the outliers in a data source can achieve data cleaning and improve the data quality of the source; however, not all outliers are erroneous data, so it is necessary to study how, after outliers are detected, domain knowledge or stored metadata can be combined to find the corresponding erroneous data.
In conclusion problem of the existing technology is:
Available data is excavated during Outlier Detection when needing to carry out data dependence analysis, if there is to dividing
Analysing data has the uncomprehending situation of which kind of incidence relation, then can not be accurately related from Pearson correlation coefficients or Spearman
It is selected in coefficient.
Simultaneously as error in data often shows as isolated point, therefore can by detecting and removing the isolated point in data source
Achieve the purpose that data scrubbing, improves the quality of data of data source;But and not all isolated point is all wrong data, therefore is needed
The metadata how research can also combine domain knowledge or be stored after detecting isolated point, therefrom find out corresponding error number
According to.
Summary of the invention
In view of the problems in the prior art, the present invention provides an artificial-intelligence-based outlier detection system and cleaning method for data mining.
The invention is realized as follows. An artificial-intelligence-based outlier cleaning method for data mining includes:
Clustering the outlier-information detector nodes according to the data they collect at the same moment; training a hyperellipsoid for each resulting cluster and computing each of its axial lengths; using the axial-length proportionality coefficients as coefficients to reduce the outlier-information data linearly in dimension; and fitting the reduced data to a data curve, which serves as the test curve.
Applying the same dimensionality reduction and curve fitting to the data of the same time period at a subsequent moment; the fitted curve serves as the detection curve.
Comparing the trend and similarity of the test curve and the detection curve to determine whether the outlier-information data collected by the detector nodes contain abnormal data. Any abnormal data found are taken as outlier data.
Loading the obtained outlier data through a JDBC interface as data to be cleaned.
Preprocessing the data to be cleaned.
Performing outlier detection, identification, and processing on the preprocessed data.
Exporting the processed result data to a new data source through the JDBC interface.
Further, the method for obtaining the outlier data specifically includes:
S1: Select test data.
S2: Cluster the nodes of the selected test data.
S3: For each resulting cluster, train a hyperellipsoid that just contains all nodes in the cluster, and compute the axial lengths of the hyperellipsoid.
S4: Reduce the data dimensionality according to the axial lengths of each hyperellipsoid.
S5: Fit a curve to the data reduced according to the axial lengths of each hyperellipsoid.
S6: Select detection data.
S7: Process the detection data.
S8: Compare the similarity of the test curve and the detection curve to determine whether the data contain abnormal data.
Further, the detailed process of step S2 is as follows:
Using the data of the selected nodes at the same time point, compute the allowable radius $r_i^d$ of each node $i$ in each dimension $d$ in order to cluster the nodes.
Judge whether $r_i^d$ and $r_j^d$ are adjacent. If they are, nodes $i$ and $j$ belong to the same cluster in dimension $d$; nodes $i$ and $j$ are said to be in the same cluster only when they belong to the same cluster in all $k$ dimensions. Meanwhile, if the cluster ranges of two clusters $C_i$ and $C_j$ are adjacent in every one of the $k$ dimensions, $C_i$ and $C_j$ can be merged into a single cluster whose cluster range is
$CR = [\min(\min_i, \min_j), \max(\max_i, \max_j)]$.
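A minimal sketch of the node clustering in step S2 follows. Because the formula for the allowable radius is not reproduced in this text, the sketch assumes a caller-supplied radius per dimension (`radius`, hypothetical), takes each node's interval in dimension $d$ as $[x - r, x + r]$, treats overlapping intervals as adjacent, and links two nodes only when they are adjacent in every dimension; a union-find structure then merges linked nodes into clusters. This simplifies the per-dimension clustering described above and is not the patent's own implementation.

```python
from typing import Dict, List, Sequence, Tuple


def find(parent: List[int], i: int) -> int:
    # Path-halving find for the union-find structure.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_nodes(readings: Sequence[Sequence[float]],
                  radius: Sequence[float]) -> List[List[int]]:
    """Group nodes whose per-dimension intervals overlap in every dimension.

    readings[i][d] is node i's value in dimension d at one time point;
    radius[d] is an ASSUMED allowable radius for dimension d (hypothetical).
    """
    n = len(readings)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Intervals [x - r, x + r] overlap iff |x_i - x_j| <= 2r.
            if all(abs(readings[i][d] - readings[j][d]) <= 2 * radius[d]
                   for d in range(len(radius))):
                parent[find(parent, i)] = find(parent, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


def cluster_range(readings: Sequence[Sequence[float]], members: List[int],
                  d: int, r: float) -> Tuple[float, float]:
    """Cluster range CR in dimension d: [min of lower ends, max of upper ends]."""
    lows = [readings[i][d] - r for i in members]
    highs = [readings[i][d] + r for i in members]
    return (min(lows), max(highs))
```

For example, `cluster_nodes([[1.0, 2.0], [1.5, 2.2], [5.0, 9.0]], [0.5, 0.5])` returns `[[0, 1], [2]]`, and the range of cluster `[0, 1]` in dimension 0 with radius 0.5 is `(0.5, 2.0)`, following the merge rule for $CR$.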
The detailed process of step S3 is as follows:
The relationship between data attributes is described by the proportions between the axial lengths of the hyperellipsoid, whose axial lengths are $\sigma_p l \ge \sigma_{p-1} l \ge \sigma_{p-2} l \ge \dots \ge \sigma_1 l$, where $\sigma_i$ ($1 \le i \le p$) is the square root of the $i$-th eigenvalue of the covariance matrix $\Sigma$ of data set $D$. With $\mu$ denoting the mean of $D$, the corresponding hyperellipsoid is
$e = \{x \in \mathbb{R}^p : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le l^2\}$.
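The hyperellipsoid of step S3 can be sketched with NumPy as below. The sketch assumes that "just containing all nodes" means choosing $l$ as the largest Mahalanobis distance of any sample from $\mu$, which is the smallest $l$ for which every sample satisfies the inequality above; the function name `hyperellipsoid` is illustrative.

```python
import numpy as np


def hyperellipsoid(data: np.ndarray):
    """Return mean, covariance, scale l and axial lengths sigma_i * l.

    data has shape (n_samples, p). l is the largest Mahalanobis distance,
    so (x - mu)^T Sigma^{-1} (x - mu) <= l^2 holds for every sample.
    """
    mu = data.mean(axis=0)
    sigma = np.cov(data, rowvar=False)
    diff = data - mu
    inv = np.linalg.inv(sigma)
    # Squared Mahalanobis distance of each sample from the mean.
    d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
    l = float(np.sqrt(d2.max()))
    # eigvalsh returns eigenvalues in ascending order: sigma_1 <= ... <= sigma_p.
    sigmas = np.sqrt(np.linalg.eigvalsh(sigma))
    return mu, sigma, l, sigmas * l
```

The returned axial lengths are in ascending order, matching the ordering $\sigma_1 l \le \dots \le \sigma_p l$ in the text.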
The detailed process of step S4 is as follows: compute the proportionality coefficient $a_i$ corresponding to each axial length of the hyperellipsoid and use these coefficients as the coefficients of the linear dimensionality reduction.
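Since the explicit reduction formula is not given here, the following is only one plausible reading, labelled as an assumption: each $a_i$ is taken as its axial length divided by the sum of all axial lengths (the common factor $l$ cancels), the centred data are projected onto the principal axes of $\Sigma$, and the projections are combined with weights $a_i$ to give one value per sample. The function `reduce_to_1d` is hypothetical.

```python
import numpy as np


def reduce_to_1d(data: np.ndarray) -> np.ndarray:
    """Collapse each p-dimensional sample to one value (HYPOTHETICAL weighting)."""
    mu = data.mean(axis=0)
    sigma = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(sigma)      # ascending eigenvalues, column eigenvectors
    axes = np.sqrt(np.clip(eigvals, 0.0, None))   # sigma_i; the common factor l cancels below
    a = axes / axes.sum()                         # assumed coefficients a_i, summing to 1
    scores = (data - mu) @ eigvecs                # coordinates along each principal axis
    return scores @ a                             # weighted linear combination -> 1-D values
```

Eigenvector signs returned by `np.linalg.eigh` are arbitrary, so a practical version would fix a sign convention before comparing curves built from different time periods.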
The detailed process of step S5 is as follows: fit a curve to the reduced data in a two-dimensional plane. Ten groups of data are fitted to an eighth-order smooth nonlinear function curve, and its starting point is translated to the origin; the translated curve is the test curve $f(x)$.
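The fitting and translation of step S5 might look like the following sketch. It reads "eighth-order" as the polynomial degree and uses NumPy's `Polynomial.fit` for a least-squares fit; both the degree default and the choice of a polynomial are assumptions, since the text only specifies a smooth nonlinear function curve.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_test_curve(t: np.ndarray, y: np.ndarray, degree: int = 8) -> Polynomial:
    """Fit y(t) with a polynomial and translate its starting point to the origin."""
    t0 = t[0]
    # Least-squares fit on a time axis shifted so the first sample sits at t = 0.
    p = Polynomial.fit(t - t0, y, degree)
    # Shift vertically so the curve passes through the origin: f(0) == 0.
    return p - p(0.0)
```

`Polynomial.fit` rescales the input domain internally, which keeps a degree-8 fit on ten points numerically well conditioned. The same function applied to the detection data gives $g(x)$ for step S7.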
The detailed process of step S7 is as follows: apply the dimensionality reduction and curve fitting of steps S4 and S5 to the selected detection data to obtain the detection curve $g(x)$.
Step S8 determines outliers by judging the degree of similarity between the two curves. The detailed process is as follows:
Let $f(x)$ be the fitted test curve, $g(x)$ the fitted curve to be detected, and $c$ ($0 < c < 1$) a preset threshold. If, for every $x \in X$,
$|f(x) - g(x)| < c$,
or if an alternative similarity condition is satisfied, then no outlier exists at the node; otherwise an outlier is deemed to exist.
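The first criterion of step S8 can be checked numerically as below. The condition "for every $x \in X$" is approximated on a finite grid `xs`, which is an assumption of the sketch; the alternative criterion is not reproduced in the text and is therefore omitted.

```python
from typing import Callable

import numpy as np


def curves_similar(f: Callable, g: Callable, xs: np.ndarray, c: float) -> bool:
    """True if |f(x) - g(x)| < c at every sampled x (first criterion of step S8)."""
    if not 0 < c < 1:
        raise ValueError("threshold c must satisfy 0 < c < 1")
    return bool(np.all(np.abs(f(xs) - g(xs)) < c))
```

With `f = lambda x: x**2` and `g = lambda x: x**2 + 0.1` on `np.linspace(0, 1, 101)`, the curves count as similar for `c = 0.5` but not for `c = 0.05`; a `False` result corresponds to an outlier at the node.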
Further, the method of performing outlier detection and identification on the preprocessed data includes:
When a new application needs to be deployed in a data network, its application rules are formulated by the feature recognition module. The rules are then checked for outlier data by comparing the new application's rules with the rules of already-deployed applications. If there are no outlier data, the new application is deployed directly. If outlier data exist, the priority of each rule is obtained according to a priority judgment criterion and the rules producing outlier data are eliminated according to priority; the rules with the outlier data eliminated are then configured in the data network.
Specifically:
Step 1) When a new application request arrives, the rules generated by the application undergo data model conversion, i.e., each rule is divided into a spatial domain S and an action domain A. The rules are then forwarded to the outlier data detection module, which judges whether the new application's own rules belong to an application type that can produce outlier data; if so, execute step 2), otherwise execute step 3).
Step 2) Take one unchecked rule of the new application and one unchecked existing rule of the same application in the data network, and execute step 4); if all rules have been checked, execute step 3).
Step 3) Take one unchecked rule of the new application and one unchecked rule of the other deployed applications, and execute step 4); if all rules have been checked, execute step 8).
Step 4) Denote the spatial domains of the two rules as $S_n$ and $S_o$, their action domains as $A_n$ and $A_o$, and their priorities as $P_n$ and $P_o$. Separate the spatial domains to generate four new rules $R_1, R_2, R_3, R_4$ with spatial domains $S_1 = S_n - S_o$, $S_2 = S_o - S_n$, $S_3 = S_o \cap S_n$, $S_4 = S_n \cap S_o$ and action domains $A_1 = A_n$, $A_2 = A_o$, $A_3 = A_o$, $A_4 = A_n$.
Step 5) Check the separated result: if $S_3$ and $S_4$ are non-empty and the actions $A_3$ and $A_4$ differ, outlier data are judged to exist and step 7) is executed; otherwise no outlier data exist and step 6) is executed.
Step 6) If this step was reached from step 2), return to step 2); otherwise return to step 3).
Step 7) Eliminate the rules producing outlier data.
Step 8) Configure the rules free of outlier data in the data network to deploy the new application.
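Steps 4) and 5) amount to set operations on the rules' spatial domains. The sketch below models a spatial domain as a finite set of hypothetical `(host, port)` pairs and an action as a string; the priority rule in `resolve` (a larger number wins, ties go to the existing rule) is an assumption, because the priority judgment criterion is not specified in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    space: FrozenSet[Tuple[str, int]]  # spatial domain S (hypothetical (host, port) pairs)
    action: str                        # action domain A, e.g. "allow" or "deny"
    priority: int                      # priority P (assumed: larger number = higher)


def separate(rn: Rule, ro: Rule) -> List[Rule]:
    """Step 4): split a new rule rn and an existing rule ro into R1..R4."""
    return [
        Rule(rn.space - ro.space, rn.action, rn.priority),  # S1 = Sn - So, A1 = An
        Rule(ro.space - rn.space, ro.action, ro.priority),  # S2 = So - Sn, A2 = Ao
        Rule(ro.space & rn.space, ro.action, ro.priority),  # S3 = So & Sn, A3 = Ao
        Rule(rn.space & ro.space, rn.action, rn.priority),  # S4 = Sn & So, A4 = An
    ]


def conflicts(rn: Rule, ro: Rule) -> bool:
    """Step 5): the pair yields outlier data if S3, S4 are non-empty and A3 != A4."""
    _, _, r3, r4 = separate(rn, ro)
    return bool(r3.space) and bool(r4.space) and r3.action != r4.action


def resolve(rn: Rule, ro: Rule) -> List[Rule]:
    """Step 7) sketch: keep only the higher-priority action on the overlap."""
    r1, r2, r3, r4 = separate(rn, ro)
    overlap = r3 if ro.priority >= rn.priority else r4
    return [r for r in (r1, r2, overlap) if r.space]
```

For a new rule that denies ports 80 and 443 on one host with priority 2 and an existing rule that allows port 80 with priority 1, `conflicts` returns `True`, and `resolve` keeps two rules, both with action `"deny"`.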
After the outlier data are obtained, data cleaning is performed, followed by data matching. The data matching method includes:
Step 1: Enter a keyword in the relevant application while simultaneously performing a fuzzy data query in the back-end Oracle database to find matching information.
Step 2: If the query matches, i.e., corresponding data are found during the back-end search, a prompt is given and the related information is displayed for the user to select. If several identical records appear during the query, the record with the most complete basic data is taken.
Step 3: If the query does not match, i.e., no corresponding data are found during the back-end query, the user is prompted to re-enter and save the data.
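The matching logic of steps 1 to 3 can be sketched with Python's built-in `sqlite3` module standing in for the back-end Oracle database, so that the example is self-contained. The `records` table and its columns are hypothetical, a `LIKE '%keyword%'` pattern stands in for the fuzzy query, and "most complete" is taken to mean the fewest empty fields.

```python
import sqlite3
from typing import List, Optional, Tuple


def match_record(conn: sqlite3.Connection, keyword: str) -> Optional[Tuple]:
    """Fuzzy-match a keyword; among several hits keep the most complete record."""
    rows: List[Tuple] = conn.execute(
        "SELECT name, phone, address FROM records WHERE name LIKE ?",
        (f"%{keyword}%",),
    ).fetchall()
    if not rows:
        return None  # Step 3: no match, so the caller prompts the user to re-enter and save.
    # Step 2: prefer the row with the fewest missing (NULL or empty) fields.
    return max(rows, key=lambda r: sum(v not in (None, "") for v in r))
```

In the test below, two "Acme Ltd" rows match the keyword "Acme", and the one with phone and address filled in is returned.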
Further, the fuzzy query method includes the following steps:
Step I: Edit and save the initial letters of the outlier information; some outlier information corresponds to two or more sets of initial letters.
Step II: Establish the mapping between outlier information and initial letters.
Step III: Build the database table structure according to the search fields.
Step IV: When a user edits and saves information, obtain from the mapping the initial-letter superset corresponding to each field contained in the information, and record the mapping between fields and initial-letter supersets in the database.
Step V: The user enters the initial letters of the query field.
Step VI: Obtain and display the outlier information corresponding to the query field according to the mapping between fields and initial-letter supersets.
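The initial-letter lookup of steps I to VI can be illustrated with an in-memory index. The sample values are hypothetical; 重庆 (Chongqing) is given both "CQ" and "ZQ" because 重 has two readings, illustrating information that corresponds to more than one set of initials. A real system would persist the mapping in database tables as step IV describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(records: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each initial-letter string to the stored values it can stand for.

    records maps a stored value to ALL of its initial-letter spellings
    (a value containing a polyphonic character gets several).
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for value, initials in records.items():
        for code in initials:
            index[code.upper()].add(value)
    return index


def query(index: Dict[str, Set[str]], typed: str) -> List[str]:
    """Return every stored value whose initials start with what the user typed."""
    typed = typed.upper()
    hits: Set[str] = set()
    for code, values in index.items():
        if code.startswith(typed):
            hits |= values
    return sorted(hits)
```

With `idx = build_index({"重庆": ["CQ", "ZQ"], "成都": ["CD"]})`, both `query(idx, "zq")` and `query(idx, "cq")` return `["重庆"]`, while `query(idx, "c")` returns both cities.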
Further, the JDBC interface loads the data in the data source that need cleaning into the system, where data cleaning is executed.
Data preprocessing means standardizing the data record format: according to predefined rules, the corresponding fields in the data records are converted into a uniform format.
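A minimal sketch of rule-based record standardization follows. The field names and rules in `RULES` are hypothetical examples of "predefined rules"; the text does not list the actual rules.

```python
from datetime import datetime
from typing import Callable, Dict


def normalize_date(value: str) -> str:
    """Convert several common date spellings to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


# Hypothetical predefined rules: field name -> normalizing function.
RULES: Dict[str, Callable[[str], str]] = {
    "date": normalize_date,
    "name": lambda v: " ".join(v.split()).title(),              # collapse spaces, title case
    "phone": lambda v: "".join(ch for ch in v if ch.isdigit()),  # keep digits only
}


def preprocess(record: Dict[str, str]) -> Dict[str, str]:
    """Apply the rule for each field; fields without a rule pass through unchanged."""
    return {k: RULES.get(k, lambda x: x)(v) for k, v in record.items()}
```

For example, a record with date `"13.08.2019"`, name `"  john   smith "` and phone `"(555) 010-0100"` becomes `"2019-08-13"`, `"John Smith"` and `"5550100100"`.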
Fuzzy set theory is used to imitate the way outliers are detected manually, and operations on outliers are completed with the assistance of an algorithm library, a rule base, and a data cleaning log.
Further, the artificial-intelligence-based outlier cleaning method for data mining specifically includes:
Step 1: The data retrieval module retrieves data information from the database using a search program.
Step 2: The main control module classifies the retrieved data through the classification module using a classification program.
Step 3: The correlation analysis module performs correlation analysis on the retrieved data using an analysis program, and the feature recognition module identifies the characteristics of the retrieved data using a recognition program.
Step 4: The outlier judgment module judges the outliers in the data from the identified features using a judgment program, and the cleaning module cleans the data outliers using a cleaning program.
Step 5: The cloud storage module stores the cleaned data on a cloud server.
Step 6: The display module shows the retrieved data information and the data processing results on a display.
Further, the analysis method of the correlation analysis module includes:
(1) Using the analysis program, compute the Pearson correlation coefficient and the Spearman correlation coefficient between an independent variable and a dependent variable from their parameter values, the independent variable and the dependent variable having a correspondence.
(2) Determine the correlation parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients. The correlation parameter is greater than or equal to a first value and less than or equal to a second value. If the Pearson and Spearman correlation coefficients differ, the first value is the smaller and the second value the larger of the two; if they are equal, the first value and the second value both equal that common coefficient.
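Both coefficients and the resulting bounds can be computed as below. The sketch uses only NumPy; Spearman's coefficient is computed as Pearson's coefficient on average ranks, which is the standard definition when ties are present. Function names are illustrative.

```python
import numpy as np


def rankdata(x: np.ndarray) -> np.ndarray:
    """Ranks starting at 1, with tied values given their average rank."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(rankdata(np.asarray(x, dtype=float)), rankdata(np.asarray(y, dtype=float)))


def bounds(p: float, s: float):
    """First value = smaller coefficient, second value = larger (equal if p == s)."""
    return min(p, s), max(p, s)
```

For the monotonic but nonlinear pair `x = [1, 2, 3, 4, 5]`, `y = [1, 4, 9, 16, 100]`, Spearman's coefficient is 1 while Pearson's is smaller, so the two bounds differ.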
Further, determining the correlation parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients includes:
Multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value.
Adding the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value.
Dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value.
Taking the fifth value as the correlation parameter between the independent variable and the dependent variable.
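Written as a formula, the fifth value is $\rho_5 = 2\rho_P\rho_S / (\rho_P + \rho_S)$, the harmonic mean of the two coefficients. The small sketch below computes it (the function name is illustrative). The value lies between $\rho_P$ and $\rho_S$ when the two coefficients have the same sign and a non-zero sum; with mixed signs it can fall outside that interval.

```python
def fifth_value(p: float, s: float) -> float:
    """Third value (p * s) divided by fourth value (p + s), multiplied by 2."""
    third = p * s
    fourth = p + s
    if fourth == 0:
        raise ZeroDivisionError("p + s == 0: the fifth value is undefined")
    return third / fourth * 2
```

For instance, `fifth_value(0.6, 0.9)` is about 0.72, inside $[0.6, 0.9]$, whereas `fifth_value(0.5, -0.25)` is -1.0, outside $[-0.25, 0.5]$.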
Determining the correlation parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients also includes:
When the absolute value of the difference between the Pearson correlation coefficient and the Spearman correlation coefficient is greater than a first threshold, taking the second value as the correlation parameter;
When that absolute difference is less than or equal to the first threshold, taking the fifth value as the correlation parameter.
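The threshold rule can be combined into one function as sketched here. The value of the first threshold is not given in the text, so it is a caller-supplied parameter (`first_threshold`, a hypothetical name).

```python
def correlation_parameter(p: float, s: float, first_threshold: float) -> float:
    """Pick the correlation parameter from Pearson p and Spearman s."""
    second = max(p, s)                  # second value: the larger coefficient
    if abs(p - s) > first_threshold:
        return second
    if p == s:
        return p                        # first and second values coincide
    return 2 * p * s / (p + s)          # fifth value (ZeroDivisionError if p + s == 0)
```

When the coefficients disagree strongly the larger one is reported; otherwise their harmonic mean is used.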
Another object of the present invention is to provide an artificial-intelligence-based outlier detection system for data mining, which includes:
Data retrieval module, connected to the main control module, for retrieving data information from the database through a search program.
Main control module, connected to the data retrieval module, classification module, correlation analysis module, feature recognition module, outlier judgment module, cleaning module, cloud storage module, and display module, for controlling the normal operation of each module through a central processing unit.
Classification module, connected to the main control module, for classifying the retrieved data through a classification program.
Correlation analysis module, connected to the main control module, for performing correlation analysis on the retrieved data through an analysis program.
Feature recognition module, connected to the main control module, for identifying the characteristics of the retrieved data through a recognition program.
Outlier judgment module, connected to the main control module, for judging the outliers in the data from the identified features through a judgment program.
Cleaning module, connected to the main control module, for cleaning the data outliers through a cleaning program.
Cloud storage module, connected to the main control module, for storing the cleaned data on a cloud server.
Display module, connected to the main control module, for displaying the retrieved data information and the data processing results on a display.
The advantages and positive effects of the present invention are as follows:
The correlation analysis module computes the Pearson correlation coefficient and the Spearman correlation coefficient between two groups of data, the independent variable and the dependent variable, and from them determines a new correlation parameter that characterizes the correlation between the two variables and whose value lies between the Pearson and Spearman correlation coefficients. Because the correlation is characterized by this parameter instead of by choosing between the Pearson and Spearman coefficients, the correlation between data can be determined even when it is unknown what kind of association the analyzed data have. Meanwhile, the cleaning module addresses the difficulty of cleaning erroneous data in a data source, which lowers data quality and impairs data mining, by introducing the concept of outliers: exploiting the fact that data errors often manifest as outliers, it detects outliers, combines them with domain knowledge or stored metadata to find the corresponding erroneous data, and removes those data, thereby achieving data cleaning and improving the data quality of the source.
The present invention solves the problem that, when data correlation analysis is performed during outlier detection in existing data mining and the kind of association in the analyzed data is unknown, one cannot accurately choose between the Pearson and Spearman correlation coefficients. It also solves the problem of how, after outliers are detected, domain knowledge or stored metadata can be used to find the corresponding erroneous data.
In the present invention, the outlier-information detector nodes are clustered according to the data they collect at the same moment; a hyperellipsoid is trained for each resulting cluster and its axial lengths are computed; the outlier-information data are linearly reduced in dimension using the axial-length proportionality coefficients, and the reduced data are fitted to a data curve that serves as the test curve. The data of the same time period at a subsequent moment undergo the same dimensionality reduction and curve fitting, and the fitted curve serves as the detection curve. The trend and similarity of the test curve and detection curve are compared to determine whether the outlier-information data collected by the detector nodes contain abnormal data, and any abnormal data are taken as outlier data.
The present invention further performs data cleaning after the outlier data are obtained, followed by data matching. The data matching method includes: Step 1: enter a keyword in the relevant application while simultaneously performing a fuzzy query in the back-end Oracle database to find matching information; Step 2: if the query matches and corresponding data are found during the back-end search, a prompt is given and the related information is displayed for the user to select; if several identical records appear during the query, the record with the most complete basic data is taken; Step 3: if the query does not match and no corresponding data are found during the back-end query, the user is prompted to re-enter and save the data. In this way accurate data can be obtained and data security is ensured.
When performing outlier detection and identification on the preprocessed data, the present invention formulates application rules through the feature recognition module whenever a new application needs to be deployed in the data network. The rules are then checked for outlier data against the rules of deployed applications: if there are no outlier data, the new application is deployed directly; if there are, the priority of each rule is obtained according to the priority judgment criterion and the rules producing outlier data are eliminated according to priority. The rules with the outlier data eliminated are configured in the data network, so that accurate outlier data can be obtained.
Description of the drawings
Fig. 1 is a flow chart of the artificial-intelligence-based outlier detection method for data mining provided by an embodiment of the present invention.
Fig. 2 is a structural diagram of the artificial-intelligence-based outlier detection system for data mining provided by an embodiment of the present invention.
In the figures: 1, data retrieval module; 2, main control module; 3, classification module; 4, correlation analysis module; 5, feature recognition module; 6, outlier judgment module; 7, cleaning module; 8, cloud storage module; 9, display module.
Detailed description of the embodiments
To further explain the content, features, and effects of the present invention, the following embodiments are given and described in detail with reference to the accompanying drawings.
To solve the above problems, the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the artificial-intelligence-based outlier detection method for data mining provided by the invention includes the following steps:
S101: The data retrieval module retrieves data information from the database using a search program.
S102: The main control module classifies the retrieved data through the classification module using a classification program.
S103: The correlation analysis module performs correlation analysis on the retrieved data using an analysis program, and the feature recognition module identifies the characteristics of the retrieved data using a recognition program.
S104: The outlier judgment module judges the outliers in the data from the identified features using a judgment program, and the cleaning module cleans the data outliers using a cleaning program.
S105: The cloud storage module stores the cleaned data on a cloud server.
S106: The display module shows the retrieved data information and the data processing results on a display.
As shown in Fig. 2, the artificial-intelligence-based outlier detection system for data mining provided by an embodiment of the present invention includes:
data retrieval module 1, main control module 2, classification module 3, correlation analysis module 4, feature recognition module 5, outlier judgment module 6, cleaning module 7, cloud storage module 8, and display module 9.
Data retrieval module 1, connected to main control module 2, for retrieving data information from the database through a search program.
Main control module 2, connected to data retrieval module 1, classification module 3, correlation analysis module 4, feature recognition module 5, outlier judgment module 6, cleaning module 7, cloud storage module 8, and display module 9, for controlling the normal operation of each module through a central processing unit.
Classification module 3, connected to main control module 2, for classifying the retrieved data through a classification program.
Correlation analysis module 4, connected to main control module 2, for performing correlation analysis on the retrieved data through an analysis program.
Feature recognition module 5, connected to main control module 2, for identifying the characteristics of the retrieved data through a recognition program.
Outlier judgment module 6, connected to main control module 2, for judging the outliers in the data from the identified features through a judgment program.
Cleaning module 7, connected to main control module 2, for cleaning the data outliers through a cleaning program.
Cloud storage module 8, connected to main control module 2, for storing the cleaned data on a cloud server.
Display module 9, connected to main control module 2, for displaying the retrieved data information and the data processing results on a display.
The invention is further described below with reference to specific embodiments.
Embodiment 1
The analysis method of correlation analysis module 4 provided by the invention is as follows:
(1) by analysis program according to the parameter value of independent variable and the parameter value of dependent variable, calculate the independent variable with
Pearson correlation coefficients and Spearman's correlation coefficient between the dependent variable, the independent variable and the dependent variable have pair
It should be related to.
(2) according to the Pearson correlation coefficients and the Spearman's correlation coefficient, the independent variable and institute are determined
The relevant parameter between dependent variable is stated, the relevant parameter between the independent variable and the dependent variable is greater than or equal to the first number
Value, and it is less than or equal to second value, if the Pearson correlation coefficients and the Spearman's correlation coefficient are unequal, institute
Stating the first numerical value is the smaller value in the Pearson correlation coefficients and the Spearman's correlation coefficient, and the second value is
The larger value in the Pearson correlation coefficients and the Spearman's correlation coefficient, if the Pearson correlation coefficients and institute
It is equal to state Spearman's correlation coefficient, first numerical value and the second value are the Pearson correlation coefficients or described
Spearman's correlation coefficient.
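The following Python sketch illustrates step (1) and the bounds in step (2). It computes both coefficients from paired samples using the standard definitions. The Spearman coefficient is taken as the Pearson coefficient of the average ranks, so tied values are handled. The function names are illustrative assumptions and are not part of the described analysis program.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_bounds(x, y):
    """(first value, second value) = (smaller, larger) of the two coefficients."""
    p, s = pearson(x, y), spearman(x, y)
    return min(p, s), max(p, s)
```

For a monotonic but non-linear relation such as y = x², the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1. The two values therefore bracket the relevant parameter.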
As provided by the invention, determining the relevant parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients comprises:
Multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value.
Adding the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value.
Dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value.
Taking the fifth value as the relevant parameter between the independent variable and the dependent variable.
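Read literally, the three arithmetic steps above produce the harmonic mean of the two coefficients. Here $r_P$ and $r_S$ denote the Pearson and Spearman coefficients; these symbols are introduced only for convenience.

$$r = \frac{2\, r_P\, r_S}{r_P + r_S}$$

For example, take the illustrative values $r_P = 0.6$ and $r_S = 0.8$:

- third value: $0.6 \times 0.8 = 0.48$
- fourth value: $0.6 + 0.8 = 1.4$
- fifth value: $2 \times 0.48 / 1.4 \approx 0.686$

When $r_P$ and $r_S$ have the same sign, this value always lies between them, which is consistent with the bounds in step (2). It is undefined when $r_P + r_S = 0$.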
As further provided by the invention, determining the relevant parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients comprises:
When the absolute value of the difference between the Pearson and Spearman correlation coefficients is greater than a first threshold, the relevant parameter between the independent variable and the dependent variable is taken to be the second value.
When that absolute difference is less than or equal to the first threshold, the relevant parameter between the independent variable and the dependent variable is taken to be the fifth value.
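The threshold rule above can be sketched as follows, assuming the two coefficients have already been computed. The function name `relevant_parameter` and the `first_threshold` argument are hypothetical. The text does not give a value for the first threshold.

```python
def relevant_parameter(p, s, first_threshold):
    """Combine a Pearson coefficient p and a Spearman coefficient s.

    first_threshold is a tuning value; the source gives no number for it.
    """
    if abs(p - s) > first_threshold:
        # Coefficients disagree strongly: use the "second value" (the larger one).
        return max(p, s)
    third = p * s
    fourth = p + s
    if fourth == 0:
        raise ValueError("p + s == 0: the fifth value is undefined")
    return 2 * third / fourth  # "fifth value"
```

For example, `relevant_parameter(0.6, 0.8, 0.5)` gives about 0.686. `relevant_parameter(0.2, 0.9, 0.5)` falls back to 0.9 because the coefficients differ by more than the threshold.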
Embodiment 2
The cleaning method of cleaning module 7 provided by the invention is as follows:
1) The data to be cleaned is loaded through a JDBC interface.
2) The data is pre-processed.
3) Outlier (isolated point) detection, identification and processing are carried out on the data.
4) The result data is exported to a new data source through the JDBC interface.
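A minimal sketch of the four-step cleaning flow follows. It uses Python's built-in `sqlite3` module as a stand-in for the JDBC connections, since JDBC itself is a Java API. Several details are assumptions made for illustration:

- the table layout with an `id` column;
- the function name `clean_table`;
- the z-score test in step 3), which is not the fuzzy-set method described for that step.

```python
import sqlite3
import statistics


def clean_table(src, dst, table, column, z_max=3.0):
    """Hypothetical four-step cleaning pass; sqlite3 connections stand in for JDBC.

    Assumes `table` has an `id` column. `table` and `column` are interpolated
    into SQL, so they must come from trusted configuration, not user input.
    """
    # 1) Load the data to be cleaned from the source connection.
    rows = src.execute(f"SELECT id, {column} FROM {table}").fetchall()

    # 2) Pre-process: coerce every value to float, skipping unparseable ones.
    parsed = []
    for row_id, raw in rows:
        try:
            parsed.append((row_id, float(str(raw).strip())))
        except ValueError:
            continue

    # 3) Outlier detection: a plain z-score test (stand-in, not the fuzzy method).
    values = [v for _, v in parsed]
    kept = []
    if values:
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        kept = [(i, v) for i, v in parsed if sd == 0 or abs(v - mean) / sd <= z_max]

    # 4) Export the cleaned result to the new data source.
    dst.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, {column} REAL)")
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?)", kept)
    dst.commit()
    return kept
```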
In step 1), JDBC is the abbreviation of Java Database Connectivity. Through this interface, the data that needs cleaning is loaded from the data source into the system, where data cleaning is performed.
In step 2), data pre-processing means standardizing the format of data records: according to predefined rules, corresponding fields in the data records are converted into the same format.
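One way to express "predefined rules" for record standardization is a mapping from field name to a conversion function, as sketched below. The field names, accepted date formats and rules are hypothetical examples. They are not rules taken from the invention.

```python
from datetime import datetime

# Hypothetical set of accepted input date formats.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d"]


def normalize_date(value):
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD)."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {value!r}")


# Hypothetical predefined rules: field name -> conversion function.
RULES = {
    "name": lambda v: " ".join(v.split()).title(),  # collapse spaces, title-case
    "date": normalize_date,
    "amount": lambda v: round(float(v.replace(",", "")), 2),  # drop thousands separators
}


def standardize(record, rules=RULES):
    """Apply the rule for each field; fields without a rule pass through unchanged."""
    return {k: rules[k](v) if k in rules else v for k, v in record.items()}
```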
In step 3), fuzzy set theory is used to imitate the way a person detects abnormal values manually, and the operations on isolated points are completed with the assistance of an algorithm library, a rule base and a data cleaning log.
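The text does not specify how fuzzy set theory is applied. One possible reading is sketched below; the ramp membership function and the parameters `a`, `b` and `alpha` are assumptions:

- A fuzzy set "outlier" is defined whose membership grows from 0 to 1 as a value moves away from the median.
- Distance from the median is measured in median-absolute-deviation (MAD) units.
- Values whose membership reaches an alpha-cut are flagged.

```python
import statistics


def outlier_membership(values, a=2.0, b=4.0):
    """Degree (0..1) to which each value belongs to the fuzzy set 'outlier'.

    Distance is measured in MAD units from the median; membership rises
    linearly from 0 at distance a to 1 at distance b (a ramp function).
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate spread: anything off the median is fully an outlier.
        return [0.0 if v == med else 1.0 for v in values]
    degrees = []
    for v in values:
        d = abs(v - med) / mad
        degrees.append(min(1.0, max(0.0, (d - a) / (b - a))))
    return degrees


def fuzzy_outliers(values, alpha=0.5, a=2.0, b=4.0):
    """Indices whose outlier membership reaches the alpha-cut level."""
    mu = outlier_membership(values, a, b)
    return [i for i, m in enumerate(mu) if m >= alpha]
```

A graded membership mirrors a human reviewer who treats a value as "slightly suspicious" before calling it "clearly abnormal". The alpha-cut turns that grade into a yes/no decision.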
Embodiment 3
The artificial-intelligence-based method for cleaning isolated points in data mining provided by the embodiment of the invention comprises:
Clustering the isolated point information detection nodes according to their data at a given common moment; training a hyperellipsoid for each resulting cluster and computing each of its axis lengths; using the axis-length proportionality coefficients as coefficients to linearly reduce the dimensionality of the isolated point information data; and fitting the reduced data to a data curve, which serves as the test curve.
Applying the same dimensionality reduction and curve fitting to the data of the same time period at the next moment; the fitted curve serves as the detection curve.
Comparing the trend and similarity of the test curve and the detection curve to determine whether the isolated point information data collected by the detection node contains abnormal data. Any abnormal data found is taken as isolated point data.
The acquired isolated point data is loaded as the data to be cleaned through the JDBC interface.
The method for obtaining isolated point data specifically comprises:
S1: Select the test data.
S2: Cluster the nodes of the selected test data.
S3: For each cluster, train a hyperellipsoid that just contains all nodes in the cluster, and calculate the axis lengths of the hyperellipsoid.
S4: Reduce the dimensionality of the data according to the axis lengths of each hyperellipsoid.
S5: Fit curves to the data reduced according to the axis lengths of each hyperellipsoid.
S6: Select the detection data.
S7: Process the detection data.
S8: Compare the similarity of the test curve and the detection curve to determine whether the data contains abnormal data.
The detailed process of step S2 is as follows:
The nodes are clustered using the selected node data: from the data of all nodes at the same time point, the permitted radius of the data in each dimension is calculated.
Judge whether $r_i^d$ and $r_j^d$ are adjacent. If they are, nodes i and j belong to the same cluster in dimension d; nodes i and j are said to belong to the same cluster only when they belong to the same cluster in all k dimensions. Likewise, if the cluster intervals of two clusters $C_i$ and $C_j$ are adjacent in all k dimensions, $C_i$ and $C_j$ can be merged into one cluster whose cluster radius is $CR = [\min(\min_i, \min_j), \max(\max_i, \max_j)]$.
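The step S2 merging rule can be sketched under the following assumptions:

- Each node's interval in dimension d is its reading plus or minus a given permitted radius.
- Two intervals are "adjacent" when they overlap.
- Clusters are merged repeatedly until no pair is adjacent in all k dimensions.

The source does not give the formula for the permitted radius, so `radius` is taken as an input. The function names are illustrative.

```python
def overlaps(lo1, hi1, lo2, hi2):
    """Two closed intervals are treated as adjacent when they overlap or touch."""
    return lo1 <= hi2 and lo2 <= hi1


def interval_clusters(points, radius):
    """Cluster k-dimensional points by per-dimension interval adjacency.

    points: list of k-tuples (node readings at the same moment).
    radius: list of k permitted radii, one per dimension (assumed given).
    Returns a list of (member indices, [(min_d, max_d) for each dimension d]).
    """
    k = len(radius)
    clusters = [([i], [(p[d] - radius[d], p[d] + radius[d]) for d in range(k)])
                for i, p in enumerate(points)]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ia, ra = clusters[a]
                ib, rb = clusters[b]
                if all(overlaps(*ra[d], *rb[d]) for d in range(k)):
                    # CR = [MIN(min_i, min_j), MAX(max_i, max_j)] in every dimension
                    bounds = [(min(ra[d][0], rb[d][0]), max(ra[d][1], rb[d][1]))
                              for d in range(k)]
                    clusters[a] = (ia + ib, bounds)
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break
    return clusters
```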
The detailed process of step S3 is as follows:
The relationships between data attributes are described by the proportions between the axis lengths of the hyperellipsoid, whose axis lengths are $\sigma_p l \ge \sigma_{p-1} l \ge \sigma_{p-2} l \ge \dots \ge \sigma_1 l$. Here $\sigma_i$ ($1 \le i \le p$) is the square root of the $i$-th eigenvalue of the covariance matrix $\Sigma$ of data set $D$, and $\mu$ denotes the mean of $D$; the axis lengths of the corresponding hyperellipsoid are obtained from these quantities.
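The following sketch implements step S3 with NumPy. It assumes the hyperellipsoid is the covariance ellipsoid $(x-\mu)^\top \Sigma^{-1} (x-\mu) \le l^2$. It also assumes $l$ is chosen as the largest Mahalanobis distance in the cluster, so that the ellipsoid just contains all nodes. Under these assumptions the semi-axes are exactly $\sigma_i l$, matching the ordering given above. The source does not state how $l$ is chosen.

```python
import numpy as np


def hyperellipsoid_axes(data):
    """Semi-axis lengths sigma_i * l of the covariance ellipsoid that just encloses data.

    Returns (mu, axes, l) with axes in ascending order (sigma_1 l <= ... <= sigma_p l).
    """
    X = np.asarray(data, dtype=float)
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)  # covariance matrix Sigma of the cluster
    eigvals = np.linalg.eigvalsh(sigma)  # ascending eigenvalues of a symmetric matrix
    sig = np.sqrt(np.clip(eigvals, 0.0, None))  # sigma_i = sqrt(eigenvalue_i)
    diffs = X - mu
    # Squared Mahalanobis distance of every point from the mean.
    maha_sq = np.einsum("ij,jk,ik->i", diffs, np.linalg.pinv(sigma), diffs)
    l = float(np.sqrt(maha_sq.max()))  # smallest l that encloses every point
    return mu, sig * l, l
```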
The detailed process of step S4 is as follows: the proportionality coefficient $a_i$ corresponding to each axis length of the hyperellipsoid is calculated and used as a coefficient of the linear dimensionality reduction d.
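The exact dimensionality-reduction formula is not reproduced in the text. The minimal sketch below rests on these assumptions:

- The coefficients are $a_i = \text{axis}_i / \sum_j \text{axis}_j$.
- The reduction is the weighted sum $d = \sum_i a_i x_i$, which turns each p-dimensional record into one value per time point (suitable for plotting against time in the two-dimensional plane of step S5).
- Record component i is paired with axis i. This pairing is also not stated in the source.

```python
def axis_coefficients(axes):
    """Proportionality coefficients a_i = axis_i / (sum of all axis lengths)."""
    total = sum(axes)
    if total <= 0:
        raise ValueError("axis lengths must have a positive sum")
    return [ax / total for ax in axes]


def reduce_to_scalar(record, axes):
    """Linear reduction d = sum_i a_i * x_i of one p-dimensional record (assumed form)."""
    if len(record) != len(axes):
        raise ValueError("record and axes must have the same length")
    return sum(a * x for a, x in zip(axis_coefficients(axes), record))


def reduce_series(records, axes):
    """Reduce a time-ordered sequence of records to one value per time point."""
    return [reduce_to_scalar(r, axes) for r in records]
```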
The detailed process of step S5 is as follows: curve fitting is carried out on the reduced data in a two-dimensional plane. Ten groups of data are fitted into eight smooth nonlinear function curves, and the starting point of each curve is moved to the origin; the translated curve serves as the test curve f(x).
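The following sketch implements step S5 using NumPy's `Polynomial.fit`. Reading "eight" as the polynomial degree is an assumption, so the degree is a parameter. The fitted curve is translated so that its first sample sits at the origin, which is how the text describes obtaining f(x).

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_test_curve(t, y, degree=8):
    """Fit y(t) with a smooth polynomial and shift its start to the origin.

    Returns a callable f with f(0) == 0 corresponding to the first sample t[0].
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(t) <= degree:
        raise ValueError("need more samples than the polynomial degree")
    p = Polynomial.fit(t, y, degree)  # rescales the domain internally for stability
    t0, y0 = t[0], p(t[0])

    def f(x):
        # Translate: x is measured from t0, and the curve starts at height 0.
        return p(np.asarray(x, dtype=float) + t0) - y0

    return f
```

The same function, applied to the detection data, would yield g(x) in step S7.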
The detailed process of step S7 is as follows: the selected detection data is reduced and fitted using the methods of steps S4 and S5 to obtain the detection curve g(x).
Step S8 determines whether an abnormal value exists by judging the similarity of the two curves. The detailed process is as follows:
Let f(x) be the fitted test curve and g(x) the fitted curve to be detected, and let c (0 < c < 1) be a preset threshold. If, for every x ∈ X, curves f(x) and g(x) satisfy
$$|f(x) - g(x)| < c$$
or satisfy the alternative similarity condition, no abnormal value is considered present at the node; otherwise an abnormal value is considered present.
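The first similarity criterion of step S8 can be checked on a finite grid of sample points, as sketched below. Evaluating on a grid only approximates "for every x ∈ X". The alternative criterion is not implemented because its formula is not given. The function name is illustrative.

```python
import numpy as np


def curves_similar(f, g, xs, c):
    """True when |f(x) - g(x)| < c at every sample point x in xs (0 < c < 1).

    A True result means no abnormal value is flagged at the node.
    """
    if not 0 < c < 1:
        raise ValueError("threshold c must satisfy 0 < c < 1")
    xs = np.asarray(xs, dtype=float)
    return bool(np.all(np.abs(f(xs) - g(xs)) < c))
```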
After the isolated point data is obtained and cleaned, data matching is carried out. The data matching method comprises:
Step 1: A keyword is entered in the relevant application, and a fuzzy query is run at the same time against the back-end Oracle database to search for matching information.
Step 2: If the query matches, i.e. corresponding data information is found during the back-end search, the user is prompted and the relevant information is displayed for the user to select and use. If several identical records appear during the query, the one with the most complete basic data information is taken.
Step 3: If the query does not match, i.e. no corresponding data information is found during the back-end query, the user is prompted to re-enter and save the data.
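The following sketch covers the three matching steps. It assumes Python's `sqlite3` in place of the back-end Oracle database and a hypothetical `records(name, phone, address)` table. "Most complete" is interpreted here as the row with the most non-empty fields. `LIKE` wildcards typed by the user (`%`, `_`) are not escaped in this sketch.

```python
import sqlite3


def fuzzy_lookup(conn, keyword):
    """Fuzzy-match `keyword` against the hypothetical records(name, phone, address) table.

    Returns the most complete matching row, or None when nothing matches
    (the caller would then prompt the user to re-enter and save the data).
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT name, phone, address FROM records WHERE name LIKE ?", (pattern,)
    ).fetchall()
    if not rows:
        return None
    # Among duplicate matches, prefer the row with the most non-empty fields.
    return max(rows, key=lambda r: sum(v not in (None, "") for v in r))
```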
Embodiment 4
The method provided by the embodiment of the invention for detecting and identifying outliers in the pre-processed data comprises:
When a new application needs to be deployed in a data network, the feature recognition module formulates the application's rules. The rules are then checked for isolated point data by testing them against the rules of already-deployed applications. If there is no isolated point data, the new application is deployed directly. If there is isolated point data, it is eliminated from the rules: the priority of each rule is obtained according to a priority judgment criterion, and the isolated point data is eliminated according to priority. The rules with the isolated point data eliminated are then configured into the data network.
Specifically, the method comprises the following steps:
Step 1): When a new application request arrives, the rules generated by the application undergo data model conversion, i.e. each rule is divided into a spatial domain S and an action domain A. The rules are then forwarded to the isolated point data detection module, which judges whether the new application's own rules belong to an application type that can produce isolated point data. If so, step 2) is executed; otherwise step 3) is executed.
Step 2): Take one undetected rule from the new application's rules and one undetected rule from this application's rules already in the data network, and execute step 4). If all rules have been checked, execute step 3).
Step 3): Take one undetected rule from the new application's rules and one undetected rule from the deployed rules of other applications, and execute step 4). If all rules have been checked, execute step 8).
Step 4): Denote the spatial domains of the two rules by Sn and So, their action domains by An and Ao, and their priorities by Pn and Po. Then separate the spatial domains to generate four new rules R1, R2, R3 and R4, whose spatial domains are S1 = Sn - So, S2 = So - Sn, S3 = So ∩ Sn and S4 = Sn ∩ So, and whose action domains are A1 = An, A2 = Ao, A3 = Ao and A4 = An.
Step 5): Examine the result of the separation. If S3 and S4 are non-empty and the actions A3 and A4 differ, the rules are judged to contain isolated point data and step 7) is executed; otherwise no isolated point data is present and step 6) is executed.
Step 6): Determine whether this step was reached from step 2). If so, return to step 2); otherwise return to step 3).
Step 7): Eliminate the isolated point data from the rules.
Step 8): Configure the rules free of isolated point data into the data network to deploy the new application.
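The spatial-domain separation of steps 4) and 5) can be sketched with Python sets. A spatial domain is modelled as the finite set of header values a rule matches; real rule spaces are usually ranges or wildcard patterns. Note that S3 and S4 are the same set, so the check reduces to "the domains overlap and the actions differ". The `resolve` function shows one assumed priority policy for step 7), in which the lower-priority rule gives up the overlapping region. The text does not specify the elimination policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    space: frozenset  # spatial domain S: the set of header values the rule matches
    action: str       # action domain A, e.g. "forward" or "drop"
    priority: int     # P


def split(new, old):
    """Step 4): separate the spatial domains into R1..R4 as (space, action) pairs."""
    sn, so = new.space, old.space
    return [
        (sn - so, new.action),  # R1: S1 = Sn - So, A1 = An
        (so - sn, old.action),  # R2: S2 = So - Sn, A2 = Ao
        (so & sn, old.action),  # R3: S3 = So ∩ Sn, A3 = Ao
        (sn & so, new.action),  # R4: S4 = Sn ∩ So, A4 = An
    ]


def conflicts(new, old):
    """Step 5): flagged when S3 and S4 are non-empty and A3 differs from A4."""
    _, _, (s3, a3), (s4, a4) = split(new, old)
    return bool(s3) and bool(s4) and a3 != a4


def resolve(new, old):
    """Step 7) under an assumed policy: the lower-priority rule loses the overlap."""
    overlap = new.space & old.space
    if new.priority >= old.priority:
        return new, Rule(old.space - overlap, old.action, old.priority)
    return Rule(new.space - overlap, new.action, new.priority), old
```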
The above are only preferred embodiments of the present invention and do not limit it in any form. Any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the invention falls within the scope of the technical solution of the invention.
Claims (10)
1. An artificial-intelligence-based method for cleaning isolated points in data mining, characterized in that the method comprises:
clustering the isolated point information detection nodes according to their data at a given common moment; training a hyperellipsoid for each resulting cluster and computing each of its axis lengths; using the axis-length proportionality coefficients as coefficients to linearly reduce the dimensionality of the isolated point information data; and fitting the reduced data to a data curve, which serves as the test curve;
applying the same dimensionality reduction and curve fitting to the data of the same time period at the next moment, the fitted curve serving as the detection curve;
comparing the trend and similarity of the test curve and the detection curve to determine whether the isolated point information data collected by the detection node contains abnormal data, and taking any abnormal data found as isolated point data;
loading the acquired isolated point data as the data to be cleaned through a JDBC interface;
pre-processing the data to be cleaned;
carrying out outlier detection and identification on the pre-processed data;
exporting the processed result data to a new data source through the JDBC interface.
2. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 1, characterized in that the method for obtaining isolated point data specifically comprises:
S1: selecting test data;
S2: clustering the nodes of the selected test data;
S3: training, for each cluster, a hyperellipsoid that just contains all nodes in the cluster, and calculating the axis lengths of the hyperellipsoid;
S4: reducing the dimensionality of the data according to the axis lengths of each hyperellipsoid;
S5: fitting curves to the data reduced according to the axis lengths of each hyperellipsoid;
S6: selecting detection data;
S7: processing the detection data;
S8: comparing the similarity of the test curve and the detection curve to determine whether the data contains abnormal data.
3. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 2, characterized in that the detailed process of step S5 is: curve fitting is carried out on the reduced data in a two-dimensional plane; ten groups of data are fitted into eight smooth nonlinear function curves, the starting point of each curve is moved to the origin, and the translated curve serves as the test curve f(x);
the detailed process of step S7 is: the selected detection data is reduced and fitted using the methods of steps S4 and S5 to obtain the detection curve g(x);
step S8 determines whether an abnormal value exists by judging the similarity of the two curves, the detailed process being:
let f(x) be the fitted test curve and g(x) the fitted curve to be detected, and let c (0 < c < 1) be a preset threshold; if, for every x ∈ X, curves f(x) and g(x) satisfy
$$|f(x) - g(x)| < c$$
or satisfy the alternative similarity condition, no abnormal value is considered present at the node; otherwise an abnormal value is considered present.
4. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 1, characterized in that the method for detecting and identifying outliers in the pre-processed data comprises:
when a new application needs to be deployed in a data network, formulating the application's rules by the feature recognition module; checking the rules for isolated point data against the rules of already-deployed applications; deploying the new application directly if there is no isolated point data; if there is isolated point data, obtaining the priority of each rule according to a priority judgment criterion and eliminating the isolated point data from the rules according to priority; and configuring the rules with the isolated point data eliminated into the data network;
specifically comprising:
Step 1, when a new application request arrives, converting the rules generated by the application into the data model, i.e. dividing each rule into a spatial domain S and an action domain A; then forwarding the rules to the isolated point data detection module and judging whether the new application's own rules belong to an application type that can produce isolated point data; if so, executing step 2, otherwise executing step 3;
Step 2, taking one undetected rule from the new application's rules and one undetected rule from this application's existing rules in the data network, and executing step 4; if all rules have been checked, executing step 3;
Step 3, taking one undetected rule from the new application's rules and one undetected rule from the deployed rules of other applications, and executing step 4; if all rules have been checked, executing step 8;
Step 4, denoting the spatial domains of the two rules by Sn and So, their action domains by An and Ao, and their priorities by Pn and Po; then separating the spatial domains to generate four new rules R1, R2, R3 and R4, whose spatial domains are S1 = Sn - So, S2 = So - Sn, S3 = So ∩ Sn and S4 = Sn ∩ So, and whose action domains are A1 = An, A2 = Ao, A3 = Ao and A4 = An;
Step 5, examining the result of the separation: if S3 and S4 are non-empty and the actions A3 and A4 differ, judging that isolated point data exists and executing step 7; otherwise judging that no isolated point data exists and executing step 6;
Step 6, determining whether this step was reached from step 2; if so, returning to step 2, otherwise returning to step 3;
Step 7, eliminating the isolated point data from the rules;
Step 8, configuring the rules free of isolated point data into the data network to deploy the new application;
after the isolated point data is obtained and cleaned, data matching is carried out, the data matching method comprising:
Step 1: entering a keyword in the relevant application and simultaneously running a fuzzy query against the back-end Oracle database to search for matching information;
Step 2: if the query matches, i.e. corresponding data information is found during the back-end search, prompting the user and displaying the relevant information for the user to select and use; if several identical records appear during the query, taking the one with the most complete basic data information;
Step 3: if the query does not match, i.e. no corresponding data information is found during the back-end query, prompting the user to re-enter and save the data.
5. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 4, characterized in that the fuzzy query method comprises the following steps:
Step I, editing and saving the initial letters of the isolated point information, where some isolated point information corresponds to two or more sets of initial letters;
Step II, establishing the mapping between the isolated point information and the initial letters;
Step III, establishing the database table structure according to the search fields;
Step IV, when the user edits and saves information, obtaining from the mapping the initial-letter combinations corresponding to the fields contained in the information, and recording the mapping between the fields and the initial-letter combinations in the database;
Step V, the user entering the initial letters of the query field;
Step VI, obtaining the isolated point information corresponding to the query field from the mapping between fields and initial-letter combinations, and displaying it.
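Claim 5 describes querying by initial letters, where a single character can contribute more than one initial. The sketch below assumes these are pinyin initials, with polyphonic characters giving several alternatives; this reading is itself an assumption. It also assumes a small hypothetical character-to-initials table, whereas a real system would use a full pinyin dictionary. Matching is exact on the full initial string.

```python
from itertools import product

# Hypothetical initial-letter table; polyphonic characters map to several initials.
INITIALS = {"重": ["c", "z"], "庆": ["q"], "银": ["y"], "行": ["h", "x"]}


def initial_sets(text, table=INITIALS):
    """All initial-letter strings a field can be typed as (Steps I, II and IV)."""
    choices = [table.get(ch, [ch.lower()]) for ch in text]
    return {"".join(combo) for combo in product(*choices)}


def build_index(fields, table=INITIALS):
    """Record the mapping initial string -> fields (stands in for the database table)."""
    index = {}
    for field in fields:
        for key in initial_sets(field, table):
            index.setdefault(key, set()).add(field)
    return index


def query(index, typed):
    """Steps V and VI: look up stored fields by the initials the user types."""
    return sorted(index.get(typed.lower(), set()))
```

Because 重 can be read with either initial, the field 重庆 is found by both "cq" and "zq".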
6. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 1, characterized in that the JDBC interface loads the data that needs cleaning from the data source into the system, where data cleaning is performed;
data pre-processing means standardizing the format of data records, converting corresponding fields in the data records into the same format according to predefined rules;
fuzzy set theory is used to imitate the manual detection of abnormal values, and the operations on isolated points are completed with the assistance of an algorithm library, a rule base and a data cleaning log.
7. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 1, characterized in that the method specifically comprises:
Step 1, retrieving data information from a database by the data retrieval module using a search program;
Step 2, the main control module classifying the retrieved data by the categorization module using a classification program;
Step 3, performing correlation analysis on the retrieved data by the correlating module using an analysis program, and identifying the characteristics of the retrieved data by the feature recognition module using a recognition algorithm;
Step 4, determining the isolated points of the data from the identified features by the isolated point judgment module using a judgment program, and cleaning up the isolated points by the cleaning module using a cleaning program;
Step 5, storing the cleaned data on a cloud server by the cloud storage module;
Step 6, displaying the retrieved data information and the data processing results on a display by the display module.
8. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 7, characterized in that the analysis method of the correlating module comprises:
(1) calculating, by the analysis program, the Pearson correlation coefficient and the Spearman correlation coefficient between an independent variable and a dependent variable from the parameter values of the independent variable and of the dependent variable, the independent variable and the dependent variable having a corresponding relationship;
(2) determining the relevant parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients, the relevant parameter being greater than or equal to a first value and less than or equal to a second value; if the two coefficients differ, the first value is the smaller and the second value is the larger of the Pearson and Spearman correlation coefficients; if they are equal, both the first value and the second value equal that common coefficient.
9. The artificial-intelligence-based method for cleaning isolated points in data mining according to claim 8, characterized in that determining the relevant parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients comprises:
multiplying the Pearson correlation coefficient by the Spearman correlation coefficient to obtain a third value;
adding the Pearson correlation coefficient and the Spearman correlation coefficient to obtain a fourth value;
dividing the third value by the fourth value and multiplying the result by 2 to obtain a fifth value;
taking the fifth value as the relevant parameter between the independent variable and the dependent variable;
and in that determining the relevant parameter between the independent variable and the dependent variable from the Pearson and Spearman correlation coefficients comprises:
when the absolute value of the difference between the Pearson and Spearman correlation coefficients is greater than a first threshold, taking the second value as the relevant parameter between the independent variable and the dependent variable;
when that absolute difference is less than or equal to the first threshold, taking the fifth value as the relevant parameter between the independent variable and the dependent variable.
10. An artificial-intelligence-based data mining outlier detection system implementing the artificial-intelligence-based method for cleaning isolated points in data mining according to claim 1, characterized in that the system comprises:
a data retrieval module, connected to the main control module, for retrieving data information from a database by means of a search program;
a main control module, connected to the data retrieval module, categorization module, correlating module, feature recognition module, isolated point judgment module, cleaning module, cloud storage module and display module, for keeping each module working normally by means of a central processing unit;
a categorization module, connected to the main control module, for classifying the retrieved data by means of a classification program;
a correlating module, connected to the main control module, for performing correlation analysis on the retrieved data by means of an analysis program;
a feature recognition module, connected to the main control module, for identifying the characteristics of the retrieved data by means of a recognition algorithm;
an isolated point judgment module, connected to the main control module, for determining the isolated points of the data from the identified features by means of a judgment program;
a cleaning module, connected to the main control module, for cleaning up the isolated points in the data by means of a cleaning program;
a cloud storage module, connected to the main control module, for storing the cleaned data on a cloud server;
a display module, connected to the main control module, for displaying the retrieved data information and the data processing results on a display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910740294.4A CN110457370A (en) | 2019-08-12 | 2019-08-12 | Outlier Detection system and method for cleaning in data mining based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110457370A true CN110457370A (en) | 2019-11-15 |
Family
ID=68485907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910740294.4A Withdrawn CN110457370A (en) | 2019-08-12 | 2019-08-12 | Outlier Detection system and method for cleaning in data mining based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110457370A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002354253A (en) * | 2001-05-25 | 2002-12-06 | Minolta Co Ltd | Image processor and its program |
JP2015088155A (en) * | 2013-09-25 | 2015-05-07 | 日本電信電話株式会社 | Data analyzer, data analysis method, and program |
CN105045937A (en) * | 2015-09-17 | 2015-11-11 | 国网天津市电力公司 | Data redundancy energy efficiency detection method |
CN105307200A (en) * | 2015-09-30 | 2016-02-03 | 西安电子科技大学 | Method for detecting abnormal value of multidimensional data of wireless sensor network based on trajectory |
CN106656591A (en) * | 2016-12-15 | 2017-05-10 | 西安电子科技大学 | Method for detecting and eliminating rule conflicts among multiple applications in software-defined network |
CN109346168A (en) * | 2018-08-31 | 2019-02-15 | 东软集团股份有限公司 | A kind of method and device of determining data dependence |
CN109947747A (en) * | 2017-12-01 | 2019-06-28 | 广州明领基因科技有限公司 | Big data exceptional value method for cleaning based on Outlier Detection |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111932465A (en) * | 2020-06-22 | 2020-11-13 | 杭州思看科技有限公司 | Real-time isolated point removing method and device for three-dimensional scanner |
CN112799897A (en) * | 2021-02-23 | 2021-05-14 | 青岛海科虚拟现实研究院 | Information management method, management system and storage medium based on big data |
CN112799897B (en) * | 2021-02-23 | 2022-11-01 | 青岛海科虚拟现实研究院 | Information management method, management system and storage medium based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Amini et al. | On density-based data streams clustering algorithms: A survey | |
Chen et al. | A comparison of outlier detection algorithms for ITS data | |
Ester et al. | A density-based algorithm for discovering clusters in large spatial databases with noise | |
CN111221920B (en) | Case base construction method and device for power transformation equipment operation and maintenance device and computer storage medium | |
CN105389326B (en) | Image labeling method based on weak matching probability typical relevancy models | |
CN103281341A (en) | Network event processing method and device | |
Taghiyarrenani et al. | Transfer learning based intrusion detection | |
CN110457405A (en) | A kind of database audit method based on genetic connection | |
Degirmenci et al. | Robust incremental outlier detection approach based on a new metric in data streams | |
Kumar et al. | Dimensionality reduction based on shap analysis: a simple and trustworthy approach | |
CN110457370A (en) | Outlier Detection system and method for cleaning in data mining based on artificial intelligence | |
CN108920953A (en) | A kind of malware detection method and system | |
Saravanan et al. | Video image retrieval using data mining techniques | |
CN113434418A (en) | Knowledge-driven software defect detection and analysis method and system | |
Cateni | Improving the stability of wrapper variable selection applied to binary classification | |
CN110378119A (en) | A kind of malware detection method and system | |
Wu et al. | Extracting knowledge from web tables based on DOM tree similarity | |
CN110716957A (en) | Intelligent mining and analyzing method for class case suspicious objects | |
CN110502669A (en) | The unsupervised chart dendrography learning method of lightweight and device based on the side N DFS subgraph | |
Dongare et al. | A feature selection approach for enhancing the cardiotocography classification performance | |
CN109189908B (en) | Mass data extracts push working method | |
CN111582391B (en) | Three-dimensional point cloud outlier detection method and device based on modular design | |
Abdullah et al. | Efficient fuzzy techniques for medical data clustering | |
Anh et al. | A new cbir system using sift combined with neural network and graph-based segmentation | |
Elnekave et al. | Discovering regular groups of mobile objects using incremental clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20191115 |