CN111241674A - Data classification method and system based on mathematical modeling - Google Patents
Data classification method and system based on mathematical modeling
- Publication number
- CN111241674A
- Authority
- CN
- China
- Prior art keywords
- data
- classification
- interval
- coding
- attributes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a data classification method and system based on mathematical modeling. The method comprises the following steps: acquiring historical data and encoding each piece of historical data to obtain its coded value; sorting the coded values of the historical data by magnitude; dividing the sorted coded values into a plurality of consecutive intervals, each containing the same amount of data and each corresponding to one data classification, to obtain a mathematical model of the data classification; acquiring detection data and encoding it to obtain its coded value; and determining the interval in which the coded value of the detection data lies and taking the data classification corresponding to that interval as the data classification of the detection data. Data is encoded as follows: the attributes of the data are sorted in a set sequence, and each attribute is assigned a value according to its type, giving the coded value of the data. The technical scheme provided by the invention addresses the low efficiency of prior-art classification methods when searching data.
Description
Technical Field
The invention belongs to the technical field of data classification, and particularly relates to a data classification method and system based on mathematical modeling.
Background
With the progress of science and technology, industries are steadily becoming more information-based and data-driven, and data is increasingly important. Data usually contains a large amount of information, and analyzing it can help people find ways to solve problems.
Data classification merges data that share a common attribute or characteristic and distinguishes data by the attributes or characteristics of its category. To share data and improve processing efficiency, agreed classification principles and methods must be followed: all information in a system is divided into sets under a defined structure according to its content, nature and management requirements, so that every piece of information has a position in the classification system. In other words, information with the same content and the same nature, and information that requires uniform management, is gathered together; information that differs or must be managed separately is kept apart; and the relationships among the sets are then determined, forming an organized classification system.
Data is the basis of classification research and analysis, and data types can be divided into continuous variables and categorical variables. The data classification method commonly used at present merges data that share a common attribute or characteristic and distinguishes data by the attributes or characteristics of its category.
The purpose of data classification is to make data easier to search, count and analyze, so as to obtain the information needed to solve problems. However, classifying data simply by its attributes does not make searching convenient: when a piece of data has multiple attributes, for example, its attributes must be searched one by one to obtain the search result.
Disclosure of Invention
The invention aims to provide a data classification method and system based on mathematical modeling, so as to solve the low efficiency of prior-art data classification methods when searching data.
To achieve this purpose, the invention adopts the following technical scheme:
A data classification method based on mathematical modeling comprises the following steps:
(1) acquiring historical data, and encoding each piece of historical data to obtain its coded value;
sorting the coded values of the historical data by magnitude;
dividing the sorted coded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the coded value of the detection data;
(3) determining the interval in which the coded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the method for encoding the data comprises the following steps:
acquiring the attributes of each piece of data and the types of each attribute from the historical data;
sorting the attributes of the data in a set sequence, and assigning a value to each attribute according to its type;
and arranging the assigned values in the order of the attributes to obtain the coded value of the data.
Further, the set sequence is determined according to the degree of influence of each attribute on the data.
Further, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
Further, after the coded value of the data is obtained, the coded value is normalized.
Further, in step (3), a bisection method is used to determine the interval in which the detection data lies.
A data classification system based on mathematical modeling comprises a processor and a memory, the memory storing a computer program for execution on the processor; when the processor executes the computer program, the following control steps are implemented:
(1) acquiring historical data, and encoding each piece of historical data to obtain its coded value;
sorting the coded values of the historical data by magnitude;
dividing the sorted coded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the coded value of the detection data;
(3) determining the interval in which the coded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the method for encoding the data comprises the following steps:
acquiring the attributes of each piece of data and the types of each attribute from the historical data;
sorting the attributes of the data in a set sequence, and assigning a value to each attribute according to its type;
and arranging the assigned values in the order of the attributes to obtain the coded value of the data.
Further, the set sequence is determined according to the degree of influence of each attribute on the data.
Further, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
Further, after the coded value of the data is obtained, the coded value is normalized.
Further, in step (3), a bisection method is used to determine the interval in which the detection data lies.
The beneficial effects of the invention are as follows: in the technical scheme provided by the invention, a mathematical model for data classification is established from the attributes of the historical data, and the detection data is then classified in a uniform way by combining its coded value with the established model. Once data has been classified with this scheme, the corresponding data can be found quickly at query time, which addresses the low efficiency of prior-art classification methods when searching data.
Drawings
FIG. 1 is a flow chart of a data classification method based on mathematical modeling in an embodiment of the method of the present invention.
Detailed Description
Method embodiment:
This embodiment provides a data classification method based on mathematical modeling, which classifies data so as to solve the low efficiency of prior-art data classification methods when searching data.
The flow of the data classification method based on mathematical modeling provided by this embodiment is shown in FIG. 1 and includes the following steps:
(1) Acquiring historical data, and establishing a mathematical model of data classification from the historical data.
The mathematical model of data classification is established from the historical data as follows:
encoding the historical data to obtain the coded value of each piece of historical data;
sorting the coded values of the historical data by magnitude;
and dividing the sorted historical data into a plurality of consecutive intervals according to the coded values, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain the mathematical model of data classification.
For example, suppose the historical data contains ten thousand records and must be divided into ten data classifications, that is, into ten intervals. The coded values of the historical data are obtained first; the historical data is then sorted by coded value from small to large and divided into ten consecutive intervals according to the coded values, each containing one thousand records; finally, the maximum and minimum coded values of each interval are obtained.
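The ten-interval example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_intervals` and the synthetic coded values are hypothetical, and the sketch assumes the number of records divides evenly by the number of intervals, which is the only case the text describes.

```python
from typing import List, Tuple


def build_intervals(codes: List[int], k: int) -> List[Tuple[int, int]]:
    """Sort the coded values and split them into k consecutive intervals of
    equal size; return the (minimum, maximum) coded value of each interval."""
    if k <= 0 or len(codes) % k != 0:
        # Assumption: only the evenly divisible case is handled.
        raise ValueError("number of records must be a positive multiple of k")
    ordered = sorted(codes)
    size = len(ordered) // k
    return [(ordered[i * size], ordered[(i + 1) * size - 1]) for i in range(k)]


# Synthetic coded values 10000, 9999, ..., 1 stand in for real historical data.
codes = list(range(10000, 0, -1))
intervals = build_intervals(codes, 10)
print(intervals[0])  # (1, 1000)
print(intervals[9])  # (9001, 10000)
```

Each returned pair holds the minimum and maximum coded value of one interval, which is the information step (3) needs in order to place detection data.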
(2) Acquiring detection data, and encoding the detection data to obtain its coded value.
(3) Combining the established mathematical model with the coded value of the detection data to obtain the data classification of the detection data.
The data classification of the detection data is obtained as follows: the interval in which the coded value of the detection data lies is determined, and the classification corresponding to that interval is taken as the data classification of the detection data.
The coded value of the detection data lies in an interval when it is greater than the minimum coded value of the interval and less than the maximum coded value of the interval.
The data is encoded as follows:
first, the number of attributes of the data is determined;
each piece of data has multiple attributes, which are obtained by analyzing the historical data; this embodiment takes data with five attributes as an example, namely attribute 1, attribute 2, attribute 3, attribute 4 and attribute 5;
then the number of types of each attribute is determined from the historical data, and each type of each attribute is assigned a code;
the types of an attribute are obtained by analyzing and summarizing the historical data; for example, attribute 1 has three types in the historical data, namely type 1, type 2 and type 3, and the codes of attribute 1 are 00, 01, 02 and 03;
finally, the attributes are ordered, and the code of the data is obtained from the codes of the attribute types.
When the attributes are ordered, the importance degree of each attribute is obtained first; it is determined by staff according to the attribute's degree of influence on the data. The priority of each attribute follows from its importance degree: the higher the importance degree, the higher the priority. The attributes are then ordered from highest priority to lowest. Finally, the coded values of the attributes are arranged in that attribute order to obtain the code of the data.
For example, if the attributes of a piece of data are, in priority order, attribute 1, attribute 2, attribute 3, attribute 4 and attribute 5, and their coded values are 01, 02, 03, 00 and 03 respectively, then the coded value of the data is 0102030003.
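The concatenation in this example can be reproduced with a short Python sketch. Several details here are assumptions made for illustration only: the names `TYPE_CODES`, `UNKNOWN_CODE` and `encode` are hypothetical, a single code table is shared by all five attributes, and code 00 is treated as a fallback for a type not seen in the historical data (the text lists 00 among the codes but does not say what it denotes).

```python
from typing import Dict, List

# Hypothetical code table: each known type maps to a two-digit code.
TYPE_CODES: Dict[str, str] = {"type 1": "01", "type 2": "02", "type 3": "03"}
# Assumption: 00 is used when the type is not in the table.
UNKNOWN_CODE = "00"


def encode(record: Dict[str, str], priority: List[str]) -> str:
    """Concatenate the per-attribute codes in priority order."""
    return "".join(TYPE_CODES.get(record[attr], UNKNOWN_CODE) for attr in priority)


# Attributes listed from highest priority to lowest.
priority = [f"attribute {i}" for i in range(1, 6)]
record = {
    "attribute 1": "type 1",
    "attribute 2": "type 2",
    "attribute 3": "type 3",
    "attribute 4": "unseen type",
    "attribute 5": "type 3",
}
code = encode(record, priority)
print(code)  # 0102030003
```

Keeping the code as a fixed-width string preserves the leading zero; because every code has the same width, comparing the strings gives the same order as comparing `int(code)`.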
Further, in step (3), the interval in which the detection data lies is found by a bisection method.
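A bisection lookup over the interval bounds might look like the sketch below, which uses Python's standard `bisect` module. The function name `find_interval` and the sample bounds are hypothetical. Two behaviors are assumptions not fixed by the text: a coded value equal to an interval's minimum or maximum is treated as inside that interval, and a value that falls outside every interval yields `None`.

```python
from bisect import bisect_left
from typing import List, Optional


def find_interval(lows: List[int], highs: List[int], code: int) -> Optional[int]:
    """Return the index of the interval containing code, or None.

    lows[i] and highs[i] are the minimum and maximum coded values of
    interval i; both lists are sorted in ascending order.
    """
    # Binary search for the first interval whose maximum is >= code.
    i = bisect_left(highs, code)
    if i < len(highs) and lows[i] <= code:
        return i
    return None


lows = [1, 1001, 2001]
highs = [1000, 2000, 3000]
print(find_interval(lows, highs, 1500))  # 1
print(find_interval(lows, highs, 3001))  # None
```

With k intervals, the lookup takes on the order of log k comparisons rather than a scan over all intervals.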
Furthermore, to reduce the workload of finding the interval in which the detection data lies, the coded value of the data is normalized after it is obtained, as follows:
let the coded value of the data be n, and let the maximum and minimum coded values in the historical data be Nmax and Nmin respectively; the normalized coded value is then:
m=(n-Nmin)/(Nmax-Nmin)
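The normalization formula translates directly into code. The sketch below is illustrative: the function name `normalize` and the sample values are hypothetical, and the guard against Nmax equal to Nmin is an added assumption, since the text does not cover that case.

```python
def normalize(n: float, n_min: float, n_max: float) -> float:
    """Min-max normalization: m = (n - Nmin) / (Nmax - Nmin)."""
    if n_max == n_min:
        # Assumption: reject the degenerate case instead of dividing by zero.
        raise ValueError("Nmax and Nmin must differ")
    return (n - n_min) / (n_max - n_min)


# Hypothetical minimum and maximum coded values from the historical data.
n_min, n_max = 2030003, 202030003
m = normalize(102030003, n_min, n_max)
print(m)  # 0.5
```

Historical coded values map into the range 0 to 1. A detection value outside the historical minimum and maximum would map outside that range, a case the text does not address.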
Further, the data density of each interval is obtained from the difference between the maximum and minimum values of the interval, and the importance degree of the corresponding data classification is obtained from that data density: the greater the data density of an interval, the greater the importance degree of the corresponding data classification. After the data classification to which the detection data belongs is determined, the importance degree of that data classification is taken as the importance degree of the detection data, thereby giving the importance degree of each piece of detection data.
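One way to turn this into a calculation is sketched below. The text says only that density is derived from the difference between an interval's maximum and minimum; defining density as the record count divided by that difference, and expressing importance as a rank, are assumptions made for illustration. The function names and sample intervals are hypothetical.

```python
from typing import List, Tuple


def interval_densities(intervals: List[Tuple[float, float]], count: int) -> List[float]:
    """Records per unit of coded-value range, assuming every interval holds
    the same number of records and has maximum > minimum."""
    return [count / (hi - lo) for lo, hi in intervals]


def importance_ranks(densities: List[float]) -> List[int]:
    """Rank intervals so that rank 1 is the densest, i.e. most important."""
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    ranks = [0] * len(densities)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Three hypothetical intervals, each holding 50 records.
intervals = [(0, 100), (100, 150), (150, 400)]
densities = interval_densities(intervals, count=50)
ranks = importance_ranks(densities)
print(ranks)  # [2, 1, 3]
```

Because every interval holds the same number of records, the narrowest interval is the densest, so here the second classification ranks as most important. A detection record assigned to an interval inherits that interval's rank.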
System embodiment:
This embodiment provides a data classification system based on mathematical modeling, which comprises a processor and a memory, wherein the memory stores a computer program to be executed by the processor; when the processor executes the computer program, the data classification method based on mathematical modeling provided by the method embodiment is implemented.
The embodiments of the present invention disclosed above are intended merely to help clarify the technical solutions of the present invention, and it is not intended to describe all the details of the invention nor to limit the invention to the specific embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (10)
1. A data classification method based on mathematical modeling, characterized by comprising the following steps:
(1) acquiring historical data, and encoding each piece of historical data to obtain its coded value;
sorting the coded values of the historical data by magnitude;
dividing the sorted coded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the coded value of the detection data;
(3) determining the interval in which the coded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the method for encoding the data comprises the following steps:
acquiring the attributes of each piece of data and the types of each attribute from the historical data;
sorting the attributes of the data in a set sequence, and assigning a value to each attribute according to its type;
and arranging the assigned values in the order of the attributes to obtain the coded value of the data.
2. The data classification method based on mathematical modeling according to claim 1, wherein the set sequence is determined according to the degree of influence of each attribute on the data.
3. The data classification method based on mathematical modeling according to claim 1, characterized in that after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
4. The data classification method based on mathematical modeling according to claim 1, characterized in that after the coded value of the data is obtained, the coded value is normalized.
5. The data classification method based on mathematical modeling according to claim 1, wherein in step (3) a bisection method is used to determine the interval in which the detection data lies.
6. A data classification system based on mathematical modeling, comprising a processor and a memory, the memory storing a computer program for execution on the processor; when the processor executes the computer program, the following control steps are implemented:
(1) acquiring historical data, and encoding each piece of historical data to obtain its coded value;
sorting the coded values of the historical data by magnitude;
dividing the sorted coded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the coded value of the detection data;
(3) determining the interval in which the coded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the method for encoding the data comprises the following steps:
acquiring the attributes of each piece of data and the types of each attribute from the historical data;
sorting the attributes of the data in a set sequence, and assigning a value to each attribute according to its type;
and arranging the assigned values in the order of the attributes to obtain the coded value of the data.
7. The data classification system based on mathematical modeling according to claim 6, wherein the set sequence is determined according to the degree of influence of each attribute on the data.
8. The data classification system based on mathematical modeling according to claim 6, characterized in that after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
9. The data classification system based on mathematical modeling according to claim 6, characterized in that after the coded value of the data is obtained, the coded value is normalized.
10. The data classification system based on mathematical modeling according to claim 6, characterized in that in step (3) a bisection method is used to determine the interval in which the detection data lies.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010015900.9A CN111241674A (en) | 2020-01-08 | 2020-01-08 | Data classification method and system based on mathematical modeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111241674A (en) | 2020-06-05 |
Family
ID=70865932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010015900.9A Pending CN111241674A (en) | 2020-01-08 | 2020-01-08 | Data classification method and system based on mathematical modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111241674A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150278371A1 (en) * | 2014-04-01 | 2015-10-01 | Tableau Software, Inc. | Systems and Methods for Ranking Data Visualizations |
CN108391129A (en) * | 2018-04-25 | 2018-08-10 | 西安万像电子科技有限公司 | Data-encoding scheme and device |
WO2019047790A1 (en) * | 2017-09-08 | 2019-03-14 | 第四范式(北京)技术有限公司 | Method and system for generating combined features of machine learning samples |
US10410140B1 (en) * | 2016-08-17 | 2019-09-10 | Amazon Technologies, Inc. | Categorical to numeric conversion of features for machine learning models |
US20190325339A1 (en) * | 2018-04-22 | 2019-10-24 | Trendalyze Inc. | Method for converting nominal to ordinal or continuous variables using time-series distances |
CN110471948A (en) * | 2019-07-10 | 2019-11-19 | 北京交通大学 | A kind of customs declaration commodity classifying intelligently method excavated based on historical data |
2020-01-08: CN application CN202010015900.9A (publication CN111241674A), status Pending
Non-Patent Citations (1)
Title |
---|
徐枫 (Xu Feng): "浅析数据挖掘分类方法中的决策树算法" (A brief analysis of the decision tree algorithm in data mining classification methods) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||