CN111241674A - Data classification method and system based on mathematical modeling - Google Patents

Publication number
CN111241674A
CN111241674A CN202010015900.9A CN202010015900A CN111241674A CN 111241674 A CN111241674 A CN 111241674A CN 202010015900 A CN202010015900 A CN 202010015900A CN 111241674 A CN111241674 A CN 111241674A
Authority
CN
China
Prior art keywords
data
classification
interval
coding
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010015900.9A
Other languages
Chinese (zh)
Inventor
王战伟
王彩霞
武大勇
姜凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Aeronautics
Original Assignee
Zhengzhou University of Aeronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Aeronautics
Priority to CN202010015900.9A
Publication of CN111241674A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data classification method and system based on mathematical modeling. The method comprises the following steps: acquiring historical data and encoding each item of historical data to obtain its encoded value; sorting the encoded values of the historical data by magnitude; dividing the sorted encoded values into a plurality of consecutive intervals, each containing the same amount of data and each corresponding to one data classification, to obtain a mathematical model of the data classification; acquiring detection data and encoding it to obtain its encoded value; and determining the interval in which the encoded value of the detection data lies and taking the data classification corresponding to that interval as the data classification of the detection data. The data is encoded as follows: the attributes of the data are sorted in a set order, each attribute is assigned a value according to its type, and the assigned values are arranged in that order to give the encoded value of the data. The technical scheme provided by the invention addresses the low efficiency of prior-art classification methods when searching data.

Description

Data classification method and system based on mathematical modeling
Technical Field
The invention belongs to the technical field of data classification, and particularly relates to a data classification method and system based on mathematical modeling.
Background
With the progress of science and technology, industries of all kinds are becoming increasingly information-based and data-driven, and data is becoming more and more important. Data usually contains a large amount of information, and analyzing it can help people find ways to solve problems.
Data classification merges data that share a common attribute or characteristic and distinguishes data by the attributes or characteristics of their categories. To enable data sharing and improve processing efficiency, all information in a system must be divided into sets under an agreed classification principle and method, according to the content, nature and management requirements of the information, so that every item of information has a defined position in the classification system. In other words, information with the same content and nature, or requiring unified management, is gathered together; information that differs, or requires separate management, is kept apart; and the relationships among the resulting sets are then determined, forming an organized classification system.
Data is the basis of data classification research and analysis, and data can be divided into continuous variables and categorical variables. The data classification method in common use today merges data having a certain common attribute or characteristic and distinguishes data by the attributes or characteristics of their categories.
The purpose of data classification is to make data easier to search, count and analyze, so as to obtain the information needed to solve problems. However, classifying data simply by its attributes does not make searching convenient: when a data item has multiple attributes, a search has to check those attributes one by one to obtain the result.
Disclosure of Invention
The invention aims to provide a data classification method and system based on mathematical modeling, so as to solve the problem that prior-art data classification methods are inefficient when searching data.
To achieve this purpose, the invention adopts the following technical scheme:
A data classification method based on mathematical modeling comprises the following steps:
(1) acquiring historical data, and encoding each item of historical data to obtain its encoded value;
sorting the encoded values of the historical data by magnitude;
dividing the sorted encoded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the encoded value of the detection data;
(3) determining the interval in which the encoded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the data is encoded as follows:
obtaining, from the historical data, the attributes of each data item and the types of each attribute;
sorting the attributes of the data in a set order, and assigning a value to each attribute according to its type;
arranging the assigned values in the order of the attributes to obtain the encoded value of the data.
Further, the set order is determined according to the degree of influence of each attribute on the data.
Further, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
Further, after the encoded value of the data is obtained, the encoded value is normalized.
Further, in step (3), the bisection method (binary search) is used to determine the interval in which the detection data lies.
A data classification system based on mathematical modeling comprises a processor and a memory, the memory storing a computer program for execution on the processor; when the processor executes the computer program, the following steps are carried out:
(1) acquiring historical data, and encoding each item of historical data to obtain its encoded value;
sorting the encoded values of the historical data by magnitude;
dividing the sorted encoded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the encoded value of the detection data;
(3) determining the interval in which the encoded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the data is encoded as follows:
obtaining, from the historical data, the attributes of each data item and the types of each attribute;
sorting the attributes of the data in a set order, and assigning a value to each attribute according to its type;
arranging the assigned values in the order of the attributes to obtain the encoded value of the data.
Further, the set order is determined according to the degree of influence of each attribute on the data.
Further, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
Further, after the encoded value of the data is obtained, the encoded value is normalized.
Further, in step (3), the bisection method (binary search) is used to determine the interval in which the detection data lies.
The invention has the following beneficial effects: a mathematical model for data classification is established from the attributes of the historical data, and the detection data is then classified in a uniform way by combining its encoded value with the established model. After data is classified by the scheme provided by the invention, the corresponding data can be found quickly at query time, which addresses the low efficiency of prior-art classification methods when searching data.
Drawings
FIG. 1 is a flow chart of a data classification method based on mathematical modeling in an embodiment of the method of the present invention.
Detailed Description
Method embodiment:
This embodiment provides a data classification method based on mathematical modeling, which classifies data so as to solve the problem that prior-art data classification methods are inefficient when searching data.
The flow of the data classification method based on mathematical modeling provided by this embodiment is shown in FIG. 1 and includes the following steps:
(1) Acquire historical data, and establish a mathematical model for data classification from the historical data.
The mathematical model is established from the historical data as follows:
encoding the historical data to obtain the encoded value of each item of historical data;
sorting the encoded values of the historical data by magnitude;
dividing the sorted historical data into a plurality of consecutive intervals according to the encoded values, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain the mathematical model for data classification.
For example, suppose the historical data contains ten thousand records and is to be divided into ten data classifications, that is, ten intervals. The encoded values of the historical data are obtained first; the records are then sorted by encoded value in ascending order and divided into ten consecutive intervals of one thousand records each; finally, the maximum and minimum encoded values of each interval are recorded.
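The worked example above can be sketched in a few lines of Python. This is an illustrative sketch rather than the patented implementation: the function name `build_intervals`, the synthetic `history` values, and the requirement that the record count divide evenly by the number of classes are all assumptions made for the example.

```python
from typing import List, Tuple


def build_intervals(codes: List[int], n_classes: int) -> List[Tuple[int, int]]:
    """Sort the encoded values and cut them into n_classes equal-sized runs.

    Returns one (min_code, max_code) pair per interval, in ascending order.
    """
    if n_classes <= 0 or not codes or len(codes) % n_classes != 0:
        # Assumption: the record count divides evenly, as in the example
        # of ten thousand records and ten classes.
        raise ValueError("record count must be a positive multiple of n_classes")
    ordered = sorted(codes)
    size = len(ordered) // n_classes
    return [(ordered[i], ordered[i + size - 1]) for i in range(0, len(ordered), size)]


# Hypothetical history: 10,000 distinct encoded values, given in descending order.
history = list(range(20_000, 0, -2))  # 20000, 19998, ..., 2
intervals = build_intervals(history, 10)

print(len(intervals))   # 10
print(intervals[0])     # (2, 2000)
print(intervals[-1])    # (18002, 20000)
```

Each tuple holds the minimum and maximum encoded value of one interval, which is all the later lookup step needs to keep.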
(2) Acquire detection data, and encode it to obtain the encoded value of the detection data.
(3) Combine the established mathematical model with the encoded value of the detection data to obtain the data classification of the detection data.
The classification of the detection data is obtained as follows: determine the interval in which the encoded value of the detection data lies, and take the classification corresponding to that interval as the data classification of the detection data.
The encoded value of the detection data lies in an interval when it is greater than the minimum encoded value of that interval and less than the maximum encoded value of that interval.
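A minimal Python sketch of this membership test follows. The function name `find_interval` and the sample bounds are hypothetical. Two choices in it are assumptions that go beyond the text: the endpoints are treated as inclusive, so that a value equal to a stored minimum or maximum still receives a class, and `None` is returned for a value that lies in a gap between intervals or outside the historical range, a case the description does not address.

```python
from typing import List, Optional, Tuple


def find_interval(code: int, intervals: List[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the interval containing code, or None if there is none."""
    for index, (low, high) in enumerate(intervals):
        # Assumption: inclusive endpoints (the description states strict
        # inequalities, which would leave boundary values unclassified).
        if low <= code <= high:
            return index
    return None


# Hypothetical (min, max) bounds for three classes.
bounds = [(2, 2000), (2002, 4000), (4002, 6000)]

print(find_interval(2500, bounds))  # 1
print(find_interval(2000, bounds))  # 0
print(find_interval(2001, bounds))  # None (falls in the gap between classes 0 and 1)
print(find_interval(9000, bounds))  # None (outside the historical range)
```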
The data is encoded as follows:
first, the number of attributes of the data is determined;
each data item has multiple attributes, which are obtained by analyzing the historical data; this embodiment takes five attributes as an example, namely attribute 1, attribute 2, attribute 3, attribute 4 and attribute 5;
then, the number of types of each attribute is determined from the historical data, and each type of each attribute is assigned a code;
the types of an attribute are obtained by analyzing and summarizing the historical data; for example, attribute 1 has three types in the historical data, namely type 1, type 2 and type 3, and the codes of attribute 1 are 00, 01, 02 and 03;
finally, the attributes are put in order, and the code of the data is obtained from the codes of the attribute types.
To order the attributes, the importance degree of each attribute is obtained first; it is determined by a worker according to the attribute's degree of influence on the data. The priority of each attribute follows from its importance degree: the higher the importance degree, the higher the priority. The attributes are then arranged in descending order of priority, and finally the encoded values of the attributes are arranged in that order to obtain the code of the data.
For example, if the attributes of a data item are, in descending order of priority, attribute 1, attribute 2, attribute 3, attribute 4 and attribute 5, and their encoded values are 01, 02, 03, 00 and 03 respectively, then the encoded value of the data item is 0102030003.
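The encoding example can be reproduced with a short Python sketch. The function `encode`, the dictionary layout, and the type label `"unspecified"` mapped to code 00 are hypothetical; the description lists 00 as a code but does not say what it denotes, so that mapping is an assumption made only to reproduce the example.

```python
from typing import Dict, List


def encode(record: Dict[str, str],
           priority: List[str],
           type_codes: Dict[str, Dict[str, str]]) -> str:
    """Join the two-digit code of each attribute, highest priority first."""
    return "".join(type_codes[attr][record[attr]] for attr in priority)


# Attributes in descending order of priority, as in the example.
priority = ["attribute 1", "attribute 2", "attribute 3", "attribute 4", "attribute 5"]

# Hypothetical code table, shared by all five attributes for brevity.
# Assumption: 00 stands for an unspecified type.
shared_codes = {"unspecified": "00", "type 1": "01", "type 2": "02", "type 3": "03"}
type_codes = {attr: shared_codes for attr in priority}

record = {
    "attribute 1": "type 1",
    "attribute 2": "type 2",
    "attribute 3": "type 3",
    "attribute 4": "unspecified",
    "attribute 5": "type 3",
}

code = encode(record, priority, type_codes)
print(code)       # 0102030003
print(int(code))  # 102030003
```

Keeping the code as a fixed-width string preserves the leading zero. Because every code has the same width, converting to an integer for sorting gives the same order as comparing the strings, and the highest-priority attribute dominates the comparison.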
Further, in step (3), the interval in which the detection data lies is found by the bisection method (binary search).
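One way to realize the bisection lookup in Python is with the standard-library `bisect` module, as sketched below. The function name `classify` and the sample bounds are hypothetical, and the sketch assumes the intervals are sorted in ascending order, do not overlap, and have inclusive endpoints.

```python
from bisect import bisect_right
from typing import List, Optional


def classify(code: float,
             lower_bounds: List[float],
             upper_bounds: List[float]) -> Optional[int]:
    """Binary-search the interval minimums, then check the matching maximum."""
    # Index of the last interval whose minimum is <= code (-1 if there is none).
    index = bisect_right(lower_bounds, code) - 1
    if index >= 0 and code <= upper_bounds[index]:
        return index
    return None  # below the first interval, in a gap, or above the last interval


# Hypothetical bounds for three classes, in ascending order.
lower_bounds = [2, 2002, 4002]
upper_bounds = [2000, 4000, 6000]

print(classify(2500, lower_bounds, upper_bounds))  # 1
print(classify(6000, lower_bounds, upper_bounds))  # 2
print(classify(1, lower_bounds, upper_bounds))     # None
print(classify(2001, lower_bounds, upper_bounds))  # None
```

With k intervals the lookup takes on the order of log k comparisons instead of checking every interval in turn.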
Furthermore, to reduce the workload of finding the interval in which the detection data lies, the encoded value of the data is normalized after it is obtained. The processing is as follows:
let the encoded value of a data item be n, and let the maximum and minimum encoded values in the historical data be Nmax and Nmin respectively; the normalized encoded value m is then:
m = (n - Nmin) / (Nmax - Nmin)
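The normalization formula translates directly into Python. The function name `normalize` and the sample values are hypothetical; the guard against Nmax equal to Nmin is an added assumption, since the formula is undefined in that case.

```python
def normalize(n: float, n_min: float, n_max: float) -> float:
    """Min-max normalization: m = (n - Nmin) / (Nmax - Nmin)."""
    if n_max == n_min:
        # Assumption: reject a degenerate history in which all codes are equal.
        raise ValueError("Nmax and Nmin must differ")
    return (n - n_min) / (n_max - n_min)


# Hypothetical historical range of encoded values.
n_min, n_max = 100, 300

print(normalize(100, n_min, n_max))  # 0.0
print(normalize(150, n_min, n_max))  # 0.25
print(normalize(300, n_min, n_max))  # 1.0
print(normalize(350, n_min, n_max))  # 1.25
```

Historical codes map into the range 0 to 1. Because Nmin and Nmax come from the historical data only, a detection value outside the historical range maps outside 0 to 1, as the last call shows.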
Further, the data density of each interval is obtained from the difference between the maximum and minimum values of the interval, and the importance degree of the corresponding data classification is obtained from that density: the greater the data density of an interval, the greater the importance degree of the corresponding data classification. Because every interval contains the same amount of data, an interval with a smaller difference between its maximum and minimum has a greater data density. After the data classification to which the detection data belongs is determined, the importance degree of that data classification is taken as the importance degree of the detection data, so that the importance degree of each item of detection data is obtained.
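The density-based importance can be sketched as follows. The description does not give a formula, so two assumptions are made here: density is computed as the record count divided by the interval width, and importance is expressed as a rank in which 1 marks the densest interval. The function names and sample bounds are hypothetical.

```python
from typing import List, Tuple


def interval_densities(intervals: List[Tuple[float, float]],
                       records_per_interval: int) -> List[float]:
    """Density of each interval, taken here as record count divided by width."""
    densities = []
    for low, high in intervals:
        width = high - low
        if width <= 0:
            raise ValueError("each interval must have a positive width")
        densities.append(records_per_interval / width)
    return densities


def importance_ranks(densities: List[float]) -> List[int]:
    """Rank the intervals by density; rank 1 is the densest (most important)."""
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    ranks = [0] * len(densities)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical bounds for three classes holding 1,000 records each.
intervals = [(0, 500), (500, 600), (600, 1000)]
densities = interval_densities(intervals, 1000)
ranks = importance_ranks(densities)

print(densities)  # [2.0, 10.0, 2.5]
print(ranks)      # [3, 1, 2]
```

A detection record classified into the second interval would then inherit rank 1, the importance of that class.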
System embodiment:
This embodiment provides a data classification system based on mathematical modeling, comprising a processor and a memory, wherein the memory stores a computer program to be executed by the processor; when the processor executes the computer program, the data classification method based on mathematical modeling provided by the method embodiment is implemented.
The embodiments of the present invention disclosed above are intended merely to help clarify the technical solutions of the present invention, and it is not intended to describe all the details of the invention nor to limit the invention to the specific embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A data classification method based on mathematical modeling, characterized by comprising the following steps:
(1) acquiring historical data, and encoding each item of historical data to obtain its encoded value;
sorting the encoded values of the historical data by magnitude;
dividing the sorted encoded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the encoded value of the detection data;
(3) determining the interval in which the encoded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the data is encoded as follows:
obtaining, from the historical data, the attributes of each data item and the types of each attribute;
sorting the attributes of the data in a set order, and assigning a value to each attribute according to its type;
arranging the assigned values in the order of the attributes to obtain the encoded value of the data.
2. The data classification method based on mathematical modeling according to claim 1, characterized in that the set order is determined according to the degree of influence of each attribute on the data.
3. The data classification method based on mathematical modeling according to claim 1, characterized in that, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
4. The data classification method based on mathematical modeling according to claim 1, characterized in that, after the encoded value of the data is obtained, the encoded value is normalized.
5. The data classification method based on mathematical modeling according to claim 1, characterized in that, in step (3), the bisection method (binary search) is used to determine the interval in which the detection data lies.
6. A data classification system based on mathematical modeling, comprising a processor and a memory, the memory storing a computer program for execution on the processor; when the processor executes the computer program, the following steps are carried out:
(1) acquiring historical data, and encoding each item of historical data to obtain its encoded value;
sorting the encoded values of the historical data by magnitude;
dividing the sorted encoded values into a plurality of consecutive intervals, wherein each interval contains the same amount of data and corresponds to one data classification, to obtain a mathematical model of the data classification;
(2) acquiring detection data and encoding it to obtain the encoded value of the detection data;
(3) determining the interval in which the encoded value of the detection data lies, and taking the data classification corresponding to that interval as the data classification of the detection data;
the data is encoded as follows:
obtaining, from the historical data, the attributes of each data item and the types of each attribute;
sorting the attributes of the data in a set order, and assigning a value to each attribute according to its type;
arranging the assigned values in the order of the attributes to obtain the encoded value of the data.
7. The data classification system based on mathematical modeling according to claim 6, characterized in that the set order is determined according to the degree of influence of each attribute on the data.
8. The data classification system based on mathematical modeling according to claim 6, characterized in that, after the data is divided into a plurality of intervals, the importance degree of each corresponding data classification is obtained from the value range of its interval; and after the data classification of the detection data is obtained, the importance degree of the detection data is obtained from the importance degree of that data classification.
9. The data classification system based on mathematical modeling according to claim 6, characterized in that, after the encoded value of the data is obtained, the encoded value is normalized.
10. The data classification system based on mathematical modeling according to claim 6, characterized in that, in step (3), the bisection method (binary search) is used to determine the interval in which the detection data lies.
CN202010015900.9A 2020-01-08 2020-01-08 Data classification method and system based on mathematical modeling Pending CN111241674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015900.9A CN111241674A (en) 2020-01-08 2020-01-08 Data classification method and system based on mathematical modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010015900.9A CN111241674A (en) 2020-01-08 2020-01-08 Data classification method and system based on mathematical modeling

Publications (1)

Publication Number Publication Date
CN111241674A true CN111241674A (en) 2020-06-05

Family

ID=70865932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015900.9A Pending CN111241674A (en) 2020-01-08 2020-01-08 Data classification method and system based on mathematical modeling

Country Status (1)

Country Link
CN (1) CN111241674A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278371A1 (en) * 2014-04-01 2015-10-01 Tableau Software, Inc. Systems and Methods for Ranking Data Visualizations
CN108391129A (en) * 2018-04-25 2018-08-10 西安万像电子科技有限公司 Data-encoding scheme and device
WO2019047790A1 (en) * 2017-09-08 2019-03-14 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
US10410140B1 (en) * 2016-08-17 2019-09-10 Amazon Technologies, Inc. Categorical to numeric conversion of features for machine learning models
US20190325339A1 (en) * 2018-04-22 2019-10-24 Trendalyze Inc. Method for converting nominal to ordinal or continuous variables using time-series distances
CN110471948A (en) * 2019-07-10 2019-11-19 北京交通大学 A kind of customs declaration commodity classifying intelligently method excavated based on historical data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278371A1 (en) * 2014-04-01 2015-10-01 Tableau Software, Inc. Systems and Methods for Ranking Data Visualizations
US10410140B1 (en) * 2016-08-17 2019-09-10 Amazon Technologies, Inc. Categorical to numeric conversion of features for machine learning models
WO2019047790A1 (en) * 2017-09-08 2019-03-14 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
US20190325339A1 (en) * 2018-04-22 2019-10-24 Trendalyze Inc. Method for converting nominal to ordinal or continuous variables using time-series distances
CN108391129A (en) * 2018-04-25 2018-08-10 西安万像电子科技有限公司 Data-encoding scheme and device
CN110471948A (en) * 2019-07-10 2019-11-19 北京交通大学 A kind of customs declaration commodity classifying intelligently method excavated based on historical data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐枫;: "浅析数据挖掘分类方法中的决策树算法" *

Similar Documents

Publication Publication Date Title
US6871201B2 (en) Method for building space-splitting decision tree
US20120158623A1 (en) Visualizing machine learning accuracy
Ayadi et al. BicFinder: a biclustering algorithm for microarray data analysis
TWI464604B (en) Data clustering method and device, data processing apparatus and image processing apparatus
CN107832456B (en) Parallel KNN text classification method based on critical value data division
US11403550B2 (en) Classifier
WO2021147559A1 (en) Service data quality measurement method, apparatus, computer device, and storage medium
CN107391365B (en) Mixed feature selection method oriented to software defect prediction
CN112149737A (en) Selection model training method, model selection method, selection model training device and selection model selection device, and electronic equipment
CN111242387A (en) Talent departure prediction method and device, electronic equipment and storage medium
CN113297249A (en) Slow query statement identification and analysis method and device and query statement statistical method and device
CN111241674A (en) Data classification method and system based on mathematical modeling
CN117349151A (en) Test case priority ordering method and device based on clustering and storage medium
CN115292303A (en) Data processing method and device
CN115470279A (en) Data source conversion method, device, equipment and medium based on enterprise data
CN109981630B (en) Intrusion detection method and system based on chi-square inspection and LDOF algorithm
CN114418489A (en) Intelligent monitoring and early warning method, device, equipment and medium for logistics sorting robot
CN114024912A (en) Network traffic application identification analysis method and system based on improved CHAMELEON algorithm
CN113065597A (en) Clustering method, device, equipment and storage medium
CN111861706A (en) Data discretization regulation and control method and system and risk control model establishing method and system
CN109086309A (en) A kind of index dimensional relationships define method, server and storage medium
CN111160391A (en) Space division-based rapid relative density noise detection method and storage medium
CN109145059A (en) For the data processing method of data statistics, server and storage medium
CN115687539A (en) Knowledge base data information clustering method and system based on MapReduce model
CN115936736B (en) Renewable resource recycling traceability evidence-preserving system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination