CN113570024B - Data discretization method, device, electronic equipment, storage medium and program product - Google Patents
Data discretization method, device, electronic equipment, storage medium and program product
- Publication number
- CN113570024B (CN202110735325.4A)
- Authority
- CN
- China
- Prior art keywords
- discretization
- features
- continuous
- continuous features
- particle swarm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data discretization method, a device, an electronic device, a storage medium and a program product, wherein the method comprises the following steps: taking a label corresponding to the continuous features to be discretized, at least one preset discretization class value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features. According to the data discretization method, the data discretization device, the electronic equipment, the storage medium and the program product, the optimal discretization scheme of the continuous features is obtained through the particle swarm optimization algorithm, the discretization features obtained based on the optimal discretization scheme can achieve effective training of the model, and the generalization capability of the model is improved.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data discretization method and apparatus, an electronic device, a storage medium, and a program product.
Background
In machine learning, features usually need some processing before they are used to train a model. One such processing method is continuous feature discretization, in which continuous features are converted into categorical features. This can improve the generalization capability of the model and makes it more robust to abnormal feature values.
The existing continuous feature discretization methods mainly comprise equal-frequency discretization and equal-width (equidistant) discretization. Equal-frequency discretization divides the continuous features into N classes such that each class contains the same amount of data; equal-width discretization divides the continuous features into N classes such that the distance (range) between the maximum value and the minimum value of each class is the same. In addition, some methods discretize using machine learning algorithms such as KMeans clustering.
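As a rough illustration of the two conventional schemes described above, the following Python sketch computes the cut points for equal-width and equal-frequency binning. The function names and the sample ages are assumptions made for this example; they do not come from the patent.

```python
def equal_width_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points so that each bin holds about the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    # Each cut is the last value of its bin, so bins are right-closed.
    return [ordered[(n * i) // n_bins - 1] for i in range(1, n_bins)]


ages = [13, 25, 28, 55, 67, 31]  # hypothetical sample
print(equal_width_cuts(ages, 3))      # [31.0, 49.0]
print(equal_frequency_cuts(ages, 3))  # [25, 31]
```

In both cases the number of bins has to be chosen by hand, and neither scheme looks at the labels.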
When continuous features are discretized by equal-frequency discretization, equal-width discretization, a KMeans clustering algorithm or the like, the discretization category value must be set manually in advance. Trying out discretization strategies usually takes a large amount of manpower and time, and the optimal discretization scheme can only be found at the end by manual comparison.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a data discretization method, a data discretization device, electronic equipment, a storage medium and a program product.
The invention provides a data discretization method, which comprises the following steps: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
According to the data discretization method provided by the invention, the discretization processing of the continuous features based on the particle swarm optimization algorithm to obtain the N segmentation points corresponding to the continuous features comprises the following steps: and obtaining an optimal discretization category value based on the preset fitness function by utilizing the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain N segmentation points corresponding to the continuous features.
According to the data discretization method provided by the invention, the optimal discretization category value is one of the preset at least one discretization category value.
According to the data discretization method provided by the invention, the label corresponding to the continuous feature is consistent with the label output when the target neural network model is trained based on the discretization feature; and/or the labels corresponding to the continuous features are used for the particle swarm optimization algorithm to divide the continuous features with the same label into the same interval when searching the division points.
According to the data discretization method provided by the invention, the preset fitness function is consistent with the optimization target of the loss function of the target neural network model.
According to the data discretization method provided by the invention, discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features, comprises the following steps: discretizing the continuous features into the (N +1) intervals, endowing the continuous features in each interval with preset discretization values of corresponding intervals, and obtaining the discretization features corresponding to the continuous features according to the preset discretization values.
The present invention also provides a data discretization apparatus comprising: a parameter setting module for: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm;
a discretization processing module for: discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the data discretization method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data discretization method according to any of the above.
The present invention also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data discretization method according to any of the above.
According to the data discretization method, the data discretization device, the electronic equipment, the storage medium and the program product, the optimal segmentation point for realizing discretization of the continuous features is obtained through at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features, the discretization features are further obtained based on the optimal segmentation point, the discretization features are used for training the target neural network model, and the generalization capability of the model can be effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a data discretization method provided by the present invention;
FIG. 2 is a second schematic flow chart of the data discretization method provided by the present invention;
FIG. 3 is a schematic structural diagram of a data discretization apparatus provided by the present invention;
FIG. 4 is a schematic diagram of an electronic device provided by the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The particle swarm optimization (PSO) algorithm originated from exploratory studies of the foraging behavior of bird flocks. The principle is as follows: while a flock is foraging, when one bird finds food nearby it starts flying toward the food; the other birds in the flock see this behavior, start learning from that bird and move toward the food as well, until the whole flock has flown to the food location and found the food. This is an information sharing mechanism in a natural setting. During cognition and search, each individual in the group remembers its own flight experience; at the same time, it learns from the better individuals in the group. When it finds that some other individual is flying better, it learns from that individual and adjusts its own flight appropriately, so that it flies in a more accurate direction. By studying and simulating this behavior, the algorithm continuously updates the individual optimum of each particle and the global optimum of the group, and finally obtains the desired optimal result.
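The mechanism described above (each individual remembers its own best position and also learns from the best individual in the group) corresponds to the usual velocity and position update of a global-best particle swarm. The following is a minimal sketch of that textbook form, not the specific variant used by the invention. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, and the sphere test function are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, lo, hi, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box [lo, hi]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's own best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # own experience
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # swarm experience
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective (sum of squares) with its minimum at the origin.
best, best_val = pso_minimize(lambda x: sum(v * v for v in x), dim=2, lo=-5.0, hi=5.0)
print(best, best_val)
```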
Continuous features: features can be divided into continuous features and discrete features according to whether their values are continuous. A feature that can take any value within a certain interval is called a continuous feature; its values are continuous, the gap between two adjacent values can be subdivided indefinitely, and it can therefore take an infinite number of values.
Discrete features: discrete features are features whose values can be listed in a certain order and are usually integers, such as user gender, nationality or item type. Some features that are continuous in nature but only take integer values can also be treated as discrete features.
Continuous feature discretization: discretization is a common data processing method for converting continuous numerical attributes into discrete numerical attributes.
Fig. 1 is a schematic flow chart of a data discretization method provided by the present invention, and as shown in fig. 1, the method includes:
step S110, taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm;
step S120, discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretized features corresponding to the continuous features.
It should be noted that the preset fitness function gives the particle swarm optimization algorithm a direction for selecting the optimal scheme, and that one or more discretization category values may be input into the particle swarm optimization algorithm. The label of a continuous feature indicates a characteristic of that feature. For example, consider a set of data relating the age of consumers to whether they are willing to shop online: A (25 years old), willing to shop online; B (28 years old), willing to shop online; C (55 years old), willing to shop online; D (13 years old), unwilling to shop online; E (67 years old), unwilling to shop online. To learn the relationship between age group and willingness to shop online, "age" in the above data is taken as the continuous feature and "whether willing to shop online" as the label corresponding to the continuous feature. After the continuous feature is discretized by the particle swarm optimization algorithm, data relating age group to willingness to shop online is obtained, that is, the discretized features and their corresponding labels: under 18 years old, unwilling to shop online; 18-60 years old, willing to shop online; over 60 years old, unwilling to shop online.
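The age example can be traced in a few lines of Python. The sketch below takes the cut points 18 and 60 as given (it does not search for them) and groups the five samples by interval. How values exactly on a boundary are handled is an assumption of this sketch, since the example does not specify it.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# The five samples from the example: (age, willing to shop online).
samples = [(25, True), (28, True), (55, True), (13, False), (67, False)]

# Cut points matching "under 18 / 18-60 / over 60".
# Assumption: a value equal to a cut point goes to the interval on its right.
cuts = [18, 60]

by_interval = defaultdict(Counter)
for age, willing in samples:
    by_interval[bisect_right(cuts, age)][willing] += 1

# Majority label of each interval, in interval order.
summary = {idx: counts.most_common(1)[0][0]
           for idx, counts in sorted(by_interval.items())}
print(summary)  # {0: False, 1: True, 2: False}
```

Every interval contains a single label, which is the property the labels are meant to steer the search toward.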
According to the data discretization method provided by the invention, the optimal segmentation point for realizing discretization of the continuous features is obtained through at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features, the discretization features are further obtained based on the optimal segmentation point, and the discretization features are utilized to train the target neural network model, so that the generalization capability of the model can be effectively improved.
According to the data discretization method provided by the invention, discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features comprises the following steps:
and obtaining an optimal discretization category value based on the preset fitness function by utilizing the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain N segmentation points corresponding to the continuous features.
When a single discretization category value is input into the particle swarm optimization algorithm, for example (N + 1), the algorithm searches for N segmentation points under that category value based on the preset fitness function, and obtains (N + 1) intervals from the N segmentation points. When several discretization category values are input, for example (N_1 + 1), (N_2 + 1), ..., (N_i + 1), where (N_i + 1) denotes the i-th discretization category value, the algorithm first searches for the optimal (N_i + 1) based on the preset fitness function, then searches for the N_i specific segmentation points under the category value (N_i + 1) based on the preset fitness function and (N_i + 1), and obtains (N_i + 1) intervals from the N_i segmentation points.
According to the data discretization method provided by the invention, when the discretization category value input into the particle swarm optimization algorithm is one, N segmentation points are directly obtained on the basis of the preset fitness function search, so that the direct optimization search process of the segmentation points on the basis of the preset fitness function is realized; when a plurality of discretization category values are input into the particle swarm optimization algorithm, the optimal discretization category value is searched based on the preset fitness function, and then N segmentation points are searched based on the obtained optimal discretization category value and the preset fitness function, so that the step-by-step optimization searching process of the segmentation points is realized.
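The step-by-step search over several candidate category values can be sketched as follows. To keep the example deterministic, an exhaustive search over midpoints stands in for the particle swarm search; that substitution, the helper names and the choice of weighted Gini impurity as the fitness are assumptions of this sketch, not the patent's implementation.

```python
from bisect import bisect_left
from itertools import combinations


def weighted_gini(values, labels, cuts):
    """Sample-weighted Gini impurity of the bins induced by `cuts` (lower is better)."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(bisect_left(cuts, v), []).append(y)
    score = 0.0
    for ys in bins.values():
        impurity = 1.0 - sum((ys.count(c) / len(ys)) ** 2 for c in set(ys))
        score += len(ys) / len(values) * impurity
    return score


def exhaustive_search(values, labels, n_cuts, fitness):
    """Stand-in for the PSO search: try every set of midpoints between sorted values."""
    xs = sorted(set(values))
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = min(combinations(mids, n_cuts),
               key=lambda c: fitness(values, labels, list(c)))
    return list(best), fitness(values, labels, list(best))


def select_discretization(values, labels, category_values, search, fitness):
    """Pick the category value with the best fitness and keep its cut points."""
    best = None
    for k in category_values:          # k = N + 1 intervals, so N = k - 1 cut points
        cuts, score = search(values, labels, k - 1, fitness)
        if best is None or score < best[2]:
            best = (k, cuts, score)
    return best


ages = [25, 28, 55, 13, 67]
willing = [1, 1, 1, 0, 0]
best_k, best_cuts, best_score = select_discretization(
    ages, willing, [2, 3, 4], exhaustive_search, weighted_gini)
print(best_k, best_cuts, best_score)  # 3 [19.0, 61.0] 0.0
```

With candidate category values 2, 3 and 4, three intervals already separate the two labels completely, so the smallest category value that reaches the best fitness is kept.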
According to the data discretization method provided by the invention, the optimal discretization category value is one of the preset at least one discretization category value.
It should be noted that, when there is one discretization category value input into the particle swarm optimization algorithm, N division points are searched based on the discretization category value, and (N +1) intervals are obtained based on the obtained N division points; when the discretization category value input to the particle swarm optimization algorithm is plural, N division points are searched based on one of the discretization category values (i.e., the optimal discretization category value) and (N +1) intervals are obtained based on the obtained N division points.
According to the data discretization method provided by the invention, the uniquely determined (N +1) intervals are obtained based on the discretization class values input into the particle swarm optimization algorithm, so that the corresponding relation between the input class values and the output (N +1) intervals is ensured, and the stability of the output result is ensured.
According to the data discretization method provided by the invention, the label corresponding to the continuous feature is consistent with the label output when the target neural network model is trained based on the discretization feature;
and/or the labels corresponding to the continuous features are used for the particle swarm optimization algorithm to divide the continuous features with the same label into the same interval when searching the division points.
It should be noted that the labels corresponding to the continuous features are consistent with the labels output by the target neural network model. The target neural network model can be set to different neural networks for different application scenarios. In the example above, discretizing the continuous feature "age" with the particle swarm optimization algorithm yields data relating age group to willingness to shop online, that is, the discretized features and their corresponding labels: under 18 years old, unwilling to shop online; 18-60 years old, willing to shop online; over 60 years old, unwilling to shop online.
When the target neural network model is trained using the discretized features and their corresponding labels and the input age is "45 years old", which falls in the discretized interval "18-60 years old" whose label is "willing to shop online", the output label for training the target neural network model is set to "willing to shop online".
The labels corresponding to the continuous features can also be used by the particle swarm optimization algorithm, when searching for the segmentation points, to divide continuous features with the same label into the same interval.
The data discretization method provided by the invention sets the label corresponding to the continuous characteristic to be consistent with the label output when the target neural network model is trained based on the discretization characteristic, and/or takes the label carried by the continuous characteristic as the basis for searching the segmentation point, and the target neural network model is trained based on the discretization characteristic, so that the generalization capability of the model can be effectively improved.
According to the data discretization method provided by the invention, the preset fitness function is consistent with the optimization target of the loss function of the target neural network model.
In the present invention, the target neural network model refers to a neural network model trained using the obtained discretized features. During training, the labels corresponding to the continuous features are set to be consistent with the labels output when the target neural network model is trained based on the discretized features.
In a neural network model, the loss function measures the quality of the model's predictions. Put simply, the loss function expresses the degree of difference between the predictions and the actual data; the smaller the loss, the better the robustness of the model. The loss function also determines the optimization direction during training of the neural network model. The fitness function of the particle swarm, also called the objective function, is the optimization target of the particle swarm optimization algorithm and is used to evaluate the quality of a given candidate solution (particle). When the preset fitness function is consistent with the optimization target of the loss function of the target neural network model, the discretized features obtained by the particle swarm optimization algorithm are better suited to training the target neural network model. For example, when the target neural network model is a decision tree model, the preset fitness function may be set to one of Gini impurity, information gain and information gain ratio.
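The three decision-tree criteria named above can be written down directly. The sketch below uses their standard textbook definitions; the function names are chosen for this example. Gini impurity is minimised, whereas information gain and gain ratio are maximised, so a minimising optimiser would use their negatives as the fitness.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """1 - sum(p_c^2) over the classes c present in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of the `groups` it is split into."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


def gain_ratio(labels, groups):
    """Information gain divided by the entropy of the split sizes (split information)."""
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups if g)
    return information_gain(labels, groups) / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
perfect_split = [[1, 1], [0, 0]]
print(gini_impurity(labels))                    # 0.5
print(information_gain(labels, perfect_split))  # 1.0
print(gain_ratio(labels, perfect_split))        # 1.0
```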
According to the data discretization method provided by the invention, the optimization targets of the preset fitness function and the loss function of the target neural network model are kept consistent, so that the discretization characteristics obtained by utilizing the particle swarm optimization algorithm can be used for better training the target neural network model, and the accuracy of the output result of the target neural network model is improved.
According to a data discretization method provided by the invention, discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features comprises the following steps: discretizing the continuous features into the (N +1) intervals, endowing the continuous features in each interval with preset discretization values of corresponding intervals, and obtaining the discretization features corresponding to the continuous features according to the preset discretization values.
It should be noted that assigning the continuous features in each interval the preset discretization value of that interval means that, after the continuous features have been discretized and the corresponding discretized features obtained, each discretized feature is assigned a value, and that discretization value is used to represent it. Continuing the example above, once the discretized features "under 18 years old", "18-60 years old" and "over 60 years old" are obtained, 0 is assigned to "under 18 years old", 1 to "18-60 years old" and 2 to "over 60 years old", giving the discretized features after assignment: 0, 1, 2.
According to the data discretization method provided by the invention, the (N +1) intervals are assigned, the corresponding (N +1) intervals are represented by the discretization values, so that the discretization characteristics can be more simply represented on the basis, the complex data input is avoided, the data input error is further avoided, the target neural network model is trained on the basis of the preset discretization values and the corresponding labels, and the reliability and the accuracy of model training are improved.
Fig. 2 is a second schematic flow chart of the data discretization method provided by the present invention, and as shown in fig. 2, Feature represents a continuous Feature, and Label represents a tag corresponding to the continuous Feature.
Step1, inputting the continuous feature Feature and its corresponding label Label into the particle swarm optimization algorithm, taking Gini impurity as the preset fitness function of the particle swarm optimization algorithm, and inputting the discretization category value 3 into the particle swarm optimization algorithm;
Step2, on the premise that the discretization category value is 3, outputting the optimal segmentation points BUCKET = (12.9, 24, 32) for the continuous feature;
Step3, obtaining 4 intervals of the continuous feature from the optimal segmentation points, namely x <= 12.9, 12.9 < x <= 24, 24 < x <= 32 and 32 < x, and assigning a value to each interval: 0 to x <= 12.9, 1 to 12.9 < x <= 24, 2 to 24 < x <= 32 and 3 to 32 < x. The discretized features after assignment are thus 0, 1, 2 and 3, and the data in the continuous feature are represented on this basis as 0, 3, 0, 3, 1, 3 and 2.
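The assignment in Step3 amounts to a lookup of each value against the sorted segmentation points. In the sketch below the segmentation points are those quoted in Step2, while the raw feature values are hypothetical: the patent does not list them, and they were chosen only so that the resulting codes reproduce the sequence quoted in Step3.

```python
from bisect import bisect_left

cut_points = [12.9, 24, 32]  # BUCKET from Step2


def to_bucket(x, cuts):
    """Index of the right-closed interval containing x: x <= c0 is 0, c0 < x <= c1 is 1, ..."""
    return bisect_left(cuts, x)


# Hypothetical raw values (not from the patent).
feature = [5.0, 40.1, 12.9, 33.0, 20.0, 50.0, 30.0]
codes = [to_bucket(x, cut_points) for x in feature]
print(codes)  # [0, 3, 0, 3, 1, 3, 2]
```

`bisect_left` places a value equal to a segmentation point in the lower interval, which matches the `<=` bounds of the four intervals.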
Since the decision tree model is trained on the discretized features and their corresponding labels, the preset fitness function of the particle swarm optimization algorithm can be set to Gini impurity in Step1, which keeps the fitness function consistent with the optimization direction of the decision tree model.
According to the data discretization method provided by the invention, the optimal segmentation point for realizing discretization of the continuous features is obtained through the discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features, the discretization features are further obtained based on the optimal segmentation point, and the discretization features are utilized to train the neural network model, so that the generalization capability of the model can be effectively improved.
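Putting the pieces together, the following sketch searches for segmentation points with a basic global-best particle swarm whose fitness is the sample-weighted Gini impurity of the resulting intervals. The particle encoding (one coordinate per segmentation point, sorted before evaluation), the hyperparameters and the reuse of the five-sample age data are all assumptions of this sketch. The patent does not disclose these details, and a stochastic search is not guaranteed to reach the optimum on every run.

```python
import random
from bisect import bisect_left


def weighted_gini(values, labels, cuts):
    """Sample-weighted Gini impurity of the bins induced by sorted `cuts`."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(bisect_left(cuts, v), []).append(y)
    return sum(len(ys) / len(values)
               * (1.0 - sum((ys.count(c) / len(ys)) ** 2 for c in set(ys)))
               for ys in bins.values())


def pso_cut_points(values, labels, n_cuts, n_particles=40, n_iter=60,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search `n_cuts` segmentation points in [min, max] that minimise weighted Gini."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)

    def fitness(p):
        return weighted_gini(values, labels, sorted(p))

    pos = [[rng.uniform(lo, hi) for _ in range(n_cuts)] for _ in range(n_particles)]
    vel = [[0.0] * n_cuts for _ in range(n_particles)]
    pbest, pbest_val = [p[:] for p in pos], [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_cuts):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in range
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return sorted(gbest), gbest_val


ages = [25, 28, 55, 13, 67]
willing = [1, 1, 1, 0, 0]
cuts, score = pso_cut_points(ages, willing, n_cuts=2)
print(cuts, score)
```

On this toy data any pair of points with one between 13 and 25 and the other between 55 and 67 gives a fitness of 0, so the optimum is a region rather than a single point.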
Fig. 3 is a schematic structural diagram of a data discretization apparatus provided by the present invention, and as shown in fig. 3, the data discretization apparatus 300 includes: a parameter setting module 310 and a discretization processing module 320, wherein:
a parameter setting module 310 configured to: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm;
a discretization processing module 320 for: discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
According to the data discretization device provided by the invention, the optimal segmentation point for realizing discretization of the continuous features is obtained on the basis of at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features, the discretization features are further obtained on the basis of the optimal segmentation point, and the discretization features are utilized to train the target neural network model, so that the generalization capability of the model can be effectively improved.
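One way to picture the two-module structure of the apparatus is as a parameter container plus a processor that delegates the search to a pluggable optimiser. The class names, field names and the fixed-output stub used in place of a real particle swarm search are assumptions made for this illustration only.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PsoParameters:
    """What a parameter setting module (310) would hand to the optimiser."""
    labels: Sequence[int]
    category_values: Sequence[int]
    fitness: Callable[[Sequence[float], Sequence[int], List[float]], float]


class DiscretizationProcessor:
    """Counterpart of the discretization processing module (320)."""

    def __init__(self, params: PsoParameters,
                 optimiser: Callable[[Sequence[float], PsoParameters], List[float]]):
        self.params = params
        self.optimiser = optimiser  # any search that returns N sorted segmentation points

    def run(self, feature: Sequence[float]) -> List[int]:
        cuts = self.optimiser(feature, self.params)
        # N segmentation points give N + 1 intervals, coded 0..N.
        return [bisect_left(cuts, x) for x in feature]


# A fixed-output stub stands in for the particle swarm search here.
def stub_optimiser(feature, params):
    return [18, 60]


params = PsoParameters(labels=[1, 1, 1, 0, 0], category_values=[3],
                       fitness=lambda values, labels, cuts: 0.0)
result = DiscretizationProcessor(params, stub_optimiser).run([25, 28, 55, 13, 67])
print(result)  # [1, 1, 1, 0, 2]
```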
According to the data discretization apparatus provided by the present invention, when the discretization processing module 320 is configured to perform discretization processing on the continuous feature based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous feature, specifically, the discretization processing module is configured to:
and obtaining an optimal discretization category value based on the preset fitness function by utilizing the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain N segmentation points corresponding to the continuous features.
According to the data discretization device provided by the invention, when a single discretization category value is input into the particle swarm optimization algorithm, the N segmentation points are searched for directly based on the preset fitness function, which realizes a direct optimization search for the segmentation points. When a plurality of discretization category values are input into the particle swarm optimization algorithm, the optimal discretization category value is first searched for based on the preset fitness function, and the N segmentation points are then searched for based on the obtained optimal discretization category value and the preset fitness function, which realizes a step-by-step optimization search for the segmentation points.
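The two search modes described above can be illustrated with a minimal sketch. It is one possible reading, not the patented implementation: the function names (`purity_fitness`, `pso_cut_points`, `discretize_search`), the placeholder fitness, and the PSO coefficients are all assumptions introduced for illustration. Each candidate category value K is treated as a number of intervals, so the search looks for K - 1 segmentation points. For simplicity the sketch scores every candidate K by running the segmentation-point search for it and keeping the best, which merges the two stages of the step-by-step search.

```python
import random
from bisect import bisect_right

def purity_fitness(values, labels, cuts):
    """Placeholder fitness (an assumption): share of samples that carry
    the majority label of the interval they fall into."""
    cuts = sorted(cuts)
    counts = {}
    for v, y in zip(values, labels):
        b = bisect_right(cuts, v)
        counts.setdefault(b, {}).setdefault(y, 0)
        counts[b][y] += 1
    return sum(max(c.values()) for c in counts.values()) / len(values)

def pso_cut_points(values, labels, k, fitness, n_particles=20, n_iter=50, seed=0):
    """Search k - 1 segmentation points with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    dim = k - 1
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(values, labels, p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each segmentation point inside the observed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(values, labels, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return sorted(gbest), gbest_fit

def discretize_search(values, labels, candidate_ks, fitness=purity_fitness):
    """One candidate: direct search. Several: keep the best-scoring K."""
    results = {k: pso_cut_points(values, labels, k, fitness) for k in candidate_ks}
    best_k = max(results, key=lambda k: results[k][1])
    cuts, fit = results[best_k]
    return best_k, cuts, fit

# Toy data: age as the continuous feature, willingness to shop online as the label.
ages = [18, 22, 25, 31, 38, 45, 52, 60, 67, 70]
willing = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(discretize_search(ages, willing, [2, 3, 4]))
```

Note that a purity-style fitness never decreases when more intervals are allowed, so a practical fitness would likely need to penalize large K; the placeholder is used here only to keep the sketch short.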
According to the data discretization device provided by the invention, the optimal discretization category value is one of the preset at least one discretization category value.
According to the data discretization device provided by the invention, the uniquely determined (N +1) intervals are obtained based on the discretization class values input into the particle swarm optimization algorithm, so that the corresponding relation between the input class values and the output (N +1) intervals is ensured, and the stability of the output result is ensured.
According to the data discretization device provided by the invention, the label corresponding to the continuous feature is consistent with the label output when the target neural network model is trained based on the discretization feature;
and/or the labels corresponding to the continuous features are used by the particle swarm optimization algorithm, when searching for the segmentation points, to divide continuous features with the same label into the same interval.
The data discretization device provided by the invention sets the label corresponding to the continuous features to be consistent with the label output when the target neural network model is trained based on the discretization features, and/or uses the label carried by the continuous features as the basis for searching for the segmentation points. Training the target neural network model on the resulting discretization features can therefore effectively improve the generalization capability of the model.
According to the data discretization device provided by the invention, the preset fitness function is consistent with the optimization target of the loss function of the target neural network model.
According to the data discretization device provided by the invention, the preset fitness function is kept consistent with the optimization target of the loss function of the target neural network model, so that the discretization features obtained with the particle swarm optimization algorithm are better suited to training the target neural network model, which improves the accuracy of the model's output.
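As an illustration of a fitness function that shares its optimization target with a loss function, the sketch below assumes the target model is a binary classifier trained with cross-entropy loss. The text does not fix a particular loss, so this choice, the function name `neg_log_loss_fitness`, and the `eps` clipping are all assumptions. Each interval predicts its own positive-label rate, and the fitness is the negative mean cross-entropy of those predictions, so maximizing the fitness minimizes the same quantity the model would minimize.

```python
import math
from bisect import bisect_right

def neg_log_loss_fitness(values, labels, cut_points, eps=1e-9):
    """Negative mean binary cross-entropy when every interval predicts
    its own positive-label rate. Higher is better. Labels must be 0 or 1."""
    cuts = sorted(cut_points)
    n_bins = len(cuts) + 1
    pos = [0] * n_bins  # positive labels per interval
    tot = [0] * n_bins  # samples per interval
    for v, y in zip(values, labels):
        b = bisect_right(cuts, v)
        tot[b] += 1
        pos[b] += y
    loss = 0.0
    for v, y in zip(values, labels):
        b = bisect_right(cuts, v)
        # Clip the rate away from 0 and 1 so the logarithm stays finite.
        p = min(max(pos[b] / tot[b], eps), 1 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return -loss / len(values)

ages = [20, 22, 30, 41, 45, 60]
willing = [1, 1, 1, 0, 0, 0]
# A segmentation point that separates the labels scores higher than one that does not.
print(neg_log_loss_fitness(ages, willing, [35]) > neg_log_loss_fitness(ages, willing, [21]))
```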
According to the data discretization apparatus provided by the present invention, when discretizing the continuous features into the (N + 1) intervals corresponding to the N segmentation points to obtain the discretization features corresponding to the continuous features, the discretization processing module 320 is specifically configured to: discretize the continuous features into the (N + 1) intervals, assign to the continuous features in each interval the preset discretization value of that interval, and obtain the discretization features corresponding to the continuous features from the preset discretization values.
According to the data discretization device provided by the invention, each of the (N + 1) intervals is assigned a discretization value that represents it, so that the discretization features can be represented more simply. This avoids complex data input and the input errors that come with it. The target neural network model is then trained on the preset discretization values and their corresponding labels, which improves the reliability and accuracy of model training.
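The interval assignment can be sketched as follows. The function name `discretize`, the default preset values 0 to N, and the convention that a value equal to a segmentation point belongs to the interval on its right are assumptions; the text does not specify them.

```python
from bisect import bisect_right

def discretize(values, cut_points, bin_values=None):
    """Map each continuous value to the preset value of its interval.

    N sorted segmentation points define N + 1 intervals. A value equal to
    a segmentation point falls into the interval on its right (assumed).
    """
    cuts = sorted(cut_points)
    if bin_values is None:
        bin_values = list(range(len(cuts) + 1))  # assumed default: 0..N
    if len(bin_values) != len(cuts) + 1:
        raise ValueError("need exactly N + 1 preset values for N segmentation points")
    return [bin_values[bisect_right(cuts, v)] for v in values]

ages = [18, 25, 34, 35, 52, 70]
print(discretize(ages, [25, 35, 60]))  # prints [0, 1, 1, 2, 2, 3]
```

Passing `bin_values` lets each interval carry any preset discretization value, for example `discretize([10, 40], [30], ["young", "old"])`.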
Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with one another via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform a data discretization method comprising: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N + 1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the data discretization method provided by the above method embodiments, the method comprising: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N + 1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the data discretization method provided above, the method comprising: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N + 1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A method of discretizing data, comprising:
taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; the discretization category value is the number of intervals formed by the segmentation points when the continuous features are discretized; the continuous features comprise age, and the labels corresponding to the continuous features comprise whether there is willingness to shop online;
discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features;
the discretization processing of the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features comprises the following steps: acquiring an optimal discretization category value based on the preset fitness function by utilizing the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain N segmentation points corresponding to the continuous features; the optimal discretization category value is one of the preset at least one discretization category value.
2. The data discretization method according to claim 1, wherein the labels corresponding to the continuous features are consistent with labels output when a target neural network model is trained based on the discretization features;
and/or the labels corresponding to the continuous features are used by the particle swarm optimization algorithm, when searching for the segmentation points, to divide continuous features with the same label into the same interval.
3. The data discretization method according to claim 2, wherein the preset fitness function is consistent with an optimization objective of a loss function of the target neural network model.
4. The data discretization method according to claim 1, wherein discretizing the continuous features into (N + 1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features comprises: discretizing the continuous features into the (N + 1) intervals, assigning to the continuous features in each interval the preset discretization value of the corresponding interval, and obtaining the discretization features corresponding to the continuous features according to the preset discretization values.
5. A data discretization apparatus, comprising:
a parameter setting module for: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; the discretization category value is the number of intervals formed by the segmentation points when the continuous features are discretized; the continuous features comprise age, and the labels corresponding to the continuous features comprise whether there is willingness to shop online;
a discretization processing module for: discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features;
when the discretization processing module is configured to discretize the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, the discretization processing module is specifically configured to: acquiring an optimal discretization category value based on the preset fitness function by utilizing the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain N segmentation points corresponding to the continuous features; the optimal discretization category value is one of the preset at least one discretization category value.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the data discretization method according to any of the claims 1 to 4 when executing the program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the data discretization method according to any of the claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110735325.4A CN113570024B (en) | 2021-06-30 | 2021-06-30 | Data discretization method, device, electronic equipment, storage medium and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113570024A | 2021-10-29 |
CN113570024B | 2022-08-12 |
Family
ID=78163246
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461284A (en) * | 2020-06-17 | 2020-07-28 | 同盾控股有限公司 | Data discretization method, device, equipment and medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846308B2 (en) * | 2016-07-27 | 2020-11-24 | Anomalee Inc. | Prioritized detection and classification of clusters of anomalous samples on high-dimensional continuous and mixed discrete/continuous feature spaces |
TWI599896B (en) * | 2016-10-21 | 2017-09-21 | 嶺東科技大學 | Multiple decision attribute selection and data discretization classification method |
CN111709579B (en) * | 2020-06-17 | 2023-12-01 | 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) | Ship navigational speed optimization method and device |
Non-Patent Citations (2)
Title |
---|
An improved Fuzzy Mutual Information Feature Selection for Classification Systems; Liwei Wang; IEEE; 20170515; pp. 119-124 * |
High-dimensional data discretization method based on improved LLE; Xu Tongde; Computer Science (计算机科学); 20150615; pp. 146-157 * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20220117. Address after: 100085 Floor 101 102-1, No. 35 Building, No. 2 Hospital, Xierqi West Road, Haidian District, Beijing. Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd. Address before: 101309 room 24, 62 Farm Road, Erjie village, Yangzhen, Shunyi District, Beijing. Applicant before: Beijing fangjianghu Technology Co.,Ltd. |
GR01 | Patent grant | |