CN113570024B - Data discretization method, device, electronic equipment, storage medium and program product - Google Patents


Info

Publication number: CN113570024B
Application number: CN202110735325.4A
Authority: CN (China)
Prior art keywords: discretization; features; continuous; continuous features; particle swarm
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN113570024A (en)
Inventor: 刘敏 (Liu Min)
Current assignee: Seashell Housing Beijing Technology Co Ltd
Original assignee: Seashell Housing Beijing Technology Co Ltd

Events:
Application CN202110735325.4A filed by Seashell Housing Beijing Technology Co Ltd
Priority to CN202110735325.4A
Publication of CN113570024A
Application granted
Publication of CN113570024B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data discretization method, a device, an electronic device, a storage medium and a program product. The method comprises: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features; and discretizing the continuous features into the (N+1) intervals defined by the N segmentation points to obtain discretized features corresponding to the continuous features. With the data discretization method, device, electronic equipment, storage medium and program product, the optimal discretization scheme for the continuous features is found by the particle swarm optimization algorithm; discretized features obtained from this scheme allow the model to be trained effectively and improve its generalization capability.

Description

Data discretization method, device, electronic equipment, storage medium and program product
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data discretization method and apparatus, an electronic device, a storage medium, and a program product.
Background
In machine learning, features usually need some processing before they are used to train a model. One such processing method is continuous feature discretization, which converts continuous features into categorical features; this can increase the generalization capability of the model and make it more robust to abnormal feature values.
Existing continuous feature discretization methods mainly fall into two types: equal-frequency discretization and equal-width (equidistant) discretization. Equal-frequency discretization divides a continuous feature into N classes that each contain the same amount of data; equal-width discretization divides a continuous feature into N classes that each span the same range, that is, the difference between the maximum and minimum value of each class is the same. In addition, some methods discretize features using machine learning algorithms such as KMeans clustering.
When continuous features are discretized by equal-frequency discretization, equal-width discretization, a KMeans clustering algorithm or the like, the discretization category value must be set manually in advance. Trying out discretization strategies usually costs a great deal of manpower and time, and the optimal discretization scheme can only be obtained through manual comparison.
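As an illustration of the two conventional methods described above, the following minimal NumPy sketch computes the interior cut points for equal-width and equal-frequency discretization. The function names are hypothetical, and `np.quantile` is used with its default linear interpolation. Note how the single outlier (100) stretches the equal-width intervals, while the equal-frequency cut follows the bulk of the data.

```python
import numpy as np


def equal_width_cuts(x, n_bins):
    """Interior cut points splitting [min(x), max(x)] into n_bins intervals of equal width."""
    lo, hi = float(np.min(x)), float(np.max(x))
    # n_bins + 1 evenly spaced edges; drop the two outer edges to keep only the cut points
    return np.linspace(lo, hi, n_bins + 1)[1:-1]


def equal_frequency_cuts(x, n_bins):
    """Interior cut points at the quantiles of x, so each interval holds roughly equal counts."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(x, qs)


x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
print(equal_width_cuts(x, 3))      # [34. 67.]
print(equal_frequency_cuts(x, 2))  # [5.5]
```

In both functions `n_bins` has to be fixed by hand, which is the limitation the prior-art discussion points out.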
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a data discretization method, a data discretization device, electronic equipment, a storage medium and a program product.
The invention provides a data discretization method, comprising: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features; and discretizing the continuous features into the (N+1) intervals defined by the N segmentation points to obtain discretized features corresponding to the continuous features.
According to the data discretization method provided by the invention, discretizing the continuous features based on the particle swarm optimization algorithm to obtain the N segmentation points corresponding to the continuous features comprises: obtaining an optimal discretization category value based on the preset fitness function by using the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain the N segmentation points corresponding to the continuous features.
According to the data discretization method provided by the invention, the optimal discretization category value is one of the preset at least one discretization category value.
According to the data discretization method provided by the invention, the label corresponding to the continuous features is consistent with the label output when the target neural network model is trained on the discretized features; and/or the label corresponding to the continuous features is used by the particle swarm optimization algorithm, when searching for the segmentation points, to place continuous feature values that share the same label into the same interval.
According to the data discretization method provided by the invention, the preset fitness function is consistent with the optimization target of the loss function of the target neural network model.
According to the data discretization method provided by the invention, discretizing the continuous features into the (N+1) intervals defined by the N segmentation points to obtain the discretized features corresponding to the continuous features comprises: discretizing the continuous features into the (N+1) intervals, assigning the continuous feature values in each interval the preset discretization value of that interval, and obtaining the discretized features corresponding to the continuous features from the preset discretization values.
The present invention also provides a data discretization apparatus, comprising: a parameter setting module, configured to take a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm;
and a discretization processing module, configured to discretize the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and to discretize the continuous features into the (N+1) intervals defined by the N segmentation points to obtain discretized features corresponding to the continuous features.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the data discretization method according to any of the above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data discretization method according to any of the above.
The present invention also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data discretization method according to any of the above.
With the data discretization method, device, electronic equipment, storage medium and program product provided by the invention, the optimal segmentation points for discretizing the continuous features are obtained from the at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features. The discretized features are then obtained from the optimal segmentation points and used to train the target neural network model, which can effectively improve the model's generalization capability.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a data discretization method provided by the present invention;
FIG. 2 is a second schematic flow chart of the data discretization method provided by the present invention;
FIG. 3 is a schematic structural diagram of a data discretization apparatus provided by the present invention;
FIG. 4 is a schematic diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The particle swarm optimization (PSO) algorithm originated from research on the foraging behavior of bird flocks. When a bird in the flock finds food nearby, it flies toward the food; the other birds see this, follow its example and move toward the food as well, until the whole flock reaches it. This is a natural information-sharing mechanism: during the search, each individual remembers its own flight experience and also learns from the other individuals in the flock. When a bird finds that another individual is flying better, it learns from that individual and adjusts its own flight accordingly, so that it moves in a more accurate direction. By studying and simulating this behavior, the algorithm continuously updates each individual's best position and the global best position of the swarm, and finally obtains the desired optimal result.
Continuous features: features can be divided into continuous and discrete features according to whether their values are continuous. A feature that can take any value within an interval is called a continuous feature; its values are continuous, the interval between any two values can be subdivided indefinitely, and infinitely many values can be obtained.
Discrete features: features whose values can be enumerated in a certain order, usually as integers, such as user gender, nationality or item type. Some features that are continuous in nature take only integer values in certain scenarios and can then be treated as discrete features.
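The description of PSO above does not spell out the update rule. For reference, the widely used global-best PSO update with an inertia weight can be written as follows. In this sketch, x_i^t and v_i^t are the position and velocity of particle i at iteration t, p_i is that particle's best position so far, g is the best position found by the whole swarm, w is the inertia weight, c_1 and c_2 are acceleration coefficients, and r_1, r_2 are uniform random numbers in [0, 1]. This notation is an assumption of the sketch and is not taken from the patent.

```latex
\begin{aligned}
v_i^{t+1} &= w\,v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right) \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}
\end{aligned}
```

The c_1 term models a bird recalling its own flight experience, and the c_2 term models it learning from the best individual in the flock.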
Continuous feature discretization: discretization is a common data processing method for converting continuous numerical attributes into discrete numerical attributes.
Fig. 1 is a schematic flow chart of a data discretization method provided by the present invention, and as shown in fig. 1, the method includes:
Step S110: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm;
Step S120: discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into the (N+1) intervals defined by the N segmentation points to obtain discretized features corresponding to the continuous features.
It should be noted that the preset fitness function gives the particle swarm optimization algorithm a direction for selecting the optimal scheme, and one or more discretization category values may be input into the algorithm. The label of the continuous feature indicates a characteristic associated with it. For example, consider the following data on consumers' ages and whether they are willing to buy online: A (25 years old), willing; B (28 years old), willing; C (55 years old), willing; D (13 years old), unwilling; E (67 years old), unwilling. To learn how age groups relate to the willingness to buy online, "age" is taken as the continuous feature and "whether willing to buy online" as the label corresponding to it. After the continuous feature is discretized by the particle swarm optimization algorithm, data relating age groups to online-shopping willingness is obtained, that is, the discretized feature and its corresponding label: under 18 years old, unwilling to buy online; 18 to 60 years old, willing to buy online; over 60 years old, unwilling to buy online.
The continuous features are discretized based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and are then discretized into the (N+1) intervals defined by the N segmentation points to obtain the discretized features corresponding to the continuous features.
According to the data discretization method provided by the invention, the optimal segmentation points for discretizing the continuous features are obtained from the at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features. The discretized features are then obtained from the optimal segmentation points and used to train the target neural network model, which can effectively improve the model's generalization capability.
According to the data discretization method provided by the invention, discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features comprises:
obtaining an optimal discretization category value based on the preset fitness function by using the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain the N segmentation points corresponding to the continuous features.
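The age example can be checked with a short Python sketch. It assumes cut points at 18 and 60 and right-closed intervals. The boundary convention is an assumption; it matches the intervals used in the Fig. 2 example of this description. The sketch counts the labels that fall into each interval: a good discretization leaves a single label per interval.

```python
from bisect import bisect_left
from collections import Counter

# Toy data from the example: (consumer, age, willing to buy online?)
data = [("A", 25, True), ("B", 28, True), ("C", 55, True), ("D", 13, False), ("E", 67, False)]
cuts = [18, 60]  # N = 2 cut points -> N + 1 = 3 age intervals (assumed values)


def interval_index(value, cuts):
    """Index of the right-closed interval containing value:
    (-inf, cuts[0]] -> 0, (cuts[0], cuts[1]] -> 1, ..., (cuts[-1], inf) -> len(cuts)."""
    return bisect_left(cuts, value)


labels_per_interval = [Counter() for _ in range(len(cuts) + 1)]
for _, age, willing in data:
    labels_per_interval[interval_index(age, cuts)][willing] += 1

print(labels_per_interval)
# [Counter({False: 1}), Counter({True: 3}), Counter({False: 1})]
```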
When a single discretization category value is input into the particle swarm optimization algorithm, for example the value $(N+1)$, the algorithm searches, based on the preset fitness function, for $N$ division points under the discretization category value $(N+1)$ and obtains $(N+1)$ intervals from the $N$ division points. When several discretization category values are input, for example $(N_1+1), (N_2+1), \ldots, (N_i+1)$, where $(N_i+1)$ denotes the $i$-th discretization category value, the algorithm first searches for the optimal value $(N_i+1)$ based on the preset fitness function. Then, based on the preset fitness function and $(N_i+1)$, it searches for the $N_i$ specific division points under the discretization category value $(N_i+1)$ and obtains $(N_i+1)$ intervals from the $N_i$ division points.
According to the data discretization method provided by the invention, when a single discretization category value is input into the particle swarm optimization algorithm, the N segmentation points are obtained directly by a search based on the preset fitness function, so the segmentation points are optimized directly against that function. When several discretization category values are input, the optimal discretization category value is searched for first based on the preset fitness function. The N segmentation points are then searched for based on that value and the preset fitness function, so the segmentation points are optimized step by step.
According to the data discretization method provided by the invention, the optimal discretization category value is one of the at least one preset discretization category value.
It should be noted that when a single discretization category value is input into the particle swarm optimization algorithm, N division points are searched for based on that value and (N+1) intervals are obtained from them. When several discretization category values are input, N division points are searched for based on one of them (namely the optimal discretization category value), and (N+1) intervals are obtained from them.
According to the data discretization method provided by the invention, uniquely determined (N+1) intervals are obtained from the discretization category value(s) input into the particle swarm optimization algorithm. This guarantees a definite correspondence between the input category values and the output (N+1) intervals, and thus the stability of the output.
According to the data discretization method provided by the invention, the label corresponding to the continuous features is consistent with the label output when the target neural network model is trained on the discretized features;
and/or the label corresponding to the continuous features is used by the particle swarm optimization algorithm, when searching for the division points, to place continuous feature values with the same label into the same interval.
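The two search modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- Each particle encodes the cut points directly as real numbers in [min(x), max(x)].
- The fitness is the size-weighted Gini impurity of the labels in each interval (lower is better).
- A category value k means k intervals (k - 1 cut points, k >= 2), following the (N+1) convention above.
- With several category values, PSO is simply run once per value and the value with the best fitness is kept.

The `penalty` parameter is hypothetical. Without it, the best achievable weighted Gini never gets worse as intervals are added, so the largest value would tend to win. All function names are illustrative.

```python
import numpy as np


def weighted_gini(x, y, cuts):
    """Size-weighted Gini impurity of labels y after cutting x at `cuts`
    (right-closed intervals). Lower is better; 0 means every interval is pure."""
    idx = np.digitize(x, np.sort(cuts), right=True)
    total = 0.0
    for k in np.unique(idx):
        yk = y[idx == k]
        _, counts = np.unique(yk, return_counts=True)
        p = counts / counts.sum()
        total += len(yk) / len(y) * (1.0 - np.sum(p ** 2))
    return total


def pso_cut_points(x, y, n_cuts, n_particles=20, n_iters=60,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over n_cuts real-valued cut points in [min(x), max(x)]."""
    rng = np.random.default_rng(seed)
    lo, hi = float(np.min(x)), float(np.max(x))
    pos = rng.uniform(lo, hi, size=(n_particles, n_cuts))   # particle positions
    vel = np.zeros_like(pos)                                  # particle velocities
    pbest = pos.copy()                                        # each particle's best so far
    pbest_val = np.array([weighted_gini(x, y, p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()                # swarm's best so far
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([weighted_gini(x, y, p) for p in pos])
        better = vals < pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return np.sort(gbest), float(np.min(pbest_val))


def discretize(x, y, category_values, penalty=0.0, **pso_kwargs):
    """One category value k: search its k - 1 cut points directly.
    Several values: search each and keep the value with the best
    (optionally penalised) fitness, together with its cut points."""
    best = None
    for k in category_values:
        cuts, fit = pso_cut_points(x, y, k - 1, **pso_kwargs)
        score = fit + penalty * k  # hypothetical penalty on the number of intervals
        if best is None or score < best[0]:
            best = (score, k, cuts)
    return best[1], best[2]


x = np.arange(40, dtype=float)
y = (x > 19).astype(int)  # labels change between 19 and 20
k, cuts = discretize(x, y, [2, 3])
print(k, cuts)
```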
It should be noted that the labels corresponding to the continuous features are consistent with the labels output by the target neural network model, which can be configured as different neural networks for different application scenarios. For example, after the continuous feature in the data relating a group of consumers' ages to their willingness to buy online is discretized by the particle swarm optimization algorithm, data relating age groups to online-shopping willingness is obtained, that is, the discretized feature and its corresponding label: under 18 years old, unwilling to buy online; 18 to 60 years old, willing to buy online; over 60 years old, unwilling to buy online.
When the target neural network model is trained with the discretized features and their labels, an input age of 45 falls into the discretized interval "18 to 60 years old", whose label is "willing to buy online"; accordingly, the output label for this sample during training of the target neural network model is "willing to buy online".
The labels corresponding to the continuous features can also be used by the particle swarm optimization algorithm, when searching for the segmentation points, to place continuous feature values with the same label into the same interval.
The data discretization method provided by the invention sets the label corresponding to the continuous features to be consistent with the label output when the target neural network model is trained on the discretized features, and/or uses the labels carried by the continuous features as the basis for searching for the segmentation points. Training the target neural network model on the resulting discretized features can effectively improve the model's generalization capability.
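One possible way to let the labels guide the search is to restrict candidate cut points to the boundaries between adjacent distinct values whose labels differ. This is an assumption; the patent does not state how the labels are used. The reasoning is that cutting between two values that share a single label can only split a run of same-labeled samples. The sketch below (hypothetical helper name) returns midpoints at such boundaries for the age example.

```python
from itertools import groupby


def label_boundaries(values, labels):
    """Midpoints between adjacent distinct sorted values whose label sets differ
    (or where either value carries more than one label)."""
    pairs = sorted(zip(values, labels), key=lambda t: t[0])
    # Group samples with equal values and collect the labels seen at each value
    groups = [(v, {lab for _, lab in grp}) for v, grp in groupby(pairs, key=lambda t: t[0])]
    cuts = []
    for (v0, s0), (v1, s1) in zip(groups, groups[1:]):
        if not (len(s0) == 1 and s0 == s1):
            cuts.append((v0 + v1) / 2)
    return cuts


ages = [25, 28, 55, 13, 67]
willing = [True, True, True, False, False]
print(label_boundaries(ages, willing))  # [19.0, 61.0]
```

A PSO search could then pick cut points only from such candidates instead of from the whole value range.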
According to the data discretization method provided by the invention, the preset fitness function is consistent with the optimization target of the loss function of the target neural network model.
In the present invention, the target neural network model refers to a neural network model trained using the obtained discretized features. During training, the labels corresponding to the continuous features are set to be consistent with the labels output when the target neural network model is trained based on the discretized features.
In a neural network model, the loss function measures the quality of the model's predictions. Put simply, it expresses how far the predictions differ from the actual data: the smaller the loss, the better the model, and the loss function also determines the optimization direction during training. The fitness function of a particle swarm, also called the objective function, is the optimization target of the particle swarm optimization algorithm and is used to evaluate the quality of a given candidate solution (particle). When the preset fitness function is consistent with the optimization target of the loss function of the target neural network model, the discretized features obtained by the particle swarm optimization algorithm can be used to train the target neural network model better. For example, when the target model is a decision tree model, the preset fitness function may be set to one of Gini impurity, information gain and information gain ratio.
By keeping the preset fitness function consistent with the optimization target of the loss function of the target neural network model, the data discretization method provided by the invention ensures that the discretized features obtained with the particle swarm optimization algorithm can be used to train the target neural network model better, improving the accuracy of its output.
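Information gain and information gain ratio, the other two fitness functions named above, can be computed for a candidate set of intervals as in the following sketch. The function names are hypothetical and the logarithms are base 2. Unlike Gini impurity, which a PSO search would minimize, these two quantities are maximized, so a minimizing PSO would use their negatives.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of all labels minus the size-weighted entropy of the interval groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups if g)


def gain_ratio(labels, groups):
    """Information gain divided by the split information (entropy of the group sizes)."""
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups if g)
    return information_gain(labels, groups) / split_info if split_info > 0 else 0.0


# Labels of the age example grouped by interval: under 18, 18 to 60, over 60
groups = [[False], [True, True, True], [False]]
labels = [lab for g in groups for lab in g]
print(round(information_gain(labels, groups), 4))  # 0.971
print(round(gain_ratio(labels, groups), 4))        # 0.7082
```

Because every group is pure here, the information gain equals the entropy of all labels, its maximum possible value.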
According to the data discretization method provided by the invention, discretizing the continuous features into the (N+1) intervals defined by the N segmentation points to obtain the discretized features corresponding to the continuous features comprises: discretizing the continuous features into the (N+1) intervals, assigning the continuous feature values in each interval the preset discretization value of that interval, and obtaining the discretized features corresponding to the continuous features from the preset discretization values.
It should be noted that assigning the continuous feature values in each interval the preset discretization value of that interval means that, after the continuous features have been discretized and the corresponding discretized features obtained, each discretized feature is assigned a discretization value that represents it. Continuing the example above, once the discretized features "under 18 years old", "18 to 60 years old" and "over 60 years old" are obtained, "under 18 years old" is assigned 0, "18 to 60 years old" is assigned 1 and "over 60 years old" is assigned 2, so the assigned discretized features are 0, 1 and 2.
According to the data discretization method provided by the invention, each of the (N+1) intervals is assigned a value that represents it. The discretized features can therefore be expressed more simply, complex data input and the resulting input errors are avoided, and training the target neural network model on the preset discretization values and the corresponding labels improves the reliability and accuracy of model training.
Fig. 2 is a second schematic flow chart of the data discretization method provided by the present invention. As shown in Fig. 2, Feature denotes a continuous feature and Label denotes the label corresponding to the continuous feature.
Step 1: input the continuous feature Feature and its corresponding Label into the particle swarm optimization algorithm, take Gini impurity as the preset fitness function of the algorithm, and input the discretization category value 3;
Step 2: with the discretization category value 3, output the optimal segmentation points BUCKET of the continuous feature, (12.9, 24, 32);
Step 3: obtain 4 intervals of the continuous feature from the optimal segmentation points: x <= 12.9, 12.9 < x <= 24, 24 < x <= 32 and 32 < x. Assign 0 to x <= 12.9, 1 to 12.9 < x <= 24, 2 to 24 < x <= 32 and 3 to 32 < x, so that the assigned discretized features are 0, 1, 2 and 3; expressed with these assigned values, the data in the continuous feature are 0, 3, 0, 3, 1, 3 and 2.
Because a decision tree model is trained on the discretized features and the corresponding labels, the preset fitness function of the particle swarm optimization algorithm is set to Gini impurity in Step 1, which keeps the fitness function consistent with the optimization direction of the decision tree model.
According to the data discretization method provided by the invention, the optimal segmentation points for discretizing the continuous features are obtained from the discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features. The discretized features are then obtained from the optimal segmentation points and used to train the neural network model, which can effectively improve the model's generalization capability.
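Assigning the preset discretization values amounts to replacing each value with the index of its interval. The following minimal sketch uses `numpy.digitize` with `right=True`, which returns i such that cuts[i-1] < x <= cuts[i]. The cut points 18 and 60 and the right-closed convention are assumptions for illustration, and `encode` is a hypothetical name. The age 45 from the training example above maps to code 1 (18 to 60 years old).

```python
import numpy as np


def encode(values, cuts):
    """Replace each value by the code of its right-closed interval:
    (-inf, cuts[0]] -> 0, (cuts[0], cuts[1]] -> 1, ..., (cuts[-1], inf) -> len(cuts)."""
    bins = np.sort(np.asarray(cuts, dtype=float))
    return np.digitize(np.asarray(values, dtype=float), bins, right=True)


print(encode([13, 25, 28, 45, 55, 67], [18, 60]))  # [0 1 1 1 1 2]
```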
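For Step 3, the following sketch turns the N = 3 segmentation points of Step 2 into the N + 1 = 4 right-closed interval descriptions, keyed by the code assigned to each interval. The helper `describe_intervals` is hypothetical.

```python
def describe_intervals(cuts, name="x"):
    """Map each interval code 0..N to a readable right-closed interval for N sorted cut points."""
    cuts = sorted(cuts)
    parts = [f"{name} <= {cuts[0]}"]
    parts += [f"{a} < {name} <= {b}" for a, b in zip(cuts, cuts[1:])]
    parts.append(f"{cuts[-1]} < {name}")
    return dict(enumerate(parts))


for code, text in describe_intervals([12.9, 24, 32]).items():
    print(code, text)
# 0 x <= 12.9
# 1 12.9 < x <= 24
# 2 24 < x <= 32
# 3 32 < x
```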
Fig. 3 is a schematic structural diagram of a data discretization apparatus provided by the present invention, and as shown in fig. 3, the data discretization apparatus 300 includes: a parameter setting module 310 and a discretization processing module 320, wherein:
a parameter setting module 310 configured to: taking a label corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm;
a discretization processing module 320 for: discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into (N +1) intervals corresponding to the N segmentation points to obtain discretization features corresponding to the continuous features.
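The apparatus structure can be sketched as two cooperating components. In this hypothetical Python outline, `set_parameters` plays the role of the parameter setting module 310 and `discretize` that of the discretization processing module 320. The PSO search itself is injected as a callable (`search`), so the outline stays independent of any particular implementation. The interval encoding assumes right-closed intervals. The stand-in search in the usage line simply returns fixed cut points.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PsoParameters:
    """Parameters produced by the parameter setting module (310)."""
    labels: List
    category_values: List[int]
    fitness: Callable


class DataDiscretizationApparatus:
    def __init__(self, search: Callable[[Sequence[float], PsoParameters], Sequence[float]]):
        # Hypothetical hook: search(feature, params) returns the N cut points (e.g. from PSO)
        self.search = search

    def set_parameters(self, labels, category_values, fitness) -> PsoParameters:
        """Module 310: bundle the labels, category value(s) and fitness function."""
        return PsoParameters(list(labels), list(category_values), fitness)

    def discretize(self, feature, params: PsoParameters):
        """Module 320: obtain the N cut points, then map each value to its interval code 0..N."""
        cuts = sorted(self.search(feature, params))
        codes = [bisect_left(cuts, v) for v in feature]  # right-closed intervals
        return cuts, codes


app = DataDiscretizationApparatus(search=lambda feature, params: [60, 18])  # stand-in search
params = app.set_parameters([False, True, False], [3], fitness=lambda *args: 0.0)
print(app.discretize([13, 45, 67], params))  # ([18, 60], [0, 1, 2])
```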
According to the data discretization device provided by the invention, the optimal segmentation point for realizing discretization of the continuous features is obtained on the basis of at least one discretization category value input into the particle swarm optimization algorithm, the preset fitness function and the label corresponding to the continuous features, the discretization features are further obtained on the basis of the optimal segmentation point, and the discretization features are utilized to train the target neural network model, so that the generalization capability of the model can be effectively improved.
In the data discretization apparatus provided by the present invention, when discretizing the continuous features based on the particle swarm optimization algorithm to obtain the N segmentation points corresponding to the continuous features, the discretization processing module 320 is specifically configured to:
use the particle swarm optimization algorithm to obtain an optimal discretization category value based on the preset fitness function, and discretize the continuous features based on the preset fitness function and the optimal discretization category value to obtain the N segmentation points corresponding to the continuous features.
With the data discretization apparatus provided by the invention, when only one discretization category value is input to the particle swarm optimization algorithm, the N segmentation points are searched for directly based on the preset fitness function, giving a direct optimization search for the segmentation points. When several discretization category values are input, the optimal discretization category value is first searched for based on the preset fitness function, and the N segmentation points are then searched for based on that optimal category value and the preset fitness function, giving a stepwise optimization search for the segmentation points.
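The direct and stepwise searches can be illustrated with a basic global-best PSO. This is a hypothetical sketch, not the patented implementation. The following are all assumptions: the fitness (`purity_fitness`, the share of samples matching their interval's majority label, minus a small per-interval penalty); the PSO constants (`w=0.7`, `c1=c2=1.5`, 20 particles, 60 iterations); and the strategy of choosing the optimal category value by running one PSO per candidate and keeping the best score. Following claim 1, a category value k means k intervals, that is, k - 1 cut points.

```python
import random
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def purity_fitness(x: Sequence[float], y: Sequence[int], cuts: Sequence[float],
                   penalty: float = 0.01) -> float:
    """Hypothetical fitness (expects sorted cuts): share of samples whose label
    matches their interval's majority label, minus a penalty per interval."""
    groups: Dict[int, List[int]] = {}
    for xi, yi in zip(x, y):
        groups.setdefault(bisect_right(cuts, xi), []).append(yi)
    hits = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return hits / len(x) - penalty * (len(cuts) + 1)


def pso_cut_points(x, y, k, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5,
                   seed=0) -> Tuple[List[float], float]:
    """Search k - 1 cut points (k intervals) with a basic global-best PSO."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    lo, hi = min(x), max(x)
    dim = k - 1

    def score(p: List[float]) -> float:
        return purity_fitness(x, y, sorted(p))  # a particle decodes to sorted cuts

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move the particle and clamp it to the observed value range
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = score(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return sorted(gbest), gbest_f


def discretize_search(x, y, candidate_ks: Sequence[int]) -> Tuple[int, List[float]]:
    """One candidate: direct cut-point search. Several: run a PSO per
    candidate category value and keep the best-scoring one."""
    results = {k: pso_cut_points(x, y, k) for k in candidate_ks}
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results[best_k][0]


if __name__ == "__main__":
    # Hypothetical toy data: age vs. willingness to shop online (1 = willing).
    ages = [18, 22, 25, 28, 33, 37, 41, 45, 52, 58, 63, 70]
    likes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(discretize_search(ages, likes, [2, 3, 4]))
```

With a single candidate, `discretize_search` reduces to the direct search. With several candidates, it performs the stepwise search. The per-interval penalty exists only so that the sketch does not always prefer the largest candidate.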
In the data discretization apparatus provided by the invention, the optimal discretization category value is one of the at least one preset discretization category value.
With the data discretization apparatus provided by the invention, the (N+1) intervals are uniquely determined by the discretization category value input to the particle swarm optimization algorithm, since the category value is the number of intervals. This guarantees a fixed correspondence between the input category value and the (N+1) output intervals, and thus a stable output.
In the data discretization apparatus provided by the invention, the labels corresponding to the continuous features are the same as the labels output by the target neural network model when it is trained on the discretized features;
and/or the labels corresponding to the continuous features are used by the particle swarm optimization algorithm, when searching for segmentation points, to place continuous feature values that share a label in the same interval.
The data discretization apparatus provided by the invention makes the labels corresponding to the continuous features the same as the labels output when the target neural network model is trained on the discretized features, and/or uses the labels carried by the continuous features as the basis for searching for segmentation points. Training the target neural network model on the resulting discretized features can therefore effectively improve the model's generalization ability.
In the data discretization apparatus provided by the invention, the preset fitness function has the same optimization objective as the loss function of the target neural network model.
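As an illustration of a fitness that shares the model's optimization objective, suppose that the target neural network is a binary classifier trained with cross-entropy loss. This is an assumption; the source does not name the loss. A matching fitness could score a set of cut points by the negative mean log loss obtained when each interval predicts its own empirical positive rate. The name `bce_fitness` and the clipping constant `eps` are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, Sequence


def bce_fitness(x: Sequence[float], y: Sequence[int], cuts: Sequence[float],
                eps: float = 1e-6) -> float:
    """Hypothetical fitness aligned with a binary cross-entropy training loss.

    Each interval predicts its empirical positive rate; the fitness is the
    negative mean log loss of those predictions, so maximizing it pursues
    the same objective as minimizing BCE.
    """
    cuts = sorted(cuts)
    pos: Dict[int, int] = {}
    tot: Dict[int, int] = {}
    for xi, yi in zip(x, y):  # count samples and positives per interval
        b = bisect_right(cuts, xi)
        tot[b] = tot.get(b, 0) + 1
        pos[b] = pos.get(b, 0) + yi
    loss = 0.0
    for xi, yi in zip(x, y):
        b = bisect_right(cuts, xi)
        p = min(max(pos[b] / tot[b], eps), 1 - eps)  # clip to avoid log(0)
        loss -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -loss / len(x)
```

On the training data, adding cut points generally cannot lower this score. When comparing category values, it would therefore likely be paired with a penalty or with held-out data; that, too, is an assumption rather than something the source specifies.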
With the data discretization apparatus provided by the invention, the optimization objective of the preset fitness function is kept consistent with that of the loss function of the target neural network model. The discretized features obtained with the particle swarm optimization algorithm are therefore better suited to training the target neural network model, which improves the accuracy of its output.
In the data discretization apparatus provided by the present invention, when discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain the discretized features, the discretization processing module 320 is specifically configured to: discretize the continuous features into the (N+1) intervals, assign to the continuous feature values in each interval the preset discretization value of that interval, and obtain the discretized features corresponding to the continuous features from these preset discretization values.
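Assigning each interval its preset discretization value might look like the following sketch. The source does not specify the preset values, so the implementer is assumed to choose them. One-hot vectors are used here only as an example, because they are a common way to feed categorical inputs to a neural network. `encode_with_preset_values` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def encode_with_preset_values(values: Sequence[float], cuts: Sequence[float],
                              preset: Sequence[T]) -> List[T]:
    """Replace each continuous value with the preset value of its interval.

    `preset` must hold exactly len(cuts) + 1 entries, one per interval,
    ordered from the lowest interval to the highest.
    """
    cuts = sorted(cuts)
    if len(preset) != len(cuts) + 1:
        raise ValueError("need N + 1 preset values for N cut points")
    return [preset[bisect_right(cuts, v)] for v in values]


# Hypothetical presets: one-hot codes for three age intervals.
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(encode_with_preset_values([18, 42, 60], [30, 50], one_hot))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Ordinal integers or string codes work the same way. The only requirement in this sketch is one preset value per interval.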
With the data discretization apparatus provided by the invention, each of the (N+1) intervals is assigned a discretization value that represents it, which gives the discretized features a simpler representation. This avoids complex data input and the input errors it can cause. The target neural network model is then trained on the preset discretization values and the corresponding labels, which improves the reliability and accuracy of model training.
Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, where the processor 410, the communication interface 420 and the memory 430 communicate with one another via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform a data discretization method comprising: using the labels corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features; and discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain the discretized features corresponding to the continuous features.
In addition, when implemented as software functional units and sold or used as an independent product, the logic instructions in the memory 430 may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions for causing a computer device (a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the data discretization method provided above, the method comprising: using the labels corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features; and discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain the discretized features corresponding to the continuous features.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the data discretization method provided above, the method comprising: using the labels corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of the particle swarm optimization algorithm; discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features; and discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain the discretized features corresponding to the continuous features.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method of discretizing data, comprising:
using the labels corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm, wherein the discretization category value is the number of intervals formed by the segmentation points when the continuous features are discretized, the continuous features include age, and the labels corresponding to the continuous features include whether a person is willing to shop online;
discretizing the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain discretized features corresponding to the continuous features;
wherein discretizing the continuous features based on the particle swarm optimization algorithm to obtain the N segmentation points comprises: obtaining an optimal discretization category value based on the preset fitness function by using the particle swarm optimization algorithm, and discretizing the continuous features based on the preset fitness function and the optimal discretization category value to obtain the N segmentation points corresponding to the continuous features, the optimal discretization category value being one of the at least one preset discretization category value.
2. The data discretization method according to claim 1, wherein the labels corresponding to the continuous features are the same as the labels output when a target neural network model is trained on the discretized features;
and/or the labels corresponding to the continuous features are used by the particle swarm optimization algorithm, when searching for the segmentation points, to place continuous features with the same label in the same interval.
3. The data discretization method according to claim 2, wherein the preset fitness function is consistent with an optimization objective of a loss function of the target neural network model.
4. The data discretization method according to claim 1, wherein discretizing the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain the discretized features comprises: discretizing the continuous features into the (N+1) intervals, assigning to the continuous features in each interval the preset discretization value of that interval, and obtaining the discretized features corresponding to the continuous features from the preset discretization values.
5. A data discretization apparatus, comprising:
a parameter setting module configured to use the labels corresponding to the continuous features to be discretized, at least one preset discretization category value and a preset fitness function as parameters of a particle swarm optimization algorithm, wherein the discretization category value is the number of intervals formed by the segmentation points when the continuous features are discretized, the continuous features include age, and the labels corresponding to the continuous features include whether a person is willing to shop online;
a discretization processing module configured to discretize the continuous features based on the particle swarm optimization algorithm to obtain N segmentation points corresponding to the continuous features, and to discretize the continuous features into the (N+1) intervals corresponding to the N segmentation points to obtain discretized features corresponding to the continuous features;
wherein, when discretizing the continuous features based on the particle swarm optimization algorithm to obtain the N segmentation points, the discretization processing module is specifically configured to: obtain an optimal discretization category value based on the preset fitness function by using the particle swarm optimization algorithm, and discretize the continuous features based on the preset fitness function and the optimal discretization category value to obtain the N segmentation points corresponding to the continuous features, the optimal discretization category value being one of the at least one preset discretization category value.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the data discretization method according to any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data discretization method according to any one of claims 1 to 4.
CN202110735325.4A 2021-06-30 2021-06-30 Data discretization method, device, electronic equipment, storage medium and program product Active CN113570024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110735325.4A CN113570024B (en) 2021-06-30 2021-06-30 Data discretization method, device, electronic equipment, storage medium and program product

Publications (2)

Publication Number Publication Date
CN113570024A CN113570024A (en) 2021-10-29
CN113570024B true CN113570024B (en) 2022-08-12

Family

ID=78163246

Country Status (1)

Country Link
CN (1) CN113570024B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10846308B2 (en) * 2016-07-27 2020-11-24 Anomalee Inc. Prioritized detection and classification of clusters of anomalous samples on high-dimensional continuous and mixed discrete/continuous feature spaces
TWI599896B (en) * 2016-10-21 2017-09-21 嶺東科技大學 Multiple decision attribute selection and data discretization classification method
CN111709579B (en) * 2020-06-17 2023-12-01 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) Ship navigational speed optimization method and device

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN111461284A (en) * 2020-06-17 2020-07-28 同盾控股有限公司 Data discretization method, device, equipment and medium

Non-Patent Citations (2)

Title
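An improved Fuzzy Mutual Information Feature Selection for Classification Systems; Liwei Wang; IEEE; 2017-05-15; pp. 119-124 *
High-dimensional data discretization method based on improved LLE (基于改进LLE的高维数据离散化方法); Xu Tongde (许统德); Computer Science (计算机科学); 2015-06-15; pp. 146-157 *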

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220117

Address after: 100085 Floor 101 102-1, Building 35, Courtyard No. 2, Xierqi West Road, Haidian District, Beijing

Applicant after: Seashell Housing (Beijing) Technology Co., Ltd.

Address before: 101309 Room 24, 62 Farm Road, Erjie Village, Yangzhen, Shunyi District, Beijing

Applicant before: Beijing Fangjianghu Technology Co., Ltd.

GR01 Patent grant