CN111178488A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN111178488A
CN111178488A
Authority
CN
China
Prior art keywords
individuals
individual
lambda
optimal
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911336654.0A
Other languages
Chinese (zh)
Inventor
范慧婷
卢亿雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike Beijing Data Technology Co ltd
Original Assignee
Enyike Beijing Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike Beijing Data Technology Co ltd
Priority to CN201911336654.0A
Publication of CN111178488A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/004 — Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 — Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The embodiments of the present application disclose a data processing method and device. The method comprises the following steps: acquiring an initialization population corresponding to m parameters to be adjusted, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer; iteratively processing the initialization population to obtain λ optimal individuals; calculating the mutation probability of each optimal individual; performing a mutation operation on at least one parameter of at least one optimal individual according to the mutation probability, to obtain a new population generated by the mutation operation; and selecting, from the new population generated by the mutation operation, individuals that conform to a preset optimal-selection strategy, thereby determining the parameter values corresponding to the m parameters.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of information processing, and more particularly, to a data processing method and apparatus.
Background
Machine learning algorithms typically model collected sample data to discover regularities in the data in order to solve a given problem. The problem to be solved often has no exact solution and generally needs to be converted into an optimization problem that is solved by successively approaching an optimal solution. The performance of a model is closely related to its parameters; in other words, whether the posed problem can be solved well depends on efficient and accurate adjustment of the model's parameters. In machine learning these are algorithm parameters, generally divided into model parameters and model hyper-parameters, and they are central to the algorithm. Model parameters are learned from the data and do not need to be set manually, such as the support vectors in a support vector machine or the coefficients in a logistic regression. Model hyper-parameters are configured manually by the model user, such as the proportion of features that each node of a decision tree uses.
For the hyper-parameters of a model, parameter adjustment in the related art is performed mainly by manual tuning or automatic tuning. Manual tuning relies on experience with the model to decide how the parameters should change so that the model's evaluation index improves; the model user therefore needs solid domain knowledge and substantial practical experience, otherwise tuning is inefficient, and the time cost grows with the number of hyper-parameters. To overcome these disadvantages of manual tuning, automatic parameter-adjustment algorithms have been proposed, mainly search algorithms and Bayesian optimization. To obtain the global optimum of the model's objective function, a search algorithm automatically traverses the value space of each parameter to find the point that optimizes the objective function, but the search cost is very high and it is difficult to approach the optimal solution efficiently and accurately. Bayesian optimization learns a prior over the objective function, fuses the prior with sample information, and uses the Bayes formula to obtain posterior information about the objective function, from which the location of the optimal solution in parameter space can be inferred. However, Bayesian optimization assumes that the model's objective function follows a Gaussian distribution, which greatly limits the generality of the method.
Therefore, how to adjust parameters efficiently and accurately is a problem that urgently needs to be solved.
Disclosure of Invention
To address the above technical problem, embodiments of the present application provide a data processing method and apparatus.
To achieve the purpose of the embodiment of the present application, an embodiment of the present application provides a data processing method, including:
acquiring an initialization population corresponding to m parameters to be adjusted, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer;
iteratively processing the initialization population to obtain λ optimal individuals;
calculating the mutation probability of each optimal individual;
performing a mutation operation on at least one parameter of at least one optimal individual according to the mutation probability, to obtain a new population generated by the mutation operation;
and selecting, from the new population generated by the mutation operation, individuals that conform to a preset optimal-selection strategy, and determining the parameter values corresponding to the m parameters.
In an exemplary embodiment, the initial values corresponding to the m parameters in an individual are obtained as follows:
obtaining the value space [a_i, b_i] of the parameter p_i, where i is an integer with 1 ≤ i ≤ m and a_i, b_i are real numbers;
dividing the value space [a_i, b_i] with equal width to obtain n_i = ⌊√(b_i − a_i)⌋ + 1 initial value intervals;
selecting one of the n_i value intervals of the parameter p_i, and selecting a numerical value from the selected interval as the initial value corresponding to p_i.
In an exemplary embodiment, iteratively processing the initialization population to obtain λ optimal individuals comprises:
repeatedly executing the following steps until λ optimal individuals are obtained or the number of iterations reaches a preset maximum number of iterations T:
selecting λ pairs of parent individuals from the λ individuals in the initialization population;
determining λ offspring individuals corresponding to the λ pairs of parent individuals;
selecting λ optimal individuals from the individuals of the λ pairs of parent individuals and the λ offspring individuals.
In one exemplary embodiment, selecting λ pairs of parent individuals from the λ individuals in the initialization population comprises:
calculating fitness information for each individual according to a preset fitness-calculation strategy;
selecting n individuals from the λ individuals, selecting from those n individuals the 2 whose fitness information conforms to a preset optimal-selection strategy as 1 pair of parent individuals, and so on, until λ pairs of parent individuals have been selected, where n is an integer greater than 2.
In an exemplary embodiment, an offspring individual is obtained by a crossover formula (the equation image is not reproduced in the source), in which x_i^(t) and x_j^(t) respectively denote parent individuals in the t-th generation population, with 1 ≤ i, j ≤ λ, and ω_i, ω_j are the normalized fitness values of those parent individuals in the t-th generation population, where t is a positive integer less than or equal to the maximum number of iterations T.
In an exemplary embodiment, selecting λ optimal individuals from the individuals of the λ pairs of parent individuals and the λ offspring individuals comprises:
determining the fitness information corresponding to each individual of the λ pairs of parent individuals and the λ offspring individuals, the fitness information being determined according to a preset fitness-calculation strategy;
and selecting, from the individuals of the λ pairs of parent individuals and the λ offspring individuals, the λ optimal individuals whose fitness information conforms to a preset optimal-selection strategy.
In an exemplary embodiment, the mutation probability of an optimal individual is calculated by an adaptive formula (the equation image is not reproduced in the source), in which i denotes the i-th individual, t denotes the t-th generation population, f_max^(t) and f_avg^(t) are respectively the maximum fitness and the average fitness of the t-th generation population, σ^(t) is the variance of the population fitness of the t-th generation, f_i^(t) is the fitness of the i-th individual in the t-th generation, and k^(t) is the variation factor of the t-th generation, a constant.
In an exemplary embodiment, the mutation operation is performed on an optimal individual as follows:
judging whether the mutation probability of the optimal individual conforms to a preset mutation-judgment strategy, to obtain a judgment result;
if the judgment result is that the mutation-judgment strategy is met, selecting at least one parameter of the optimal individual as a mutation parameter;
calculating the noise corresponding to the mutation parameter according to the value range corresponding to the mutation parameter and a preset noise-calculation strategy;
and if the generated noise lies within the value range corresponding to the mutation parameter, adding the noise to that component to obtain a new, mutated individual.
In an exemplary embodiment, if the value range of the mutation parameter is [l, u], where l and u are real numbers, the range of the Laplacian noise added to the mutation parameter is given by a corresponding formula (the equation image is not reproduced in the source).
A data processing apparatus comprising a processor and a memory, the memory storing a computer program, the processor calling the computer program in the memory to implement the method of any one of the above.
According to the scheme provided by the embodiments of the present application, an initialization population corresponding to m parameters to be adjusted is acquired, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer; the initialization population is iteratively processed to obtain λ optimal individuals; the mutation probability of each optimal individual is calculated; a mutation operation is performed on at least one parameter of at least one optimal individual according to the mutation probability, to obtain a new population generated by the mutation operation; and individuals conforming to a preset optimal-selection strategy are selected from the new population, determining the parameter values corresponding to the m parameters. By constructing the next group of solutions through mutation of the parameters of individuals in the population, a next-generation population is obtained and the parameter values are determined, achieving efficient and accurate parameter adjustment.
Additional features and advantages of the embodiments of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the present application and are incorporated in and constitute a part of this specification; they illustrate embodiments of the present application and, together with the description, serve to explain them, without constituting a limitation of the embodiments of the present application.
Fig. 1 is a flowchart of a data processing method provided in an embodiment of the present application;
fig. 2 is a flowchart of an automatic parameter adjustment method according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a method for parameter mutation processing according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that, in the embodiments of the present application, features in the embodiments and the examples may be arbitrarily combined with each other without conflict.
In order to adjust model parameters automatically, efficiently, accurately and universally, the present application provides a general algorithm that implements automatic parameter adjustment for classification-algorithm models. The algorithm can automatically tune the parameters of an algorithm model for different data, even training data containing noise. In addition, if the quality of a model cannot be judged by a single objective optimization function, the algorithm can also be applied when multi-objective optimization is required.
The parameters referred to in the present application refer to common parameters and hyper-parameters of a classification algorithm, the common parameters include input feature variables and value selection thresholds used by each internal node of a decision tree, weights of each edge of a neural network, support vectors in a support vector machine, and the like, and the hyper-parameters include minimum sample number of each leaf node of the decision tree, the number of hidden layers of the neural network, kernel functions in the support vector machine, and the like.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application. The method shown in fig. 1 comprises:
101, acquiring an initialization population corresponding to m parameters to be adjusted, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer;
in an exemplary embodiment, the initial values corresponding to the m parameters in an individual are obtained as follows:
obtaining the value space [a_i, b_i] of the parameter p_i, where i is an integer with 1 ≤ i ≤ m and a_i, b_i are real numbers;
dividing the value space [a_i, b_i] with equal width to obtain n_i = ⌊√(b_i − a_i)⌋ + 1 initial value intervals;
selecting one of the n_i value intervals of the parameter p_i, and selecting a numerical value from the selected interval as the initial value corresponding to p_i.
At least one of the selection of the value interval and the selection of the value within that interval may be performed randomly, or according to a preset selection rule.
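A minimal sketch of this initialization, assuming the interval count ⌊√(b − a)⌋ + 1 stated in the detailed embodiment and random selection for both choices; all function and variable names are illustrative, not from the patent:

```python
import math
import random

def init_value(a: float, b: float, rng: random.Random) -> float:
    """Draw one initial parameter value from the value space [a, b]:
    divide [a, b] into floor(sqrt(b - a)) + 1 equal-width intervals,
    pick an interval at random, then pick a uniform value inside it."""
    n = math.floor(math.sqrt(b - a)) + 1      # number of initial intervals
    width = (b - a) / n
    k = rng.randrange(n)                      # randomly selected interval
    lo = a + k * width
    return rng.uniform(lo, lo + width)

def init_individual(spaces, rng):
    """One individual: one initial value per parameter space [a_i, b_i]."""
    return [init_value(a, b, rng) for a, b in spaces]
```

An initial population would then be λ calls to `init_individual`.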
102, iteratively processing the initialization population to obtain λ optimal individuals;
In one exemplary embodiment, the λ optimal individuals may be determined, through iterative processing of the initialization population, from the individuals produced by the iterative operations.
In an exemplary embodiment, the selection of the optimal individuals may be implemented with a genetic algorithm, specifically comprising:
repeatedly executing the following steps until λ optimal individuals are obtained or the number of iterations reaches a preset maximum number of iterations T:
selecting λ pairs of parent individuals from the λ individuals in the initialization population;
determining λ offspring individuals corresponding to the λ pairs of parent individuals;
selecting λ optimal individuals from the individuals of the λ pairs of parent individuals and the λ offspring individuals.
In one exemplary embodiment, selecting λ pairs of parent individuals from the λ individuals in the initialization population comprises:
calculating fitness information for each individual according to a preset fitness-calculation strategy;
selecting n individuals from the λ individuals, selecting from those n individuals the 2 whose fitness information conforms to a preset optimal-selection strategy as 1 pair of parent individuals, and so on, until λ pairs of parent individuals have been selected, where n is an integer greater than 2.
In an exemplary embodiment, selection may also be performed with replacement of parent individuals. For example, if the X-th pair of parent individuals consists of the 1st and 3rd individuals, then before the (X+1)-th pair is selected, at least one of the 1st and 3rd individuals may be put back into the population as a candidate, so as to improve the diversity of the parent individuals.
In an exemplary embodiment, an offspring individual is obtained by a crossover formula (the equation image is not reproduced in the source), in which x_i^(t) and x_j^(t) respectively denote parent individuals in the t-th generation population, with 1 ≤ i, j ≤ λ, and ω_i, ω_j are the normalized fitness values of those parent individuals in the t-th generation population, where t is a positive integer less than or equal to the maximum number of iterations T.
In an exemplary embodiment, selecting λ optimal individuals from the individuals of the λ pairs of parent individuals and the λ offspring individuals comprises:
determining the fitness information corresponding to each individual of the λ pairs of parent individuals and the λ offspring individuals, the fitness information being determined according to a preset fitness-calculation strategy;
and selecting, from the individuals of the λ pairs of parent individuals and the λ offspring individuals, the λ optimal individuals whose fitness information conforms to a preset optimal-selection strategy.
103, calculating the mutation probability of the optimal individuals;
in an exemplary embodiment, the mutation probability of an optimal individual is calculated by:
Figure BDA0002331135960000073
wherein i represents the ith individual, t represents the tth population,
Figure BDA0002331135960000074
the maximum fitness and the average fitness, sigma, of the t-th generation species respectively(t)Is the variance of the population fitness of the t-th generation,
Figure BDA0002331135960000075
is the fitness of the ith individual in the t generation, k(t)Is a variation factor of the t generation and is a constant.
104, performing a mutation operation on at least one parameter of at least one optimal individual according to the mutation probability, to obtain a new population generated by the mutation operation;
in an exemplary embodiment, the mutation operation is performed on an optimal individual by the following methods, including:
calculating the variation probability of the optimal individual;
judging whether the variation probability of the optimal individual accords with a preset variation judgment strategy or not to obtain a judgment result;
if the judgment result is that the mutation judgment strategy is met, selecting at least one parameter from the optimal individual as a mutation parameter;
calculating the noise corresponding to the variation parameter according to the value range corresponding to the variation parameter and a preset noise calculation strategy;
and if the generated noise range is in the value range corresponding to the variation parameter, adding the noise to the component to obtain a new individual after the variation processing.
If the value range of the mutation parameter is [l, u], where l and u are real numbers, the range of the Laplacian noise added to the mutation parameter is given by a corresponding formula (the equation image is not reproduced in the source).
105, selecting, from the new population generated by the mutation operation, individuals that conform to a preset optimal-selection strategy, and determining the parameter values corresponding to the m parameters.
The optimal-selection strategy may be determined according to the fitness values.
According to the method provided by the embodiments of the present application, an initialization population corresponding to m parameters to be adjusted is acquired, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer; the initialization population is iteratively processed to obtain λ optimal individuals; a mutation operation is performed on at least one parameter of at least one optimal individual, to obtain a new population generated by the mutation operation; and individuals conforming to a preset optimal-selection strategy are selected from the new population, determining the parameter values corresponding to the m parameters. By constructing the next group of solutions through mutation of the parameters of individuals in the population, a next-generation population is obtained, the parameter values are determined, and the parameters are adjusted efficiently and accurately.
In addition, following the basic principle of genetic algorithms, a solution is first obtained by randomly selecting a combination in the parameter space; the solution is a vector, and each dimension of the vector corresponds to one parameter. This random process is repeated many times to obtain a group of solutions, i.e., a randomly selected group of solutions of the optimization objective function (fitness function), which may also be called the parent population. Pairs of parent individuals are selected from this group of solutions, and the next group of solutions is constructed by crossover and mutation of the parent individuals, yielding the next-generation population. The above process is repeated until an optimal solution is obtained or the number of iterations reaches a threshold.
The method provided by the embodiments of the present application can greatly improve the efficiency with which a model user adjusts parameters, saving time and cost and leaving more time for thinking at the model level about how to better solve the practical problem; the parameters in the model behave like a black box requiring no user intervention. In addition, the genetic algorithm employed automatically eliminates parameter values that cannot optimize the objective function and retains those that continue to optimize it, achieving efficient and accurate parameter adjustment. Meanwhile, the two random choices made when initializing the population ensure the diversity of the initial population, and if the data contain noise points, the influence of the noise on the parameters can be reduced in the crossover and mutation stages. The crossover and mutation algorithms proposed here allow the search for the optimal parameter solution to converge more efficiently toward the global optimum while efficiently escaping local optima.
Fig. 2 is a flowchart of a method for automatically adjusting parameters according to an embodiment of the present application. As shown in fig. 2, the method includes:
step 201, performing adaptive discretization on parameters;
in one exemplary embodiment, there are m parameters to be adjusted in total: p_1, p_2, ..., p_m. Each individual in the population is a vector of m real-valued variables, and the population always maintains λ individuals; that is, the number of individuals in the initial population, and in the new offspring population generated in each generation, is kept at λ. The population is set to evolve for T generations in total, and the algorithm ends after T iterations. The parameters are real numbers with certain constraints, i.e., value ranges, so real-number encoding can be adopted.
According to the value space [a_i, b_i] of the parameter p_i, the space is divided with equal width to obtain the initial (generation-1) value intervals; the number of intervals is ⌊√(b_i − a_i)⌋ + 1, i.e., the square root of the difference between the maximum and the minimum of the value space is taken, the result is rounded down, and 1 is added;
step 202, randomly selecting a value interval for each parameter, and selecting a random value from that interval, to obtain an initialized individual;
in an exemplary embodiment, each of the initial λ individuals is obtained by randomly selecting one interval, and one value within it, from the corresponding discretized value space of each parameter.
Step 203, repeating the content of step 202 λ times to obtain the initial population;
in one exemplary embodiment, any one of the λ individuals of the initial population may be represented as a vector x_k = (p_1^(k), p_2^(k), ..., p_m^(k)), where k = 1, 2, ..., λ and p_1^(k) is the initial value of the first parameter p_1;
step 204, calculating the fitness, namely calculating the value of the loss function;
in one exemplary embodiment, the fitness function is the objective cost function C(p_1, p_2, ..., p_m) of the algorithm, and individuals are selected according to this objective cost function. After the initialization parameters are obtained, the model is trained and the fitness is calculated. The fitness of all individuals is normalized so that each individual's fitness lies within [0, 1]; the normalized fitness values are denoted (ω_1, ω_2, ..., ω_λ).
Step 205, running the selection algorithm to obtain parent individuals;
in an exemplary embodiment, 4 individuals are first selected at random; then, ranked by their fitness, the best 2 are selected as one pair of next-generation parent individuals, i.e., 2 solutions of the optimized loss function are chosen, after which the fitness of these two individuals is exponentially decayed. For example, for an original fitness value of 0.8 and the exponential decay function f(x) = x_0·e^(−θx) with θ = 0.24, the first decay (x = 1) gives a fitness of about 0.629. The more often an individual is selected, the lower its fitness becomes, which preserves diversity of selection: individuals with low fitness still have a probability of being chosen, while the parameters of better models are still carried over to the next generation. The selection process is repeated until λ pairs of next-generation parent individuals have been selected.
In one exemplary embodiment, the input is the λ individuals of the t-th generation, X^(t), and the output is the λ pairs of parent individuals of the (t+1)-th generation, Ψ^(t+1), where Ψ^(t+1) is initialized as an empty set;
the fitness of all λ individuals of the t-th generation is normalized so that each individual has a new fitness with a value range between [0, 1];
the following steps are cycled over all individuals:
randomly select 4 individuals (4 groups of parameters) and choose the 2 individuals, x_i^(t) and x_j^(t), that minimize the cost function;
if the size of Ψ^(t+1) is smaller than λ, add (x_i^(t), x_j^(t)) to Ψ^(t+1) as a parent pair, and recalculate the fitness of the selected x_i^(t) and x_j^(t) with the exponential decay function; if the size of Ψ^(t+1) equals λ, the loop exits and selection ends.
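The selection loop above can be sketched as follows; the tournament size of 4, the choice of the best 2, and the decay constant θ = 0.24 come from the text, while function and variable names are illustrative:

```python
import math
import random

def select_parents(fitness, lam, rng, theta=0.24):
    """Tournament selection sketch: draw 4 candidates, keep the best 2 as a
    parent pair, then exponentially decay the winners' fitness so that
    frequently chosen individuals lose selection pressure.  `fitness` holds
    the normalized fitness values (higher is better); returns lam pairs of
    indices.  The per-selection decay factor e**(-theta) with theta = 0.24
    matches the worked example in the text (0.8 -> about 0.629)."""
    fit = list(fitness)                              # working copy, decays in place
    pairs = []
    while len(pairs) < lam:
        cand = rng.sample(range(len(fit)), 4)        # 4 random individuals
        best = sorted(cand, key=lambda i: fit[i], reverse=True)[:2]
        pairs.append((best[0], best[1]))
        for i in best:                               # exponential fitness decay
            fit[i] *= math.exp(-theta)
    return pairs
```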
Step 206, obtaining λ offspring through crossover according to equation (1);
in an exemplary embodiment, each pair of parents is assumed to produce only one offspring. For a pair of parent individuals x_i^(t) and x_j^(t), with 1 ≤ i, j ≤ λ, the offspring of the two individuals is given by the crossover formula of equation (1) (the equation image is not reproduced in the source), where ω_i and ω_j are the parents' normalized fitness values. Applying this crossover operation to all parent pairs yields an offspring set comprising λ individuals.
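The crossover equation itself is an unreproduced image, so its exact form is unknown; a fitness-weighted average of the two parents, using the normalized fitness values ω_i and ω_j as weights, is one plausible reading and is assumed in this sketch (it is not confirmed by the source):

```python
def crossover(xi, xj, wi, wj):
    """Crossover sketch (ASSUMED form, not the patent's unreproduced
    equation): blend two parent vectors xi, xj component-wise, weighted
    by their normalized fitness values wi, wj, producing one offspring."""
    total = wi + wj
    return [(wi * a + wj * b) / total for a, b in zip(xi, xj)]
```

A parent with higher normalized fitness pulls the offspring closer to itself, which is consistent with fitter parameters being carried forward.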
Step 207, selecting the optimal λ individuals from the λ offspring and the individuals of the λ pairs of parent individuals;
in an exemplary embodiment, the population has λ offspring and λ pairs of parent individuals, μ individuals in total; the optimal λ individuals are selected according to fitness as the new population of the next generation.
Step 208, mutating some of the individuals according to the mutation algorithm;
fig. 3 is a flowchart of a method for parameter mutation processing according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:
the variation is consistent with the biological world rule, the variation probability of a new population obtained after crossing in the scheme is very low, the variation probability of each individual is changed in a self-adaptive manner, and the calculation formula of the self-adaptive variation probability of any individual is as follows:
Figure BDA0002331135960000113
wherein i represents the ith individual, t represents the tth population,
Figure BDA0002331135960000114
the maximum fitness and the average fitness, sigma, of the t-th generation species respectively(t)Is the variance of the population fitness of the t-th generation,
Figure BDA0002331135960000115
is the fitness of the ith individual in the t generation, k(t)Is a variation factor of the t generation and is a constant.
When the fitness values of the individuals in the population are close to one another, the population may be trapped in a local optimum; the mutation probability formula then tends to increase the mutation probability of individuals whose fitness is above the average, so that the search space of the genetic algorithm is not confined to a particular region. When the fitness values fluctuate greatly, the mutation probability of all individuals becomes low, which accelerates convergence toward the optimal solution while still preserving the diversity of individuals in the population.
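The adaptive probability itself appears only as an image in the source; the sketch below is a hypothetical stand-in that reproduces the described behavior (above-average fitness combined with a small population spread raises an individual's mutation probability, while a large spread pushes every individual toward a common low baseline), with `k` standing in for the mutation factor k^(t):

```python
import math

def mutation_probability(f_i, f_avg, sigma, k=0.05, eps=1e-12):
    """Hypothetical adaptive mutation probability (the patent's exact
    formula is not recoverable): the fitness deviation is standardized
    by the population's spread, then squashed into the interval (0, k)."""
    score = (f_i - f_avg) / (sigma + eps)
    return k / (1.0 + math.exp(-score))
```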
In the scenario of the present solution, a mutation changes one component of an m-dimensional solution, that is, a single parameter is mutated. Following the statistical notion of a small probability, generally taken to be at most 0.05 or 0.01, it is set that if the mutation probability of an individual is greater than or equal to 0.03, one of the individual's components (genes) is randomly selected for mutation. The mutation is achieved by adding Laplace noise within a limited range, the limit depending on the value range of the component (specific parameter) to be mutated: if the parameter to be mutated takes values in [a_i, b_i], the added Laplace noise is restricted to a range determined by [a_i, b_i] (the range formula is shown as an image in the original). The Laplace noise is a randomly generated number; if the generated noise is within the limited range it is used, otherwise random numbers continue to be generated until one falls within the limited range.
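The Laplace-noise mutation can be sketched as follows (assumed details: the exponential-difference sampler, the noise scale of one tenth of the range width, and the resample-until-in-range loop implementing the "continue generating" rule):

```python
import random

def laplace(scale=1.0):
    """Laplace(0, scale) sample as the difference of two independent
    exponential samples with rate 1/scale."""
    rate = 1.0 / scale
    return random.expovariate(rate) - random.expovariate(rate)

def mutate_component(individual, ranges):
    """Mutate one randomly chosen component (gene) by adding Laplace
    noise; the noise is redrawn until the mutated value stays inside
    that parameter's value range [a, b]."""
    idx = random.randrange(len(individual))
    a, b = ranges[idx]
    child = list(individual)
    scale = (b - a) / 10.0  # assumed: noise scale tied to the range width
    while True:
        noise = laplace(scale)
        if a <= child[idx] + noise <= b:
            child[idx] += noise
            return child
```

Because small noise values are the most likely under a Laplace distribution, the resampling loop terminates quickly in practice.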
Step 209, obtaining the next-generation population generated after the mutation, namely the solution for the next iteration;
Step 210, judging whether the optimal solution is reached or the iteration number reaches a threshold value;
if yes, the process ends; otherwise, return to step 204.
In this embodiment, by drawing on the genetic algorithm for parameter adjustment (an evolutionary algorithm, or an algorithm derived from the genetic algorithm such as neuroevolution, may also be used), efficient and accurate automatic parameter adjustment is achieved, and the algorithm remains robust even when the data contains noise.
A data processing apparatus comprising a processor and a memory, the memory storing a computer program, the processor calling the computer program in the memory to implement the method of any one of the above.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. A data processing method, comprising:
acquiring an initialization population corresponding to m parameters to be adjusted, wherein the initialization population comprises λ individuals, each individual comprises initial values corresponding to the m parameters, and m is a positive integer;
carrying out iterative processing on the initialization population to obtain λ optimal individuals;
calculating a mutation probability of each optimal individual;
performing a mutation operation on at least one parameter in at least one optimal individual according to the mutation probability to obtain a new population generated by the mutation operation; and
selecting, from the new population generated by the mutation operation, individuals according with a preset optimal selection strategy, and determining parameter values corresponding to the m parameters.
2. The method of claim 1, wherein the initial values corresponding to the m parameters in an individual are obtained by:
obtaining a value space [a_i, b_i] of a parameter p_i, wherein i is an integer greater than or equal to 1 and less than or equal to m, and a_i, b_i are real numbers;
dividing the value space [a_i, b_i] into initial value intervals of equal width (the number of intervals is given as a formula image in the original); and
selecting one value interval from the value intervals of the parameter p_i, and selecting a numerical value from the selected value interval as the initial value corresponding to the parameter p_i.
3. The method of claim 1, wherein iteratively processing the initialization population to obtain λ optimal individuals comprises:
repeatedly executing the following steps until λ optimal individuals are obtained or the iteration number reaches a preset maximum iteration number T:
selecting λ pairs of parent individuals from the λ individuals in the initialization population;
determining λ offspring individuals corresponding to the λ pairs of parent individuals; and
selecting λ optimal individuals from the individuals in the λ pairs of parent individuals and the λ offspring individuals.
4. The method of claim 3, wherein selecting λ pairs of parent individuals from the λ individuals in the initialization population comprises:
calculating fitness information of each individual according to a preset fitness calculation strategy; and
selecting n individuals from the λ individuals, selecting from the n individuals 2 individuals whose fitness information accords with a preset optimal selection strategy as 1 pair of parent individuals, and so on until λ pairs of parent individuals are selected, wherein n is an integer greater than 2.
5. The method of claim 3, wherein an offspring individual is obtained by:
x' = ω_i·x_i^(t) + ω_j·x_j^(t)
wherein x_i^(t) and x_j^(t) respectively represent parent individuals in the t-th generation population, 1 ≤ i, j ≤ λ; ω_i and ω_j are respectively the normalized fitness values of the parent individuals in the t-th generation population; and t is a positive integer less than or equal to the maximum iteration number T.
6. The method of claim 1, wherein selecting λ optimal individuals from the λ pairs of parent individuals and the λ offspring individuals comprises:
determining fitness information of each individual in the λ pairs of parent individuals and the λ offspring individuals, wherein the fitness information is determined according to a preset fitness calculation strategy; and
selecting, from the individuals in the λ pairs of parent individuals and the λ offspring individuals, the λ optimal individuals whose fitness information accords with a preset optimal selection strategy.
7. The method of claim 1, wherein the mutation probability of an optimal individual is calculated by a formula (given as an image in the original) in which i represents the i-th individual, t represents the t-th generation, f_max^(t) and f_avg^(t) are respectively the maximum fitness and the average fitness of the t-th generation, σ^(t) is the variance of the population fitness of the t-th generation, f_i^(t) is the fitness of the i-th individual in the t-th generation, and k^(t) is the mutation factor of the t-th generation, a constant.
8. The method of claim 1, wherein performing the mutation operation on the optimal individual comprises:
judging whether the mutation probability of the optimal individual accords with a preset mutation judgment strategy, to obtain a judgment result;
if the judgment result is that the mutation judgment strategy is met, selecting at least one parameter from the optimal individual as a mutation parameter;
calculating noise corresponding to the mutation parameter according to the value range corresponding to the mutation parameter and a preset noise calculation strategy; and
if the generated noise is within the value range corresponding to the mutation parameter, adding the noise to the corresponding component to obtain a new individual after the mutation processing.
9. The method of claim 8, wherein:
if the value range of the mutation parameter is [a_i, b_i], the Laplace noise added to the mutation parameter is restricted to a range determined by [a_i, b_i] (the range formula is given as an image in the original), wherein a_i and b_i are real numbers.
10. A data processing apparatus comprising a processor and a memory, the memory storing a computer program, the processor calling the computer program in the memory to implement the method of any one of claims 1 to 9.
CN201911336654.0A 2019-12-23 2019-12-23 Data processing method and device Withdrawn CN111178488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911336654.0A CN111178488A (en) 2019-12-23 2019-12-23 Data processing method and device


Publications (1)

Publication Number Publication Date
CN111178488A true CN111178488A (en) 2020-05-19

Family

ID=70655599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911336654.0A Withdrawn CN111178488A (en) 2019-12-23 2019-12-23 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111178488A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385256A (en) * 2020-10-22 2022-04-22 华为云计算技术有限公司 Method and device for configuring system parameters


Similar Documents

Publication Publication Date Title
CN111177792B (en) Method and device for determining target business model based on privacy protection
CN109961098B (en) Training data selection method for machine learning
US9058564B2 (en) Controlling quarantining and biasing in cataclysms for optimization simulations
Kanan et al. Feature selection using ant colony optimization (ACO): a new method and comparative study in the application of face recognition system
CN112508243A (en) Training method and device for multi-fault prediction network model of power information system
CN111178416A (en) Parameter adjusting method and device
CN114328048A (en) Disk fault prediction method and device
CN112598062A (en) Image identification method and device
CN111178488A (en) Data processing method and device
CN112801231B (en) Decision model training method and device for business object classification
Farooq Genetic algorithm technique in hybrid intelligent systems for pattern recognition
Bhadouria et al. A study on genetic expression programming-based approach for impulse noise reduction in images
KR100869554B1 (en) Domain density description based incremental pattern classification method
CN116993548A (en) Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM
CN116956160A (en) Data classification prediction method based on self-adaptive tree species algorithm
Azghani et al. Intelligent modified mean shift tracking using genetic algorithm
CN113141272B (en) Network security situation analysis method based on iteration optimization RBF neural network
JP7468088B2 (en) Image processing system and image processing program
CN112036566A (en) Method and apparatus for feature selection using genetic algorithm
Windarto An implementation of continuous genetic algorithm in parameter estimation of predator-prey model
CN108664992B (en) Classification method and device based on genetic optimization and kernel extreme learning machine
Kordos et al. Increasing speed of genetic algorithm-based instance selection
JP5652250B2 (en) Image processing program and image processing apparatus
CN116703529B (en) Contrast learning recommendation method based on feature space semantic enhancement
US20220172105A1 (en) Efficient and scalable computation of global feature importance explanations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200519