CN112906799A - Regression learning adjusting method, device and system and computer readable storage medium - Google Patents

Publication number
CN112906799A
Authority
CN
China
Prior art keywords
value
category
prediction
sample
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110215293.5A
Other languages
Chinese (zh)
Inventor
王开宏
庄伟亮
陈婷
吴三平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202110215293.5A
Publication of CN112906799A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of financial technology and discloses a regression learning adjusting method, device, system and storage medium. The method comprises: discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to the lowest label value interval and the second category corresponds to the highest label value interval; constructing a multi-classification model, outputting prediction probability values of all categories, and determining the regression prediction value of a sample based on the average label value of each category and the corresponding prediction probability value; and adjusting the regression prediction value based on the prediction probability value of the first category and/or the prediction probability value of the second category. By discretizing the sample labels into a plurality of categories, building a multi-classification model to predict the probability of each category, converting the probabilities into a regression prediction value through an expectation formula, and adjusting the regression prediction value using the prediction probability value of the first category and/or the prediction probability value of the second category, the method and the device prevent the predicted values at the head and the tail of the sample distribution from being drawn toward the mean value and thereby improve the accuracy of sample prediction.

Description

Regression learning adjusting method, device and system and computer readable storage medium
Technical Field
The present application relates to the field of financial technology (Fintech) data processing technologies, and in particular, to a method, an apparatus, a system, and a computer-readable storage medium for adjusting regression learning.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on techniques for adjusting regression learning.
At present, in a traditional regression learning method, the loss function is the mean square error, so predicted values are drawn toward the sample mean. As a result, predictions for the head and the tail of the sample distribution are inaccurate: samples with large label values are predicted too low, and samples with small label values are predicted too high. One improvement aimed at this problem is the weighted regression method. In the weighted regression method, if the training accuracy of samples with smaller label values is of concern, the weight of those samples is increased, so that the prediction curve is biased toward them and their prediction error is reduced.
However, the prediction curve of the weighted regression method lies closer to the samples with large weights, which increases the prediction error of the low-weight samples. The weighted regression method therefore improves the prediction accuracy of high-weight samples by sacrificing the accuracy of low-weight samples. Moreover, the loss function of the weighted regression method is essentially a weighted mean square error, so the predicted values are still drawn toward the mean after the samples are weighted.
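The pull toward the mean described in the Background can be illustrated with a small sketch. It assumes the simplest possible case, a model that outputs one constant for every sample, and uses hypothetical label values; it is not part of the application itself. Under mean square error, the best such constant is the label mean, so the largest label is predicted far too low and the smallest too high.

```python
from statistics import mean


def mse(constant, labels):
    """Mean square error when the same constant is predicted for every label."""
    return sum((y - constant) ** 2 for y in labels) / len(labels)


# Hypothetical, skewed label values (illustrative only).
labels = [1.0, 2.0, 2.0, 3.0, 12.0]
label_mean = mean(labels)

# Scan candidate constants on a 0.1 grid from 0.0 to 15.0 and keep the one
# with the lowest mean square error.
candidates = [i / 10 for i in range(0, 151)]
best_constant = min(candidates, key=lambda c: mse(c, labels))

print(label_mean, best_constant)
```

The constant found by the scan coincides with the label mean, which is well below the largest label and above the smallest one.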
Disclosure of Invention
The present application mainly aims to provide a method, an apparatus, a system and a computer readable storage medium for adjusting regression learning, which aim to improve the accuracy of sample prediction.
In order to achieve the above object, an embodiment of the present application provides a method for adjusting regression learning, where the method for adjusting regression learning includes:
discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval;
constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values;
adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
Optionally, the discretizing the sample label to obtain a plurality of categories corresponding to the sample label, where the first category corresponds to a lowest label value interval, and the second category corresponds to a highest label value interval includes:
and dividing the sample label into a plurality of intervals with label values from low to high by combining service experience and data representation of the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to the interval with the lowest label value, and the second category corresponds to the interval with the highest label value.
Optionally, the step of constructing a multi-classification model and outputting the prediction probability values of each class includes:
and taking each category as a target, training a corresponding multi-classification model through machine learning, and outputting a prediction probability value corresponding to each category.
Optionally, the step of determining the regression prediction value of the sample based on the average label value of each category and the prediction probability value corresponding to the average label value includes:
and correspondingly multiplying the average label value of each category with the corresponding prediction probability value, summing the results obtained after multiplication, and determining the result obtained after summation as the regression prediction value of the sample.
Optionally, the step of adjusting the regression prediction value based on the prediction probability value of the first category includes:
determining a probability value segmentation point of the first category, and determining a target sample of the first category based on the probability value segmentation point, wherein the target sample is a sample for adjusting a regression prediction value;
determining a corresponding linear interpolation function based on the target sample, and determining a target predicted value based on the linear interpolation function;
and adjusting the regression prediction value through the target prediction value.
Optionally, the step of determining the probability value cut-off point of the first category includes:
determining a probability value cut-off point for the first category based on business meaning and data representation.
Optionally, the step of determining a corresponding linear interpolation function based on the target sample and determining a target prediction value based on the linear interpolation function includes:
solving based on probability values of two known samples in the target samples and corresponding label values of the two known samples to determine the linear interpolation function;
and inputting the prediction probability value of the target sample into the linear interpolation function to obtain a corresponding function value, and determining the function value as the target prediction value.
The embodiment of the present application further provides an adjusting device for regression learning, where the adjusting device for regression learning includes:
the discretization module is used for discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval;
the output module is used for constructing a multi-classification model, outputting the prediction probability value of each class and determining the regression prediction value of the sample based on the average label value of each class and the corresponding prediction probability value;
and the adjusting module is used for adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
The embodiment of the present application further provides a system for adjusting regression learning, where the system for adjusting regression learning includes a memory, a processor, and an adjustment program for regression learning stored in the memory and running on the processor, and the step of the method for adjusting regression learning is implemented when the adjustment program for regression learning is executed by the processor.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores an adjustment program for regression learning, and the adjustment program for regression learning, when executed by a processor, implements the steps of the adjustment method for regression learning as described above.
The embodiment of the application provides a regression learning adjustment method, a regression learning adjustment device, a regression learning adjustment system and a computer readable storage medium, wherein a plurality of categories corresponding to sample labels are obtained by discretizing the sample labels, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval; constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values; and adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category. Therefore, the sample labels are discretized into a plurality of categories, a multi-classification model is established to predict the probability of each category, the probability is converted into the regression prediction value through an expectation formula, and the regression prediction value is adjusted by combining the prediction probability value of the first category and/or the prediction probability value of the second category, so that the predicted values of the head and the tail of the sample are prevented from being close to the mean value, and the accuracy of sample prediction is improved.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a first embodiment of a tuning method for regression learning according to the present application;
FIG. 3 is a flowchart illustrating step S30 of the tuning method for regression learning according to the present application;
FIG. 4 is a schematic diagram of probability value cut points of the adjustment method for regression learning according to the present application;
FIG. 5 is a schematic structural diagram of an adjusting apparatus for regression learning according to the present application.
The implementation, functional features and advantages of the objectives of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. As shown in FIG. 1, FIG. 1 is a schematic system structure diagram of a hardware operating environment according to an embodiment of the present application. The system may be a regression learning tuning system, which may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the tuning system architecture for regression learning shown in FIG. 1 does not constitute a limitation of the tuning system for regression learning, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. As shown in fig. 1, a memory 1005, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a regression learning adjustment program.
In the system shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and communicating with the backend server; the user interface 1003 is mainly used for connecting a user terminal and performing data communication with the user terminal; and the processor 1001 may be configured to call the tuning program for regression learning stored in the memory 1005, and perform the following operations:
discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval;
constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values;
adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
and dividing the sample label into a plurality of intervals with label values from low to high by combining service experience and data representation of the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to the interval with the lowest label value, and the second category corresponds to the interval with the highest label value.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
and taking each category as a target, training a corresponding multi-classification model through machine learning, and outputting a prediction probability value corresponding to each category.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
and correspondingly multiplying the average label value of each category with the corresponding prediction probability value, summing the results obtained after multiplication, and determining the result obtained after summation as the regression prediction value of the sample.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
determining a probability value segmentation point of the first category, and determining a target sample of the first category based on the probability value segmentation point, wherein the target sample is a sample for adjusting a regression prediction value;
determining a corresponding linear interpolation function based on the target sample, and determining a target predicted value based on the linear interpolation function;
and adjusting the regression prediction value through the target prediction value.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
determining a probability value cut-off point for the first category based on business meaning and data representation.
Further, the processor 1001 may call the adjustment program of regression learning stored in the memory 1005, and further perform the following operations:
solving based on probability values of two known samples in the target samples and corresponding label values of the two known samples to determine the linear interpolation function;
and inputting the prediction probability value of the target sample into the linear interpolation function to obtain a corresponding function value, and determining the function value as the target prediction value.
The method comprises the steps of obtaining a plurality of categories corresponding to sample labels by discretizing the sample labels, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval; constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values; and adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category. Therefore, the sample labels are discretized into a plurality of categories, a multi-classification model is established to predict the probability of each category, the probability is converted into the regression prediction value through an expectation formula, and the regression prediction value is adjusted by combining the prediction probability value of the first category and/or the prediction probability value of the second category, so that the predicted values of the head and the tail of the sample are prevented from being close to the mean value, and the accuracy of sample prediction is improved.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a first embodiment of the adjusting method for regression learning.
The embodiments of the present application provide embodiments of a method for adjusting regression learning. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one shown here.
In the embodiments of the present application, an adjustment system of regression learning is taken as the execution subject by way of example. For simplicity, the adjustment system of regression learning is hereinafter referred to as the adjustment system. The adjustment method of regression learning includes:
step S10, discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to the lowest label value interval, and the second category corresponds to the highest label value interval.
It should be noted that the samples in this embodiment are samples carrying sample labels. Each sample may carry one or more sample labels, and a sample label is the thing to be predicted, i.e., the y variable in a simple linear regression. For example, when the sample is a user, the sample label may be the user's age; when the sample is wheat, the sample label may be the future price of the wheat; when the sample is a picture, the sample label may be the animal species contained in the picture; and when the sample is audio, the sample label may be the meaning of the audio. When the sample label of a sample needs to be predicted, the adjustment system first obtains the sample and determines the sample label carried by the sample. It then discretizes the sample label according to the data characteristics of the sample label, that is, it classifies the sample label into a plurality of label value intervals to obtain a plurality of categories of the sample label, where one category corresponds to one label value interval. The number of categories is set by a user or by the adjustment system according to the data characteristics, which is not limited in this embodiment.
Further, in the multiple categories, the order of the categories is sorted according to the label value of the sample, and may be sorted according to the order of the label values from large to small, and may also be sorted according to the order of the label values from small to large, which is not limited in this embodiment. For convenience of understanding, the lowest interval of the tag values corresponds to a first category, and the highest interval of the tag values corresponds to a second category.
In this embodiment, for example, the number of categories in the adjustment system defaults to 5. If the age of a user currently needs to be predicted, the adjustment system determines the age of the user as the sample label to be predicted. Next, according to the data characteristics of the sample labels, the adjustment system classifies the sample labels into 5 categories, i.e., 5 label value intervals: "1 to 19 years", "20 to 29 years", "30 to 39 years", "40 to 49 years", and "50 to 79 years", wherein the first category is the interval "1 to 19 years" and the second category is the interval "50 to 79 years".
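As a minimal sketch of this discretization, the five age intervals of the example can be mapped to category indices as follows. The function name `age_to_category`, the 0-based indexing, and the use of integer ages are assumptions made for illustration; the application does not prescribe an implementation.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four intervals in the example.
# Ages above 49 fall into the last interval, "50 to 79 years".
UPPER_BOUNDS = [19, 29, 39, 49]
CATEGORY_NAMES = ["1 to 19", "20 to 29", "30 to 39", "40 to 49", "50 to 79"]


def age_to_category(age):
    """Return the 0-based category index for an integer age label."""
    if not 1 <= age <= 79:
        raise ValueError("age is outside the 1 to 79 range used in the example")
    # bisect_left counts how many upper bounds are strictly below the age,
    # which is exactly the index of the interval containing it.
    return bisect_left(UPPER_BOUNDS, age)


first_category = age_to_category(12)   # index 0, lowest interval
second_category = age_to_category(63)  # index 4, highest interval
print(CATEGORY_NAMES[first_category], CATEGORY_NAMES[second_category])
```

With this indexing, index 0 plays the role of the first category (lowest label value interval) and index 4 the role of the second category (highest label value interval).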
Further, the step S10 includes:
step S101, combining service experience and data representation of the sample label, dividing the sample label into a plurality of intervals with label values from low to high, and obtaining a plurality of categories corresponding to the sample label, wherein the first category corresponds to the interval with the lowest label value, and the second category corresponds to the interval with the highest label value.
Specifically, to discretize the sample label more reasonably, the discretization needs to combine business experience with the data representation of the sample label. Therefore, before discretizing the sample label, the adjustment system determines the corresponding business experience from historical business data, or obtains the business experience input by a user, and then classifies the sample label into a plurality of label value intervals by combining the business experience and the data representation of the sample label to obtain a plurality of categories of the sample label.
And step S20, constructing a multi-classification model, outputting the prediction probability value of each class, and determining the regression prediction value of the sample based on the average label value of each class and the corresponding prediction probability value.
After the sample labels are discretized into a plurality of classes by the adjusting system, each class of the sample labels is taken as a training target, a corresponding multi-classification model is constructed, and the prediction probability value corresponding to each class is output. Next, the adjustment system calculates an average label value for each category. And finally, the adjustment system performs numerical calculation on the prediction probability value corresponding to each category and the average label value corresponding to the prediction probability value, and determines the calculated numerical value as the regression prediction value of the sample.
Further, the step S20 includes:
step S201, aiming at each category, training a corresponding multi-classification model through machine learning, and outputting a prediction probability value corresponding to each category;
and step S202, the average label value of each category is correspondingly multiplied by the corresponding prediction probability value, the results obtained after multiplication are summed, and the result obtained after summation is determined as the regression prediction value of the sample.
Specifically, the adjustment system takes each class of the sample label as a training target, performs model training by combining a machine learning method, constructs a corresponding multi-classification model, and outputs the prediction probability value of each class. Next, the adjustment system calculates an average label value for each category based on the label values for each category. And finally, multiplying the prediction probability value corresponding to each category by the average label value corresponding to the prediction probability value by the adjustment system to obtain a product value corresponding to each category, and summing the product values of each category to obtain the regression prediction value of the sample.
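The average label value of each category could be computed as in the following sketch. It assumes that each training sample already has a label value and a category index from the discretization step; the function name `average_label_per_category` and the sample values are hypothetical.

```python
from collections import defaultdict


def average_label_per_category(labels, categories):
    """Return a dict mapping each category index to the mean label value in it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, category in zip(labels, categories):
        sums[category] += label
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Hypothetical ages and their category indices (illustrative only).
ages = [10, 18, 25, 27, 60]
age_categories = [0, 0, 1, 1, 4]
averages = average_label_per_category(ages, age_categories)
print(averages)
```

Categories that contain no training samples do not appear in the result, so a real pipeline would need to decide how to handle empty intervals.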
In this embodiment, for example, the adjustment system divides the sample labels into 5 categories, A1, A2, A3, A4, and A5, and obtains from the multi-classification model the prediction probability values P1, P2, P3, P4, and P5 of A1, A2, A3, A4, and A5. The adjustment system calculates the average label values of A1, A2, A3, A4, and A5 as M1, M2, M3, M4, and M5, and calculates the regression prediction value of the sample as y = P1 × M1 + P2 × M2 + P3 × M3 + P4 × M4 + P5 × M5.
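The multiply-and-sum step described for this five-category example amounts to an expected value and can be sketched as follows. The probability values and average label values below are hypothetical, and the function name `regression_prediction` is an assumption for illustration.

```python
def regression_prediction(probabilities, average_labels):
    """Sum of P_k * M_k over all categories (the expectation formula)."""
    if len(probabilities) != len(average_labels):
        raise ValueError("one probability is needed per category")
    return sum(p * m for p, m in zip(probabilities, average_labels))


# Hypothetical model output P1..P5 for one sample and average labels M1..M5.
P = [0.10, 0.20, 0.40, 0.20, 0.10]
M = [15.0, 25.0, 35.0, 45.0, 60.0]
y = regression_prediction(P, M)
print(y)
```

For these values the result is approximately 35.5, which lies between the lowest and highest average label values because the probabilities sum to 1.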
Step S30, adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
After calculating the regression prediction value of the sample, the adjusting system determines each probability value segmentation point of the first class, the segmentation number corresponding to each probability value segmentation point and the target number proportion of the segmentation sample, and determines the final probability value segmentation point of the first class according to each probability value segmentation point, the segmentation number corresponding to each probability value segmentation point and the target number proportion of the segmentation sample. And then, the adjusting system determines a target sample of the first category according to the final probability value segmentation point of the first category and the prediction probability value of the first category, and determines a target prediction value of the target sample by combining linear interpolation according to the target sample of the first category. And finally, the adjusting system adjusts the regression prediction value of the sample according to the target prediction value of the target sample.
Similarly, the adjusting system determines each probability value segmentation point of the second category, the segmentation number corresponding to each probability value segmentation point and the target number proportion of the segmentation samples, and determines the final probability value segmentation point of the second category according to each probability value segmentation point, the segmentation number corresponding to each probability value segmentation point and the target number proportion of the segmentation samples. And then, the adjusting system determines a target sample of the second category according to the final probability value segmentation point of the second category and the prediction probability value of the second category, and determines a target prediction value of the target sample by combining linear interpolation according to the target sample of the second category. And finally, the adjusting system adjusts the regression prediction value of the sample according to the target prediction value of the target sample.
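A minimal sketch of selecting target samples with a probability value segmentation point is given below. The text does not state the comparison direction, so the sketch assumes that a sample becomes a target sample when its predicted probability for the category is at or above the segmentation point; the function name `select_target_samples` and all values are hypothetical.

```python
def select_target_samples(category_probs, cut_point):
    """Return the indices of samples whose probability is >= cut_point.

    Assumption: "at or above the cut point" marks a target sample; the
    source text does not specify the direction of the comparison.
    """
    return [i for i, p in enumerate(category_probs) if p >= cut_point]


# Hypothetical first-category probabilities for four samples.
first_category_probs = [0.05, 0.72, 0.40, 0.91]
targets = select_target_samples(first_category_probs, cut_point=0.7)
print(targets)
```

The same helper could be applied to the second-category probabilities with their own segmentation point.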
In the embodiment, a plurality of categories corresponding to sample labels are obtained by discretizing the sample labels, wherein a first category corresponds to a lowest label value interval, and a second category corresponds to a highest label value interval; constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values; and adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category. Therefore, in the embodiment, the sample labels are discretized into a plurality of categories, a multi-classification model is established to predict the probability of each category, the probability is converted into the regression prediction value through an expectation formula, and the regression prediction value is adjusted by combining the prediction probability value of the first category and/or the prediction probability value of the second category, so that the predicted values of the head and the tail of the sample are prevented from being close to the mean value, and the accuracy of sample prediction is improved.
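The expectation formula mentioned above can be written out as a short worked example. The symbols and the numbers below are illustrative assumptions and are not taken from the application: $\bar{y}_k$ denotes the average label value of category $k$, $p_k$ its prediction probability value, and $K = 3$ categories are assumed.

```latex
% Regression prediction value as the expectation over the K categories
\hat{y} = \sum_{k=1}^{K} p_k \, \bar{y}_k

% Hypothetical example: \bar{y} = (10, 50, 200), p = (0.7, 0.2, 0.1)
\hat{y} = 0.7 \cdot 10 + 0.2 \cdot 50 + 0.1 \cdot 200 = 7 + 10 + 20 = 37
```

In this hypothetical case the sample most probably belongs to the lowest category, whose average label value is 10, yet the expectation yields 37. This illustrates how head and tail predictions are pulled toward the mean, which is the effect the adjustment based on the first and/or second category is meant to counteract.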
Further, referring to fig. 3, fig. 3 is a schematic flowchart illustrating a step S30 of the adjustment method for regression learning according to the present application. The step S30 includes:
step S301, determining a probability value segmentation point of the first category, and determining a target sample of the first category based on the probability value segmentation point, wherein the target sample is a sample for adjusting a regression prediction value;
step S302, determining a corresponding linear interpolation function based on the target sample, and determining a target predicted value based on the linear interpolation function;
and step S303, adjusting the regression prediction value through the target prediction value.
The adjusting system determines each probability value segmentation point of the first category, the segmentation number corresponding to each probability value segmentation point and the target number proportion of segmentation samples, and determines the final probability value segmentation point of the first category according to these three quantities. Then, the adjusting system determines each sample whose prediction probability value for the first category is larger than the final probability value segmentation point as a target sample of the first category, wherein a target sample is a sample whose regression prediction value needs to be adjusted. Then, the adjusting system selects two known sample points among the target samples and solves for the linear interpolation function of the first category from the prediction probability values of the two known samples and their corresponding label values. And finally, the adjusting system inputs the prediction probability value of the target sample of the first category into the linear interpolation function to obtain a corresponding output function value, determines the output function value as a target prediction value, and adjusts the regression prediction value of the sample according to the target prediction value.
Further, the second category is handled in the same way. The adjusting system determines each probability value segmentation point of the second category, the segmentation number corresponding to each probability value segmentation point and the target number proportion of the segmentation samples, and determines the final probability value segmentation point of the second category according to these three quantities. Then, the adjusting system determines each sample whose prediction probability value for the second category is larger than the final probability value segmentation point as a target sample of the second category. Then, the adjusting system selects two known sample points among the target samples and solves for the linear interpolation function of the second category from the prediction probability values of the two known samples and their corresponding label values. And finally, the adjusting system inputs the prediction probability value of the target sample of the second category into the linear interpolation function to obtain a corresponding output function value, determines the output function value as a target prediction value, and adjusts the regression prediction value of the sample according to the target prediction value.
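Steps S301 to S303 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name, the sample data and the interpolation line are hypothetical, and the adjustment is assumed to be a plain replacement of the regression prediction value by the target prediction value, which the text does not state explicitly.

```python
from typing import Callable, List


def adjust_predictions(
    regression_preds: List[float],
    class_probs: List[float],
    cut_point: float,
    interpolate: Callable[[float], float],
) -> List[float]:
    """Adjust the regression prediction value of every target sample.

    S301: a sample is a target sample when its prediction probability value
          for the category exceeds cut_point.
    S302: its target prediction value is interpolate(probability).
    S303: the target prediction value replaces the regression prediction
          value (replacement is an assumption of this sketch).
    """
    adjusted = []
    for pred, prob in zip(regression_preds, class_probs):
        if prob > cut_point:
            adjusted.append(interpolate(prob))
        else:
            adjusted.append(pred)  # non-target samples are left unchanged
    return adjusted


# Hypothetical data: three samples and their first-category probabilities.
preds = [37.0, 42.0, 55.0]
probs_first = [0.90, 0.50, 0.70]


def line(p: float) -> float:
    # Hypothetical interpolation line for the lowest category:
    # a higher probability maps to a lower predicted label value.
    return 30.0 - 25.0 * p


adjusted = adjust_predictions(preds, probs_first, 0.67, line)
print(adjusted)
```

The same function can be applied a second time with the second-category probabilities, cut point and interpolation function when both the head and the tail are to be adjusted.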
It should be noted that the determination of the probability value cut-off point of the first category may be manually determined, or may be self-determined by the adjustment system.
In this embodiment, as shown in fig. 4, fig. 4 is a schematic diagram of probability value segmentation points of the adjustment method for regression learning according to the present application. As can be seen from fig. 4, if the adjusting system sets the probability value segmentation point of the first category to 0.67, the number of segmented people is 7859 and the accuracy (the proportion of target people among the segmented people) is 78%. Therefore, the adjusting system determines the probability value segmentation point of the first category as 0.67, determines the samples whose prediction probability value for the first category is larger than 0.67 as the target samples of the first category, and adjusts the regression prediction values of the samples according to the prediction probability values of the target samples of the first category.
Further, the step S301 of determining a probability value cut-off point of the first category includes:
step S3011, determining a probability value cut-off point of the first category based on the business meaning and the data representation.
Specifically, the adjusting system determines each probability value segmentation point of the first category, segmentation people number corresponding to each probability value segmentation point and target people number proportion of the segmentation sample according to the business meaning and the data expression, and then determines the final probability value segmentation point of the first category according to the head probability value, the tail probability value and the category discrimination of the target people number proportion of the segmentation sample.
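The selection of a final segmentation point from candidate points can be sketched as below. The application does not spell out the selection rule, so the rule used here is an assumption made for illustration: among the candidates whose target proportion reaches a required minimum, the lowest segmentation point is chosen. All names and data are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[float, int, float]  # (cut point, segmented count, target share)


def summarize_cut_points(
    probs: List[float], is_target: List[bool], candidates: List[float]
) -> List[Row]:
    """For each candidate cut point, count the samples above it and compute
    the share of true target samples among them."""
    rows = []
    for cut in candidates:
        selected = [t for p, t in zip(probs, is_target) if p > cut]
        count = len(selected)
        share = sum(selected) / count if count else 0.0
        rows.append((cut, count, share))
    return rows


def choose_cut_point(rows: List[Row], min_share: float) -> Optional[float]:
    """Assumed rule: lowest cut point whose target share is at least min_share."""
    eligible = [r for r in rows if r[1] > 0 and r[2] >= min_share]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r[0])[0]


# Hypothetical first-category probabilities and ground-truth membership.
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
is_target = [True, True, True, False, True, False]

rows = summarize_cut_points(probs, is_target, [0.5, 0.65, 0.75])
chosen = choose_cut_point(rows, min_share=0.9)
print(rows, chosen)
```

In practice the minimum share would be set from the business meaning of the category, and a manually chosen segmentation point can simply bypass `choose_cut_point`.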
Further, the step S302 includes:
step S3021, solving based on probability values of two known samples in the target sample and corresponding label values thereof, and determining the linear interpolation function;
step S3022, inputting the prediction probability value of the target sample into the linear interpolation function to obtain a corresponding function value, and determining the function value as the target prediction value.
Specifically, the adjusting system selects two known sample points among the target samples and solves for the linear interpolation function of the first category from the prediction probability values of the two known samples and their corresponding label values. Then, the adjusting system inputs the prediction probability value of the target sample of the first category into the linear interpolation function, outputs a corresponding function value, and determines the function value as a target prediction value.
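Solving a linear function from two known points is standard two-point interpolation, sketched below in Python. The function name and the two sample points are hypothetical, and how the two known samples are chosen is left open by the text.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]  # (prediction probability value, label value)


def fit_linear_interpolation(a: Point, b: Point) -> Callable[[float], float]:
    """Return the line through two known samples (step S3021)."""
    (p1, y1), (p2, y2) = a, b
    if p1 == p2:
        raise ValueError("the two known samples need different probabilities")
    slope = (y2 - y1) / (p2 - p1)

    def interpolate(p: float) -> float:
        # Point-slope form of the line through (p1, y1) and (p2, y2)
        return y1 + slope * (p - p1)

    return interpolate


# Hypothetical known samples of the lowest category.
f = fit_linear_interpolation((0.67, 20.0), (1.0, 5.0))

# Step S3022: the function value is taken as the target prediction value.
target_prediction = f(0.80)
print(target_prediction)
```

Because the function is a straight line, a probability outside the range spanned by the two known samples is extrapolated rather than interpolated, so the choice of the two points bounds where the result is reliable.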
In the embodiment, the probability value segmentation point of the first category is determined, and the target sample of the first category is determined based on the probability value segmentation point; a corresponding linear interpolation function is determined based on the target sample, and a target predicted value is determined based on the linear interpolation function; and the regression prediction value is adjusted through the target prediction value. Therefore, in the embodiment, the corresponding target sample is determined through the probability value segmentation point of the first category and/or the second category, the target predicted value of the first category and/or the second category is determined based on the target sample in combination with the linear interpolation function, and finally the regression predicted value of the sample is adjusted through the target predicted value of the first category and/or the second category, so that the predicted values of the head and the tail of the sample are prevented from being close to the mean value, and the accuracy of sample prediction is improved.
In addition, the present application further provides an adjusting device for regression learning, referring to fig. 5, where fig. 5 is a schematic structural diagram of the adjusting device for regression learning of the present application, and the adjusting device for regression learning includes:
the discretization module 10 is configured to discretize the sample label to obtain a plurality of categories corresponding to the sample label, where a first category corresponds to a lowest label value interval, and a second category corresponds to a highest label value interval;
the output module 20 is configured to construct a multi-classification model, output a prediction probability value of each class, and determine a regression prediction value of the sample based on the average label value of each class and the prediction probability value corresponding to the average label value;
an adjusting module 30, configured to adjust the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
Further, the discretization module 10 is further configured to, in combination with the business experience and the data representation of the sample label, divide the sample label into a plurality of intervals with label values from low to high, and obtain a plurality of categories corresponding to the sample label, where a first category corresponds to an interval with a lowest label value, and a second category corresponds to an interval with a highest label value;
the output module 20 is further configured to train a corresponding multi-class model by machine learning with the categories as targets, and output the predicted probability values corresponding to the categories.
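The discretization performed by module 10 can be sketched with the Python standard library as follows. The interval boundaries and label values are hypothetical; in the application the boundaries come from business experience and the data representation of the sample label. Training the multi-classification model itself is outside the scope of this sketch.

```python
from bisect import bisect_right
from typing import Dict, List


def discretize(labels: List[float], boundaries: List[float]) -> List[int]:
    """Map each label value to a category index.

    boundaries must be sorted ascending. Category 0 is the lowest label
    value interval and category len(boundaries) is the highest one.
    A label equal to a boundary falls into the upper interval.
    """
    return [bisect_right(boundaries, y) for y in labels]


def class_means(labels: List[float], categories: List[int]) -> Dict[int, float]:
    """Average label value of each category, used later in the expectation."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for y, c in zip(labels, categories):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


# Hypothetical labels split into three intervals: <20, [20, 100), >=100.
labels = [5.0, 15.0, 60.0, 40.0, 150.0, 250.0]
categories = discretize(labels, [20.0, 100.0])
means = class_means(labels, categories)
print(categories, means)
```

The category indices serve as the training targets of the multi-classification model, and the per-category means are kept for converting its output probabilities back into a regression prediction value.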
Further, the adjusting device for regression learning further includes:
the calculation module is used for correspondingly multiplying the average label value of each category with the corresponding prediction probability value of each category and summing the results obtained after multiplication;
And the determining module is used for determining the result obtained after summation as the regression prediction value of the sample.
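The multiply-and-sum performed by the calculation module and the determining module can be sketched as below. The function name and the four-category data are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def expected_value_prediction(
    class_means: Sequence[float], class_probs: Sequence[float]
) -> float:
    """Multiply each category's average label value by its prediction
    probability value and sum the products."""
    if len(class_means) != len(class_probs):
        raise ValueError("one probability is required per category")
    return sum(m * p for m, p in zip(class_means, class_probs))


# Hypothetical sample: four categories ordered from lowest to highest label.
pred = expected_value_prediction(
    [10.0, 40.0, 90.0, 200.0],  # average label value per category
    [0.1, 0.2, 0.3, 0.4],       # prediction probability value per category
)
print(pred)
```

With these numbers the products are 1, 8, 27 and 80, so the regression prediction value is approximately 116.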
Further, the determining module is further configured to determine a probability value cut-off point of the first category, and determine a target sample of the first category based on the probability value cut-off point, where the target sample is a sample for adjusting a regression prediction value;
the determining module is further used for determining a corresponding linear interpolation function based on the target sample and determining a target predicted value based on the linear interpolation function;
the adjusting module 30 is further configured to adjust the regression prediction value according to the target prediction value;
the determining module is further used for determining a probability value cut-off point of the first category based on business meaning and data performance;
the determining module is further used for solving based on probability values of two known samples in the target sample and corresponding label values thereof to determine the linear interpolation function;
the determining module is further configured to input the prediction probability value of the target sample into the linear interpolation function to obtain a corresponding function value, and determine the function value as the target prediction value.
The specific implementation of the adjusting device for regression learning in the present application is substantially the same as that of each embodiment of the adjustment method for regression learning, and is not described herein again.
In addition, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores an adjustment program for regression learning, and the adjustment program for regression learning, when executed by a processor, implements the steps of the adjustment method for regression learning as described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the adjustment method for regression learning, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation manner in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk), which includes instructions for enabling a regression learning adjustment system to execute the methods according to the embodiments of the present application.

Claims (10)

1. A method for adjusting regression learning is characterized by comprising the following steps:
discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval;
constructing a multi-classification model, outputting prediction probability values of all classes, and determining regression prediction values of the samples based on the average label values of all the classes and the corresponding prediction probability values;
adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
2. The method for adjusting regression learning according to claim 1, wherein the discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval comprises:
and dividing the sample label into a plurality of intervals with label values from low to high by combining service experience and data representation of the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to the interval with the lowest label value, and the second category corresponds to the interval with the highest label value.
3. The method for adjusting regression learning of claim 1, wherein the step of constructing a multi-classification model and outputting the prediction probability values of each class comprises:
and taking each category as a target, training a corresponding multi-classification model through machine learning, and outputting a prediction probability value corresponding to each category.
4. The method for adjusting regression learning according to claim 1, wherein the step of determining the regression prediction value of the sample based on the average label value of each category and the corresponding prediction probability value thereof comprises:
and correspondingly multiplying the average label value of each category with the corresponding prediction probability value, summing the results obtained after multiplication, and determining the result obtained after summation as the regression prediction value of the sample.
5. The method for adjusting regression learning according to claim 1, wherein the step of adjusting the regression prediction value based on the prediction probability value of the first category includes:
determining a probability value segmentation point of the first category, and determining a target sample of the first category based on the probability value segmentation point, wherein the target sample is a sample for adjusting a regression prediction value;
determining a corresponding linear interpolation function based on the target sample, and determining a target predicted value based on the linear interpolation function;
and adjusting the regression prediction value through the target prediction value.
6. The method of adjusting regression learning of claim 5, wherein said step of determining probability value cut-off points for said first class comprises:
determining a probability value cut-off point for the first category based on business meaning and data representation.
7. The method for adjusting regression learning according to any one of claims 1 to 6, wherein the step of determining a corresponding linear interpolation function based on the target sample and determining a target predicted value based on the linear interpolation function comprises:
solving based on probability values of two known samples in the target samples and corresponding label values of the two known samples to determine the linear interpolation function;
and inputting the prediction probability value of the target sample into the linear interpolation function to obtain a corresponding function value, and determining the function value as the target prediction value.
8. An adjustment device for regression learning, characterized in that the adjustment device for regression learning comprises:
the discretization module is used for discretizing the sample label to obtain a plurality of categories corresponding to the sample label, wherein the first category corresponds to a label value lowest interval, and the second category corresponds to a label value highest interval;
the output module is used for constructing a multi-classification model, outputting the prediction probability value of each class and determining the regression prediction value of the sample based on the average label value of each class and the corresponding prediction probability value;
and the adjusting module is used for adjusting the regression prediction value based on the prediction probability value of the first category or/and the prediction probability value of the second category.
9. An adjustment system for regression learning, characterized in that the adjustment system for regression learning comprises a memory, a processor and an adjustment program for regression learning stored on the memory and running on the processor, and when the adjustment program for regression learning is executed by the processor, the steps of the adjustment method for regression learning according to any one of claims 1 to 7 are implemented.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a tuning program for regression learning, which when executed by a processor implements the steps of the tuning method for regression learning according to any one of claims 1 to 7.
CN202110215293.5A 2021-02-25 2021-02-25 Regression learning adjusting method, device and system and computer readable storage medium Pending CN112906799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110215293.5A CN112906799A (en) 2021-02-25 2021-02-25 Regression learning adjusting method, device and system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110215293.5A CN112906799A (en) 2021-02-25 2021-02-25 Regression learning adjusting method, device and system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112906799A true CN112906799A (en) 2021-06-04

Family

ID=76108462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110215293.5A Pending CN112906799A (en) 2021-02-25 2021-02-25 Regression learning adjusting method, device and system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112906799A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642635A (en) * 2021-08-12 2021-11-12 百度在线网络技术(北京)有限公司 Model training method and device, electronic device and medium
CN113642635B (en) * 2021-08-12 2023-09-15 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN110264274B (en) Guest group dividing method, model generating method, device, equipment and storage medium
EP3441891A1 (en) Data source-based service customisation apparatus, method, system, and storage medium
EP3893169A2 (en) Method, apparatus and device for generating model and storage medium
JP2022512065A (en) Image classification model training method, image processing method and equipment
US11748452B2 (en) Method for data processing by performing different non-linear combination processing
EP2940551A1 (en) Method and device for implementing voice input
WO2023093375A1 (en) Computing resource acquisition method and apparatus, electronic device, and storage medium
CN112070545A (en) Method, apparatus, medium, and electronic device for optimizing information reach
CN112906799A (en) Regression learning adjusting method, device and system and computer readable storage medium
CN110442803A (en) Data processing method, device, medium and the calculating equipment executed by calculating equipment
CN112598244B (en) Risk profit management method, apparatus, system and computer readable storage medium
CN114692889A (en) Meta-feature training model for machine learning algorithm
CN112000803A (en) Text classification method and device, electronic equipment and computer readable storage medium
CN113612777B (en) Training method, flow classification method, device, electronic equipment and storage medium
CN113570114B (en) Resource service intelligent matching method, system and computer equipment
US10084853B2 (en) Distributed processing systems
CN113988914A (en) User value prediction method and device and electronic equipment
US20220292393A1 (en) Utilizing machine learning models to generate initiative plans
CN113887655A (en) Model chain regression prediction method, device, equipment and computer storage medium
CN113641823A (en) Text classification model training method, text classification device, text classification equipment and medium
CN111768220A (en) Method and apparatus for generating vehicle pricing models
CN111709479B (en) Image classification method and device
CN115550259B (en) Flow distribution method based on white list and related equipment
CN112529450A (en) Index analysis method, device, equipment and readable storage medium
CN114282731A (en) Risk model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination