CN115712810A - Support vector logistic regression method with accurate prediction factor and storage medium - Google Patents
- Publication number
- CN115712810A
- Authority
- CN
- China
- Prior art keywords
- support vector
- logistic regression
- regression
- fuzzy
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application relates to the technical field of supervised learning in data mining and discloses a support vector logistic regression method with accurate predictors, a storage medium, and an electronic device. The method comprises the following steps: constructing a matrix A from the input data set $x_i$; constructing a nonlinear prediction function $\tilde{f}(x)$ based on the matrix A; constructing three support vector regression models according to the nonlinear prediction function $\tilde{f}(x)$; solving for the model parameters of the three support vector regression models; and performing an error test and a prediction performance test on the three support vector regression models. The method can handle complex nonlinear logistic regression problems with a simple and fast algorithm. The proposed fuzzy support vector logistic regression model, which combines accurate predictor variables with fuzzy responses, achieves a higher goodness of fit, reduces the influence of outliers on the fuzzy prediction, and improves prediction accuracy.
Description
Technical Field
The application relates to the technical field of supervised learning in data mining, in particular to a support vector logistic regression method with an accurate prediction factor and a storage medium.
Background
Data mining is a decision support process based on statistics, machine learning, artificial intelligence, and related fields. With the development of cloud computing, the information and knowledge obtained through data mining are widely applied in business management, market analysis, production control, engineering design, and other fields. Logistic regression (LR), a generalized regression analysis model, is frequently used in data mining, automatic disease diagnosis, economic forecasting, and similar fields. However, in most real-life scenarios the response variable of the regression analysis is a fuzzy quantity rather than a traditional binary variable, which greatly restricts the use of traditional logistic regression.
With the continued development of fuzzy mathematics, fuzzy logistic regression models have advanced rapidly in practical applications involving fuzzy data. However, none of the earlier fuzzy logistic regression models has achieved a major breakthrough in the validity and accuracy of its results, so they cannot be applied well to the fuzzy phenomena encountered in practice. These models rely on traditional linear logistic regression and estimate the model components with traditional optimization techniques, such as least mean squared error (LMSE) and least absolute deviation (LAD) algorithms, which are easily affected by local extrema. They also ignore that the observed data may be modeled by a mapping function that is a nonlinear combination of the model parameters and the predictor variables.
Disclosure of Invention
The application aims to overcome the defects of the prior art and provide a support vector logistic regression method with accurate prediction factors and a storage medium.
In a first aspect, a support vector logistic regression method with precise predictors is provided, including:
constructing a matrix A from the input data set $x_i$;
constructing a nonlinear prediction function $\tilde{f}(x)$ based on the matrix A, wherein $\tilde{w}$ and $\tilde{b}$ are fuzzy coefficients;
constructing three support vector regression models according to the nonlinear prediction function $\tilde{f}(x)$;
solving for the model parameters of the three support vector regression models;
and performing an error test and a prediction performance test on the three support vector regression models.
Further, constructing the matrix A of the input data comprises the following steps:
constructing a noise-corrupted training set of input values and their observed values, wherein $x_i \in R^n$ and the observed value corresponding to $x_i$ is the likelihood of success for the i-th observation;
arranging the input values $x_i$ into a matrix A of n rows and m columns, wherein the i-th row is $x_i^{T}$.
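Further, constructing the nonlinear prediction function $\tilde{f}(x)$ comprises the following steps:
for $x \in R^m$, letting $K(x, A^t) = (K(x, x_1), \ldots, K(x, x_m))$ be a row vector;
the nonlinear prediction function is then $\tilde{f}(x) = K(x, A^t)\tilde{w} \oplus \tilde{b}$, with fuzzy coefficients $\tilde{w} = (w; l_w; r_w)_{LR}$ and $\tilde{b} = (b; l_b; r_b)_{LR}$,
wherein $w = (w_1, \ldots, w_m)^T$, $l_w = (l_{w1}, l_{w2}, \ldots, l_{wm})^T$, $r_w = (r_{w1}, r_{w2}, \ldots, r_{wm})^T$;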
Further, constructing the support vector regression models comprises the following steps:
in response to the kernel matrix $K(A, A^t)$ being positive, the nonlinear prediction function $\tilde{f}(x)$ is equivalent to $(f(x); l_{f(x)}; r_{f(x)})^T = (K(x, A^t)w + b;\ K(x, A^t)l_w + l_b;\ K(x, A^t)r_w + r_b)_{LR}$, from which three support vector regression models are derived:
$v = f(x) = K(x, A^t)w + b$ (4)
$l_v = l_{f(x)} = K(x, A^t)l_w + l_b$ (5)
$r_v = r_{f(x)} = K(x, A^t)r_w + r_b$ (6).
Further, solving for the model parameters comprises the following steps:
optimizing the fuzzy coefficients with a three-stage algorithm;
estimating the unknown coefficients and regression components of the ordinary support vector regression model;
searching, by an exponential grid search over the set $\{10^{-5}, 10^{-4}, \ldots, 10^{4}, 10^{5}\}$, for the regularization parameters $c$, $c_l$, $c_r$ of the support vector algorithm, to obtain an improved support vector regression model.
Further, the kernel function K(x) used in the three-stage optimization algorithm is the Epanechnikov kernel function, whose standard expression is: $K(x) = \frac{3}{4}(1 - x^2)$ for $|x| \le 1$, and $K(x) = 0$ otherwise.
Further, the error test includes:
where $\cap$ and $\cup$ denote the intersection and union on the fuzzy number space, and Card(S) denotes the number of elements in the finite set S.
Further, the prediction performance test comprises:
defuzzifying, by means of the Sugeno fuzzy model, the relationship between the fuzzy response and the scatter-plot-based fuzzy response estimate, to obtain the corresponding accurate (crisp) values of the two.
In a second aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the steps of the method according to any one of the implementations of the first aspect.
In a third aspect, an electronic device is provided, the electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method according to any one of the implementations of the first aspect.
The application has the following beneficial effects: the kernel-based support vector logistic regression method can handle complex nonlinear logistic regression problems with a simple and fast algorithm; the proposed fuzzy support vector logistic regression model, which combines accurate predictor variables with fuzzy responses, has a relatively high goodness of fit and reduces the influence of outliers on the fuzzy prediction; and prediction accuracy is improved by measuring the classification confidence.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application and serve for purposes of illustration and description.
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a support vector logistic regression method with accurate predictors according to a first embodiment of the present application;
FIG. 2 is a statistical chart of 100 subjects listed in the support vector logistic regression method with accurate predictors according to the first embodiment of the present application;
FIG. 3 is a diagram of a fuzzy language term set in a support vector logistic regression method with accurate predictors according to a first embodiment of the present application;
FIG. 4 is a scatter plot used to detect outliers in the support vector logistic regression method with accurate predictors according to the first embodiment of the present application;
FIG. 5 is a set of plots comparing the values obtained by the support vector logistic regression method with accurate predictors according to the first embodiment of the present application with those of other fuzzy regression methods;
FIG. 6 is a fuzzy language term set and its corresponding statistical chart in the support vector logistic regression method with accurate predictor according to the first embodiment of the present application;
FIG. 7 is a graph of the estimated fuzzy coefficients of the model and their performance metrics corresponding to the proposed method and some common fuzzy logistic regression models.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment of the present application relates to a support vector logistic regression method with accurate predictors, comprising: constructing a matrix A from the input data set $x_i$; constructing a nonlinear prediction function $\tilde{f}(x)$ based on the matrix A; constructing three support vector regression models according to the nonlinear prediction function $\tilde{f}(x)$; solving for the model parameters of the three support vector regression models; and performing an error test and a prediction performance test on the three support vector regression models.
Specifically, FIG. 1 shows a flowchart of the support vector logistic regression method with accurate predictors in the first embodiment of the present application, which includes the following steps:
S101, constructing a matrix A from the input data set $x_i$;
specifically, constructing the matrix A of the input data includes the following steps:
constructing a noise-corrupted training set of input values and their observed values, wherein $x_i \in R^n$ and the observed value corresponding to $x_i$ is the likelihood of success for the i-th observation;
arranging the input values $x_i$ into a matrix A of n rows and m columns, wherein the i-th row is $x_i^{T}$.
Illustratively, FIG. 2 lists the age (in years) of 100 subjects and whether there is evidence of a potential purchase; it also contains an identification variable ID and an age group variable $x_2$. The corresponding fuzzy response observation vector is obtained by calculation, which yields a noise-corrupted training set in which each input value $x_i \in R^n$ has a corresponding observed value. The outcome variable "purchase" is encoded with linguistic terms, namely very low (VL), somewhat low (SL), low (L), slightly low (ALL), medium (M), slightly high (ALH), high (H), somewhat high (SH), and very high (VH), as shown in FIG. 2. Their membership functions are shown in FIG. 3, and the set of fuzzy linguistic terms with their corresponding values is shown in FIG. 6.
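Illustratively, to simplify the representation and processing of fuzzy numbers, an LR-type fuzzy number $\tilde{a} = (a; l_a; r_a)_{LR}$ is used here, whose membership function is defined in parametric form as follows:
$$\tilde{a}(x) = \begin{cases} L\left(\dfrac{a - x}{l_a}\right), & x \le a, \\ R\left(\dfrac{x - a}{r_a}\right), & x > a, \end{cases}$$
wherein $a \in R$, $l_a$ (> 0), and $r_a$ (> 0) are respectively called the mean, the left spread, and the right spread of $\tilde{a}$; the shape function L (or R) maps $R^+ \to [0, 1]$ with L(0) = 1 and L(1) = 0, and L(x) is monotonically decreasing. Further, imprecision in the data set is handled with the triangular fuzzy number, the most common LR-type fuzzy number, i.e., $L(x) = R(x) = \max\{0, 1 - x\}$, so that $\tilde{a}$ can be further expressed as follows:
$$\tilde{a}(x) = \begin{cases} \dfrac{x - (a - l_a)}{l_a}, & a - l_a \le x \le a, \\ \dfrac{(a + r_a) - x}{r_a}, & a < x \le a + r_a, \\ 0, & \text{otherwise.} \end{cases}$$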
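The triangular LR-type fuzzy number described above can be sketched in a few lines of Python. This is a minimal illustration, assuming both spreads are positive (the usual convention for LR fuzzy numbers) and $L(x) = R(x) = \max\{0, 1 - x\}$; the class name `TriangularFuzzyNumber` and the numeric values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """LR-type fuzzy number (a; l_a; r_a) with L(x) = R(x) = max(0, 1 - x)."""
    a: float    # mean (peak of the membership function)
    l_a: float  # left spread, assumed > 0
    r_a: float  # right spread, assumed > 0

    def __post_init__(self):
        if self.l_a <= 0 or self.r_a <= 0:
            raise ValueError("spreads must be positive")

    def membership(self, x: float) -> float:
        """Degree in [0, 1] to which x belongs to the fuzzy number."""
        if x <= self.a:
            return max(0.0, 1.0 - (self.a - x) / self.l_a)
        return max(0.0, 1.0 - (x - self.a) / self.r_a)


# Made-up example: a fuzzy number centred at 0.5 with spreads of 0.2.
medium = TriangularFuzzyNumber(a=0.5, l_a=0.2, r_a=0.2)
print(medium.membership(0.5))  # at the mean
print(medium.membership(0.4))  # halfway down the left side
print(medium.membership(0.9))  # outside the support
```

The membership degree is 1 at the mean, falls linearly to 0 at $a - l_a$ and $a + r_a$, and is 0 outside that interval.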
S102, constructing a nonlinear prediction function $\tilde{f}(x)$ based on the matrix A;
specifically, a kernel function K(x) is introduced and an m-order kernel matrix $K(A, A^t)$ is defined such that $(K(A, A^t))_{ij} = K(x_i, x_j)$. Then, for any $x \in R^m$, $K(x, A^t) = (K(x, x_1), \ldots, K(x, x_m))$ is a row vector. A nonlinear prediction function is therefore assumed:
$\tilde{f}(x) = K(x, A^t)\tilde{w} \oplus \tilde{b}$,
where $\tilde{w} = (w; l_w; r_w)_{LR}$ and $\tilde{b} = (b; l_b; r_b)_{LR}$ are the (unknown) fuzzy coefficients,
wherein $w = (w_1, \ldots, w_m)^T$, $l_w = (l_{w1}, l_{w2}, \ldots, l_{wm})^T$, $r_w = (r_{w1}, r_{w2}, \ldots, r_{wm})^T$;
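As an illustration of the kernel row vector $K(x, A^t)$ and the kernel matrix $K(A, A^t)$, the following sketch builds both for a small, made-up set of inputs. It takes the kernel as a parameter; the Gaussian kernel `rbf_kernel` used in the demonstration is only a placeholder chosen for brevity and is not the kernel named in the application, and all function names and data are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.001) -> float:
    """Placeholder Gaussian kernel; not the kernel chosen in the source."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_row(x: Vector, A: List[Vector], kernel: Kernel) -> List[float]:
    """K(x, A^t) = (K(x, x_1), ..., K(x, x_m)) for the m training inputs in A."""
    return [kernel(x, x_j) for x_j in A]


def kernel_matrix(A: List[Vector], kernel: Kernel) -> List[List[float]]:
    """m-by-m matrix whose (i, j) entry is K(x_i, x_j)."""
    return [kernel_row(x_i, A, kernel) for x_i in A]


# Hypothetical inputs: one row per observation (for example, an age in years).
A = [[25.0], [40.0], [60.0]]
K = kernel_matrix(A, rbf_kernel)
row = kernel_row([30.0], A, rbf_kernel)
print(K)
print(row)
```

Every entry of `K` and `row` is strictly positive for this placeholder kernel, which is the property the subsequent decomposition into three models relies on.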
S103, constructing three support vector regression models according to the nonlinear prediction function $\tilde{f}(x)$;
in particular, since the kernel matrix $K(A, A^t)$ is positive, the nonlinear prediction function $\tilde{f}(x)$ is equivalent to $(f(x); l_{f(x)}; r_{f(x)})^T = (K(x, A^t)w + b;\ K(x, A^t)l_w + l_b;\ K(x, A^t)r_w + r_b)_{LR}$; therefore, the following three support vector regression (SVR) models can be derived:
$v = f(x) = K(x, A^t)w + b$ (4)
$l_v = l_{f(x)} = K(x, A^t)l_w + l_b$ (5)
$r_v = r_{f(x)} = K(x, A^t)r_w + r_b$ (6)
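Equations (4) to (6) share the same kernel row vector and differ only in their coefficients, so a fuzzy prediction can be evaluated as three ordinary dot products. The sketch below assumes the coefficients have already been estimated; the function name `fuzzy_prediction` and all numbers are hypothetical.

```python
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equally long sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def fuzzy_prediction(k_row: Sequence[float],
                     w: Sequence[float], b: float,
                     l_w: Sequence[float], l_b: float,
                     r_w: Sequence[float], r_b: float
                     ) -> Tuple[float, float, float]:
    """Return (f(x), l_f(x), r_f(x)) as in equations (4), (5), and (6)."""
    center = dot(k_row, w) + b       # equation (4)
    left = dot(k_row, l_w) + l_b     # equation (5)
    right = dot(k_row, r_w) + r_b    # equation (6)
    return center, left, right


# Made-up kernel row K(x, A^t) and coefficients for two training inputs.
k_row = [1.0, 0.5]
center, left, right = fuzzy_prediction(
    k_row,
    w=[2.0, -1.0], b=0.5,
    l_w=[0.2, 0.2], l_b=0.1,
    r_w=[0.4, 0.0], r_b=0.0,
)
print(center, left, right)
```

With a kernel row of positive entries, non-negative spread coefficients yield non-negative spreads, so the three outputs again form a valid LR-type fuzzy number.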
Thus, given a training set, the unknown fuzzy coefficients $\tilde{w}$ and $\tilde{b}$ can be found by the following three-stage method.
S104, solving for the model parameters of the three support vector regression models;
specifically, to obtain the unknown fuzzy coefficients $\tilde{w}$ and $\tilde{b}$, the fuzzy coefficients are estimated with a three-stage optimization algorithm; that is, the support vector method (with the Epanechnikov kernel function) is applied to the three support vector regression (SVR) models obtained above, and the corresponding scatter-plot-based fuzzy response estimates are analyzed,
wherein $C_l, C_r > 0$ are regularization constants and $L_H$ denotes the absolute error loss function, defined as follows:
Through the three-stage optimization algorithm, the support vector machine (SVM) minimizes the structural risk, that is, a bound on the generalization error rather than only the sum of the training errors, in order to approximate the functional relationship in the regression analysis. Compared with a traditional nonlinear regression model, support vector regression (SVR) is robust to outliers, can combine classifiers trained on different types of data by applying probability rules, and improves prediction accuracy by measuring the classification confidence.
The unknown coefficients and regression components of the ordinary SVR model are estimated with mathematical software, and an exponential grid search over the set $\{10^{-5}, 10^{-4}, \ldots, 10^{4}, 10^{5}\}$ is used to find the regularization parameters $c$, $c_l$, $c_r$ of the support vector algorithm, so as to obtain an improved support vector regression (SVR) model; the results are shown in Table 1.
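The exponential grid search can be sketched as follows. The sketch searches the three regularization parameters jointly, which is an assumption: the text does not say whether they are tuned jointly or one stage at a time. The scoring function `toy_error` is a hypothetical stand-in for refitting the three SVR models and measuring their validation error.

```python
import itertools
import math
from typing import Callable, Tuple

# The candidate set 10^-5, 10^-4, ..., 10^4, 10^5.
GRID = [10.0 ** k for k in range(-5, 6)]


def grid_search(validation_error: Callable[[float, float, float], float]
                ) -> Tuple[float, float, float]:
    """Return the (c, c_l, c_r) triple from GRID with the smallest error."""
    return min(itertools.product(GRID, repeat=3),
               key=lambda params: validation_error(*params))


def toy_error(c: float, c_l: float, c_r: float) -> float:
    """Hypothetical score; a real one would refit and validate the models."""
    return ((math.log10(c) - 1) ** 2
            + (math.log10(c_l) + 2) ** 2
            + math.log10(c_r) ** 2)


best = grid_search(toy_error)
print(best)
```

Searching on a logarithmic grid covers ten orders of magnitude with only eleven candidates per parameter, at the cost of a coarse resolution within each decade.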
Table 1:
The kernel function K(x) used in the three-stage optimization algorithm is the Epanechnikov kernel function, whose standard expression is: $K(x) = \frac{3}{4}(1 - x^2)$ for $|x| \le 1$, and $K(x) = 0$ otherwise.
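For reference, the Epanechnikov kernel in its standard univariate form is $\frac{3}{4}(1 - u^2)$ on $|u| \le 1$ and 0 elsewhere. The sketch below implements that form and, as an assumption not stated in the text, applies it to the Euclidean distance between two inputs scaled by a bandwidth `h`; the function names and the bandwidth parameter are hypothetical.

```python
import math
from typing import Sequence


def epanechnikov(u: float) -> float:
    """Standard Epanechnikov profile: 0.75 * (1 - u^2) on |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def epanechnikov_kernel(x: Sequence[float], y: Sequence[float],
                        h: float = 1.0) -> float:
    """Kernel value for two inputs via scaled Euclidean distance (assumed)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return epanechnikov(dist / h)


print(epanechnikov(0.0))
print(epanechnikov_kernel([0.0, 0.0], [3.0, 4.0], h=10.0))
```

The kernel has compact support, so inputs farther apart than the bandwidth contribute exactly zero, and its values are never negative.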
S105, performing an error test and a prediction performance test on the three support vector regression models.
Specifically, the error test includes:
where $\cap$ and $\cup$ denote the intersection and union on the fuzzy number (FN) space, and Card(S) denotes the number of elements in the finite set S. FIG. 7 summarizes the values of the estimated fuzzy coefficients associated with each kernel and their goodness-of-fit criteria; in addition, the estimated fuzzy coefficients and performance metrics associated with some common fuzzy logistic regression models (Gao and Lu, Pourahmad et al., and Namdari et al.) are listed in FIG. 2.
The prediction performance test includes:
defuzzifying, by means of the Sugeno fuzzy model, the relationship between the fuzzy response and the scatter-plot-based fuzzy response estimate, to obtain the corresponding accurate (crisp) values of the two, wherein:
The closer the two defuzzified values are to each other, the higher the prediction performance of the model. An extended Cook's distance criterion is used to check for outliers. If the accuracy of the results is higher, indicating that FLOGSVR provides more accurate results, the method may proceed to the implementation stage. The data set contains some potential outliers, as observed in FIG. 4. The method is compared with other fuzzy regression models, as shown in FIG. 5 (which comprises FIG. 5a, FIG. 5b, FIG. 5c, and FIG. 5d), from which it can be concluded that the values of M estimated with the proposed algorithm are closer to their respective estimated values; these figures also show that the proposed fuzzy logistic regression model performs better than the other models.
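The extended Cook's distance used for the fuzzy model is not spelled out in the text. As a point of reference only, the classical Cook's distance for an ordinary least-squares line is sketched below; it is a swapped-in, simpler technique and should not be read as the criterion of the application. The function name and the data are made up.

```python
from typing import List, Sequence


def cooks_distances(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Classical Cook's distance for the least-squares line y = b0 + b1 * x."""
    n, p = len(x), 2  # p = number of fitted coefficients
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return [(e * e / (p * mse)) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverages)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]  # made-up data; the last point is off-trend
distances = cooks_distances(x, y)
suspect = max(range(len(x)), key=lambda i: distances[i])
print(suspect, distances)
```

A point with a large residual and high leverage receives a large distance, which is the sense in which such a criterion flags potential outliers.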
In the present application, a support vector regression (SVR) model with accurate predictor variables and fuzzy responses is introduced into a nonlinear logistic regression model (based on a common kernel function), yielding the support vector logistic regression method with accurate predictors. The kernel-based support vector logistic regression method can handle complex nonlinear logistic regression problems with a simple and fast algorithm; the proposed fuzzy support vector logistic regression model, which combines accurate predictor variables with fuzzy responses, has a relatively high goodness of fit and reduces the influence of outliers on the fuzzy prediction; and prediction accuracy is improved by measuring the classification confidence.
Example two
A computer-readable storage medium according to the second embodiment of the present application stores program code for execution by a device, the program code including instructions for performing the steps of the method in any one implementation of the first embodiment of the present application;
the computer readable storage medium may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM); the computer readable storage medium may store program code for performing the steps of the method as in any one of the implementations of the embodiment of the present application when the program stored in the computer readable storage medium is executed by the processor.
Example three
A chip according to the third embodiment of the present application includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the steps of the method in any one implementation of the first embodiment of the present application;
the processor may adopt a general Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute a related program, so as to implement the method in any implementation manner in the first embodiment of the present application.
The processor may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method in any one implementation of the first embodiment of the present application may be implemented by hardware integrated logic circuits in a processor or instructions in the form of software.
The processor may also be a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in a memory; the processor reads the information in the memory and, in combination with its hardware, performs the functions required of the units included in the data processing apparatus according to the embodiment of the present application, or performs the method according to any one implementation of the embodiment of the present application.
The above are only preferred embodiments of the present application, and the scope of protection of the present application is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and the inventive concept of the present application, shall fall within the scope of protection of the present application.
Claims (10)
1. A method of support vector logistic regression with accurate predictors, comprising:
constructing a matrix A from the input data set $x_i$;
constructing a nonlinear prediction function $\tilde{f}(x)$ based on the matrix A, wherein $\tilde{w}$ and $\tilde{b}$ are fuzzy coefficients;
constructing three support vector regression models according to the nonlinear prediction function $\tilde{f}(x)$;
solving for the model parameters of the three support vector regression models;
and performing an error test and a prediction performance test on the three support vector regression models.
2. The method of support vector logistic regression with accurate predictors according to claim 1, wherein said constructing a matrix A of input data comprises the steps of:
constructing a noise-corrupted training set of input values and their observed values, wherein $x_i \in R^n$ and the observed value corresponding to $x_i$ is the likelihood of success for the i-th observation;
arranging the input values $x_i$ into a matrix A of n rows and m columns, wherein the i-th row is $x_i^{T}$.
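3. The method of claim 1, wherein constructing the nonlinear prediction function $\tilde{f}(x)$ comprises the following steps:
for $x \in R^m$, letting $K(x, A^t) = (K(x, x_1), \ldots, K(x, x_m))$ be a row vector;
the nonlinear prediction function is then $\tilde{f}(x) = K(x, A^t)\tilde{w} \oplus \tilde{b}$, with fuzzy coefficients $\tilde{w} = (w; l_w; r_w)_{LR}$ and $\tilde{b} = (b; l_b; r_b)_{LR}$,
wherein $w = (w_1, \ldots, w_m)^T$, $l_w = (l_{w1}, l_{w2}, \ldots, l_{wm})^T$, $r_w = (r_{w1}, r_{w2}, \ldots, r_{wm})^T$;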
4. The support vector logistic regression method with accurate predictors according to claim 1, wherein constructing the support vector regression models comprises the following steps:
in response to the kernel matrix $K(A, A^t)$ being positive, the nonlinear prediction function $\tilde{f}(x)$ is equivalent to $(f(x); l_{f(x)}; r_{f(x)})^T = (K(x, A^t)w + b;\ K(x, A^t)l_w + l_b;\ K(x, A^t)r_w + r_b)_{LR}$, from which three support vector regression models are derived:
$v = f(x) = K(x, A^t)w + b$ (4)
$l_v = l_{f(x)} = K(x, A^t)l_w + l_b$ (5)
$r_v = r_{f(x)} = K(x, A^t)r_w + r_b$ (6).
5. The method of support vector logistic regression with accurate predictors according to claim 1, wherein solving for the model parameters comprises the steps of:
optimizing the fuzzy coefficients with a three-stage algorithm;
estimating the unknown coefficients and regression components of the ordinary support vector regression model;
searching, by an exponential grid search over the set $\{10^{-5}, 10^{-4}, \ldots, 10^{4}, 10^{5}\}$, for the regularization parameters $c$, $c_l$, $c_r$ of the support vector algorithm, to obtain an improved support vector regression model.
7. The method of support vector logistic regression with accurate predictors according to any of claims 1-6, wherein said error test comprises:
where $\cap$ and $\cup$ denote the intersection and union on the fuzzy number space, and Card(S) denotes the number of elements in the finite set S.
8. The method of support vector logistic regression with accurate predictors according to any of claims 1-6, wherein said prediction performance test comprises:
defuzzifying, by means of the Sugeno fuzzy model, the relationship between the fuzzy response and the scatter-plot-based fuzzy response estimate, to obtain the corresponding accurate (crisp) values of the two.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the steps of the method according to any one of claims 1-8.
10. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211369584.0A CN115712810A (en) | 2022-11-03 | 2022-11-03 | Support vector logistic regression method with accurate prediction factor and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115712810A true CN115712810A (en) | 2023-02-24 |
Family
ID=85232058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211369584.0A Pending CN115712810A (en) | 2022-11-03 | 2022-11-03 | Support vector logistic regression method with accurate prediction factor and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115712810A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||