CN113090420A - EGR flow diagnosis method based on multi-classification logistic regression algorithm

Info

Publication number: CN113090420A
Application number: CN202110325668.3A
Authority: CN
Inventors: 董定欢; 陈玉俊; 张衡; 周杰敏; 蒋学锋; 朱丹丹; 袁集平; 张恒平; 刘乔华; 张亚晓
Original assignee: Dongfeng Commercial Vehicle Co Ltd
Current assignee: Dongfeng Commercial Vehicle Co Ltd
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2021-07-09
Anticipated expiration: 2041-03-26
Also published as: CN113090420B

Abstract

The invention relates to the technical field of engine control unit design, and in particular to an EGR flow diagnosis method based on a multi-classification logistic regression algorithm, comprising the following steps: establishing an initial prediction model in terms of the EGR characteristic weight and the EGR characteristic quantity, and establishing a cost function for the initial prediction model; substituting multiple groups of sample data of multiple EGR characteristic quantities and the corresponding sample results into the cost function, calculating the optimal solution of the cost function, and obtaining the diagnostic EGR characteristic weight value corresponding to the optimal solution; substituting the diagnostic EGR characteristic weight value into the initial prediction model to obtain a diagnostic prediction model; and substituting collected data of the EGR characteristic quantities into the diagnostic prediction model to obtain a prediction result for the EGR flow. The method addresses a problem in the prior art, in which the calculation relies on a NOx emission set value and a NOx emission measured value and the volume of bench test data and whole-vehicle development test data is limited, so that the fault judging model produces large errors.

Description

EGR flow diagnosis method based on multi-classification logistic regression algorithm

Technical Field

The invention relates to the technical field of design of engine control units, in particular to an EGR flow diagnosis method based on a multi-classification logistic regression algorithm.

Background

EGR is an abbreviation for Exhaust Gas Recirculation. Exhaust gas recirculation refers to routing a portion of the engine's exhaust gas back into the intake manifold, from which it re-enters the cylinders together with the fresh mixture. Because exhaust gas contains a large amount of polyatomic gases such as CO2, which cannot burn but absorb a large amount of heat owing to their high specific heat capacity, the maximum combustion temperature of the mixture in the cylinder is lowered and the amount of NOx generated is reduced.

The Chinese patent application 'EGR flow fault judgment method, device and equipment' (application number: CN201911364095.4) discloses an EGR flow fault diagnosis method, which comprises the following steps: determining a correction coefficient according to a NOx emission set value and a NOx emission measured value of a target engine; obtaining a target deviation threshold according to the correction coefficient and a preset deviation threshold; determining a deviation of a flow parameter of the target engine based on the measured flow parameter and a set amount of the flow parameter, wherein the flow parameter includes at least one of an intake air amount and an exhaust gas amount; and judging whether the EGR of the target engine has flow faults or not according to the deviation of the flow parameters and the target deviation threshold value.

However, the prior art requires calculations based on NOx emission set values and NOx emission measured values, and accurate measured values can only be obtained through bench testing. Engine emissions are affected by many factors, and emissions on the bench differ from emissions on the whole vehicle because of changes in the piping of the intake and exhaust systems, the ambient temperature and humidity, and the like. In addition, the volume of bench test data and whole-vehicle development test data is very limited, so the fault judging model produces relatively large errors.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide an EGR flow diagnosis method based on a multi-classification logistic regression algorithm, which can solve the problem that in the prior art, calculation is carried out according to a NOx emission set value and a NOx emission measured value, wherein the NOx emission measured value can only obtain accurate data through a bench test, and the data quantity of the bench test data and the data quantity of the whole vehicle development test data are both limited, so that the error generated by a fault judging model is relatively large.

In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:

the invention provides an EGR flow diagnosis method based on a multi-classification logistic regression algorithm, which comprises the following steps of:

establishing an initial prediction model related to the EGR characteristic weight and the EGR characteristic quantity, and establishing a cost function related to the initial prediction model based on the initial prediction model;

bringing multiple groups of sample data of multiple EGR characteristic quantities and corresponding sample results into a cost function, calculating an optimal solution of the cost function, and obtaining a diagnostic EGR characteristic weight value corresponding to the optimal solution;

substituting the diagnostic EGR characteristic weight value into an initial prediction model to obtain a diagnostic EGR prediction model;

and substituting the collected data of the EGR characteristic quantity into a diagnosis and prediction model to obtain a prediction result of the EGR flow.

In some optional embodiments, the establishing an initial prediction model about the EGR characteristic weight and the EGR characteristic quantity specifically includes:

establishing, by means of a logistic regression algorithm, an initial prediction model $h(X) = g(\theta X)$ based on the EGR characteristic weight and the EGR characteristic quantity, wherein g is the Sigmoid function, θ is the EGR characteristic weight, and X is the EGR characteristic quantity.

In some optional embodiments, the establishing a cost function related to the initial prediction model based on the initial prediction model specifically includes:

according to the maximum likelihood estimation method, establishing a cost function for the initial prediction model as $\mathrm{cost}(h(x), y) = \frac{1}{m}\sum\left[-y\,\lg(h(x)) - (1-y)\,\lg(1-h(x))\right]$, wherein cost(h(x), y) is the cost function of the initial prediction model, m is the number of groups of EGR characteristic quantities, and y is the sample result.

In some optional embodiments, the bringing multiple sets of sample data of multiple EGR characteristic quantities and corresponding sample results into a cost function, calculating an optimal solution of the cost function, and obtaining a diagnostic EGR characteristic weight value corresponding to the optimal solution specifically includes:

bringing multiple groups of sample data of multiple EGR characteristic quantities and corresponding sample results into a cost function;

setting an arbitrary initial value for the EGR characteristic weight to substitute the initial value into the cost function, and solving the optimal solution of the cost function by adopting a gradient descent method, wherein the EGR characteristic weight corresponding to the optimal solution is determined as a diagnostic EGR characteristic weight value.

In some optional embodiments, the solving the optimal solution of the cost function by using a gradient descent method specifically includes:

according to the formula

$\theta_{t+1} = \theta_t - \alpha\,\frac{\partial\,\mathrm{cost}(h(x), y)}{\partial \theta}$, with the partial derivative evaluated at $\theta_t$,

iterating in a loop, and obtaining the optimal solution of the cost function when the cost function corresponding to $\theta_{t+1}$ is less than or equal to the cost function corresponding to $\theta_t$, wherein α is the learning rate, $\theta_{t+1}$ is the value of the EGR characteristic weight at the (t+1)-th iteration, $\theta_t$ is the value of the EGR characteristic weight at the t-th iteration, and $\theta_0$ is an arbitrary initial value.

In some optional embodiments, when what is substituted into the cost function is multiple groups of sample data of multiple EGR characteristic quantities for low EGR flow, together with the corresponding sample results, a diagnostic prediction model for low EGR flow is obtained;

and when what is substituted into the cost function is multiple groups of sample data of multiple EGR characteristic quantities for high EGR flow, together with the corresponding sample results, a diagnostic prediction model for high EGR flow is obtained.

In some optional embodiments, the substituting the collected data of the EGR characteristic quantity into the diagnostic prediction model to obtain the prediction result of the EGR flow specifically includes:

bringing the collected data of the EGR characteristic quantity into a diagnosis prediction model with low EGR flow, and when the probability value of the diagnosis prediction model is more than or equal to 50%, determining that the EGR flow is low; when the predicted probability value is less than 50%, the EGR flow is normal or the EGR flow is high;

the collected data of the EGR characteristic quantity is brought into a diagnosis and prediction model with high EGR flow, and when the probability value of the diagnosis and prediction model is more than or equal to 50%, the EGR flow is high; when the predicted probability value is less than 50%, the EGR flow rate is normal or the EGR flow rate is low.

In some optional embodiments, the same collected data of the EGR characteristic quantity is simultaneously brought into a diagnosis prediction model with low EGR flow and a diagnosis prediction model with high EGR flow, so as to obtain prediction probability values of the diagnosis prediction models with low EGR flow and high EGR flow respectively, and a diagnosis result corresponding to the larger of the prediction probability values is a final prediction result.

In some optional embodiments, the EGR characteristic quantity includes at least three parameters of an EGR valve opening degree, an EGR valve inlet temperature, an EGR valve inlet pressure, an engine speed, an engine torque, an intake air flow rate, and an exhaust gas flow rate.

In some optional embodiments, the EGR characteristic quantities are a matrix of m × n, m being the number of groups of EGR characteristic quantities, and n being the number of EGR characteristic quantities.

Compared with the prior art, the invention has the following advantages. An initial prediction model is established in terms of the EGR characteristic weight and the EGR characteristic quantity, and a cost function is established for the initial prediction model; multiple groups of sample data of multiple EGR characteristic quantities and the corresponding sample results are substituted into the cost function, the optimal solution of the cost function is calculated, and the diagnostic EGR characteristic weight value corresponding to the optimal solution is obtained; the diagnostic EGR characteristic weight value is substituted into the initial prediction model to obtain a diagnostic prediction model; and collected data of the EGR characteristic quantities are substituted into the diagnostic prediction model to obtain a prediction result for the EGR flow. The scheme exploits the fact that the collected EGR characteristic quantity data are large in volume, real and diverse to establish a more accurate diagnostic prediction model, and the relevant characteristic quantities are extracted by analyzing the factors that influence the EGR flow. The optimal EGR characteristic weights of the initial prediction model are obtained through the cost function, which yields a more accurate diagnostic prediction model and overcomes the drawback that traditional model development is limited by test conditions and cannot truly reflect the application environment.

In addition, the multiple groups of sample data and the collected data of the EGR characteristic quantities are vectorized, so that the software can operate on matrices directly; matrix operations are more efficient than code that loops over individual values, which saves system resources of the computing unit.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flow chart of a method for diagnosing EGR flow based on a multi-classification logistic regression algorithm in an embodiment of the present invention;

fig. 2 is a flowchart of step S2 in the EGR flow rate diagnosis method based on the multi-classification logistic regression algorithm according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of an EGR flow rate diagnosis method based on a multiple-classification logistic regression algorithm in an embodiment of the present invention, and as shown in fig. 1, the present invention provides an EGR flow rate diagnosis method based on a multiple-classification logistic regression algorithm, including the following steps:

S1: Establish an initial prediction model in terms of the EGR characteristic weight and the EGR characteristic quantity, and establish a cost function for the initial prediction model.

In some optional embodiments, the EGR characteristic quantity includes at least three parameters of an EGR valve opening, an EGR valve inlet temperature, an EGR valve inlet pressure, an engine speed, an engine torque, an intake air flow rate, and an exhaust gas flow rate. In the embodiment, the EGR characteristic quantity is a parameter related to the EGR flow, and the acquisition of more parameters can make the result predicted by the established initial prediction model more valuable and more accurate. In this example, to obtain an accurate initial prediction model, the EGR characteristic quantities include all of the parameters of EGR valve opening, EGR valve inlet temperature, EGR valve inlet pressure, engine speed, engine torque, intake air flow, and exhaust gas flow. In other embodiments, more parameters related to EGR flow may be obtained.

In some optional embodiments, establishing an initial prediction model for the EGR characteristic weight and the EGR characteristic quantity specifically includes:

When a quantitative prediction method is used, the most important task is to establish a mathematical prediction model. A prediction model is the quantitative relationship between things, described in mathematical language or formulas, that is used for prediction. It reveals the inherent regularity of the objects to a certain extent and serves as the direct basis for calculating predicted values, so it has a great influence on prediction accuracy. In this example, the number of EGR characteristic quantities in the initial prediction model has a great influence on the accuracy of the initial prediction model.

The Sigmoid function is an S-shaped function common in biology, also called the sigmoidal growth curve. In information science, because it is monotonically increasing and its inverse is also monotonically increasing, the Sigmoid function is often used as the activation function of a neural network. It maps a variable to a value between 0 and 1, so the resulting probability value lies between 0 and 100%.
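The Sigmoid mapping and the initial prediction model h(X) = g(θX) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the weight values and the feature values are hypothetical, and the feature values are assumed to be already scaled.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid function g(z) = 1 / (1 + e^(-z)), returning a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(theta: list[float], x: list[float]) -> float:
    """Initial prediction model h(X) = g(theta * X) for one group of feature values."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical weights and scaled feature values, for illustration only.
theta = [0.8, -0.5, 0.3]
x = [1.0, 0.2, 0.6]
print(predict_probability(theta, x))
```

The sign branch in `sigmoid` is a numerical precaution added for this sketch; the source only specifies that g is the Sigmoid function.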

Logistic regression, also known as logistic regression analysis, is a generalized linear regression model commonly used in data mining, automatic disease diagnosis, economic forecasting and similar fields. Logistic regression is inherently a binary classification method. In a binary classification problem the predicted y takes only two values (0 or 1), and binary classification can be extended to multi-class problems. For example, in a spam filtering system, x is the feature vector of an email and the predicted y is the email category, spam or normal. The two classes are commonly called the positive class and the negative class; in the spam example, the positive class is normal mail and the negative class is spam.

In some alternative embodiments, the EGR characteristic quantities are a matrix of m × n, m being the number of groups of EGR characteristic quantities, and n being the number of EGR characteristic quantities.

In this embodiment, the EGR characteristic quantities form an m × n matrix. Exploiting the fact that the collected EGR characteristic quantity data are large in volume, real and diverse allows a more accurate diagnostic prediction model to be established, which overcomes the drawback that traditional model development is limited by test conditions and cannot truly reflect the application environment. In addition, characteristic quantities are extracted from the collected big data and vectorized: the relevant characteristic quantities are extracted by analyzing the factors that influence the EGR flow, and vectorizing the data lets the software operate on matrices directly. Matrix operations are more efficient than code that loops over individual values, which saves system resources of the computing unit.
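The matrix form of the prediction model can be written out as follows. This is a sketch under an orientation the source does not state: each row of X is assumed to be one group of sample data, θ is assumed to be a column vector of n weights, and g is applied element by element.

```latex
X = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n},
\qquad
\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_n \end{bmatrix},
\qquad
h(X) = g(X\theta) = \begin{bmatrix} g\!\left(\sum_{j=1}^{n} x_{1j}\,\theta_j\right) \\ \vdots \\ g\!\left(\sum_{j=1}^{n} x_{mj}\,\theta_j\right) \end{bmatrix}
```

Under these assumptions a single matrix-vector product yields the m predicted probabilities at once, which is the operation the vectorization is meant to enable.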

In some optional embodiments, establishing a cost function with respect to the initial prediction model based on the initial prediction model specifically includes:

According to the maximum likelihood estimation method, a cost function for the initial prediction model is established as $\mathrm{cost}(h(x), y) = \frac{1}{m}\sum\left[-y\,\lg(h(x)) - (1-y)\,\lg(1-h(x))\right]$, wherein cost(h(x), y) is the cost function of the initial prediction model, m is the number of groups of EGR characteristic quantities, and lg is the base-10 logarithm.

The cost function is a function that maps the value of a random event or its associated random variable to a non-negative real number to represent the "risk" or "loss" of the random event. In application, the cost function is usually associated with the optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function.

The cost function value represents the difference between the prediction model and the real sample results, so the minimum of the cost function is sought. The weights of the characteristic quantities at which the cost function reaches its minimum are obtained by the gradient descent method.
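The cost function can be evaluated as in the following sketch, which uses the base-10 logarithm as stated in the text. The function name, the argument layout and the small clipping constant `eps` are assumptions made for this illustration; the clipping only guards against taking the logarithm of zero and is not described in the source.

```python
import math


def cost(h_values: list[float], y_values: list[int]) -> float:
    """cost(h(x), y) = (1/m) * sum(-y*lg(h) - (1-y)*lg(1-h)) over m sample groups."""
    eps = 1e-12  # assumption: keep h strictly inside (0, 1) so lg() is defined
    m = len(y_values)
    total = 0.0
    for h, y in zip(h_values, y_values):
        h = min(max(h, eps), 1.0 - eps)
        total += -y * math.log10(h) - (1 - y) * math.log10(1.0 - h)
    return total / m


# Hypothetical model outputs for two sample groups with sample results 1 and 0.
good = cost([0.9, 0.1], [1, 0])   # predictions close to the sample results
poor = cost([0.6, 0.4], [1, 0])   # predictions further from the sample results
print(good, poor)
```

Predictions closer to the sample results give a smaller cost, which is why minimizing the cost function brings the model toward the real sample results.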

Maximum likelihood estimation is an application of probability theory in statistics and is one of the methods of parameter estimation. A random sample is known to follow a certain probability distribution, but the specific parameters are unknown; parameter estimation observes the results of several experiments and uses them to derive approximate values of the parameters. The idea of maximum likelihood estimation is as follows: if a certain parameter value maximizes the probability of the observed sample occurring, there is no reason to choose other parameter values under which that probability is smaller, so this parameter value is simply taken as the estimate of the true value.
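The link between maximum likelihood estimation and the cost function can be sketched as the short derivation below. It assumes that the m sample groups are independent and that h(x_i) is read as the probability that the sample result y_i equals 1; the sample index i is introduced here for clarity and does not appear in the source.

```latex
\begin{aligned}
L(\theta) &= \prod_{i=1}^{m} h(x_i)^{y_i}\,\bigl(1-h(x_i)\bigr)^{1-y_i} \\
\lg L(\theta) &= \sum_{i=1}^{m}\Bigl[y_i\,\lg h(x_i) + (1-y_i)\,\lg\bigl(1-h(x_i)\bigr)\Bigr] \\
\mathrm{cost}(h(x), y) &= -\frac{1}{m}\,\lg L(\theta)
 = \frac{1}{m}\sum_{i=1}^{m}\Bigl[-y_i\,\lg h(x_i) - (1-y_i)\,\lg\bigl(1-h(x_i)\bigr)\Bigr]
\end{aligned}
```

Under these assumptions, maximizing the likelihood L(θ) is equivalent to minimizing the cost function.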

S2: and bringing multiple groups of sample data of the multiple EGR characteristic quantities and corresponding sample results into a cost function, calculating an optimal solution of the cost function, and obtaining a diagnostic EGR characteristic weight value corresponding to the optimal solution.

Fig. 2 is a flowchart of step S2 in the EGR flow rate diagnosis method based on the multi-classification logistic regression algorithm according to the embodiment of the present invention. As shown in fig. 2, bringing multiple sets of sample data of multiple EGR characteristic quantities and corresponding sample results into a cost function, calculating an optimal solution of the cost function, and obtaining a diagnostic EGR characteristic weight value corresponding to the optimal solution, specifically includes:

S21: Substitute multiple groups of sample data of multiple EGR characteristic quantities and the corresponding sample results into the cost function.

In this embodiment, the sample data are first substituted into the initial prediction model, and the initial prediction model evaluated on the sample data, together with the sample results, is then substituted into the cost function for solving. The sample data of the EGR characteristic quantities are listed in Table 1:

Table 1. Sample data of the EGR characteristic quantities

EGR valve opening sample 1	EGR valve inlet temperature sample 1	......	Feature quantity n sample 1
EGR valve opening sample 2	EGR valve inlet temperature sample 2	......	Feature quantity n sample 2
......	......	......	......
EGR valve opening sample m	EGR valve inlet temperature sample m	......	Feature quantity n sample m

That is, the m × n matrix of EGR characteristic quantities is substituted into the cost function $\mathrm{cost}(h(x), y) = \frac{1}{m}\sum\left[-y\,\lg(h(x)) - (1-y)\,\lg(1-h(x))\right]$, where m is the number of groups of EGR characteristic quantities and n is the number of EGR characteristic quantities.
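Assembling the m × n matrix from recorded samples might look like the sketch below. The field names and the numeric values are hypothetical placeholders chosen for illustration; the source names the seven characteristic quantities but does not specify units, identifiers or a data format.

```python
# Hypothetical identifiers for the seven characteristic quantities named in the text.
FEATURES = [
    "egr_valve_opening",
    "egr_valve_inlet_temperature",
    "egr_valve_inlet_pressure",
    "engine_speed",
    "engine_torque",
    "intake_air_flow",
    "exhaust_gas_flow",
]


def build_matrix(samples: list[dict]) -> list[list[float]]:
    """Return an m x n matrix: one row per sample group, one column per quantity."""
    return [[float(sample[name]) for name in FEATURES] for sample in samples]


# Two made-up sample groups (placeholder values, not measured data).
samples = [
    {"egr_valve_opening": 35.0, "egr_valve_inlet_temperature": 420.0,
     "egr_valve_inlet_pressure": 180.0, "engine_speed": 1400.0,
     "engine_torque": 900.0, "intake_air_flow": 650.0, "exhaust_gas_flow": 700.0},
    {"egr_valve_opening": 10.0, "egr_valve_inlet_temperature": 380.0,
     "egr_valve_inlet_pressure": 150.0, "engine_speed": 1100.0,
     "engine_torque": 600.0, "intake_air_flow": 480.0, "exhaust_gas_flow": 510.0},
]

X = build_matrix(samples)
m, n = len(X), len(X[0])
print(m, n)
```

Keeping the column order fixed in one list means the same ordering applies to the sample data and to data collected later for prediction.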

In some optional embodiments, when what is substituted into the cost function is multiple groups of sample data of multiple EGR characteristic quantities for low EGR flow, together with the corresponding sample results, a diagnostic prediction model for low EGR flow is obtained; and when what is substituted into the cost function is multiple groups of sample data of multiple EGR characteristic quantities for high EGR flow, together with the corresponding sample results, a diagnostic prediction model for high EGR flow is obtained.

Here y is the sample result, whose value is 0 or 1.

When solving for the diagnostic prediction model for low EGR flow, 0 indicates that the EGR flow is normal or high, i.e., not low, and 1 indicates that the EGR flow is low. Sample data and sample results from operating conditions in which the EGR flow is known to be low are substituted into the cost function.

When solving for the diagnostic prediction model for high EGR flow, 0 indicates that the EGR flow is normal or low, i.e., not high, and 1 indicates that the EGR flow is high. Sample data and sample results from operating conditions in which the EGR flow is known to be high are substituted into the cost function.
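The two label encodings described above can be sketched as follows. The condition strings "low", "normal" and "high" and the function name are assumptions made for this illustration; the source only specifies that the sample result is 0 or 1 for each model.

```python
def binary_labels(conditions: list[str], positive: str) -> list[int]:
    """Encode the sample results as 1 for the positive condition and 0 otherwise."""
    return [1 if condition == positive else 0 for condition in conditions]


# Hypothetical known flow conditions for five sample groups.
conditions = ["low", "normal", "high", "normal", "low"]

y_low = binary_labels(conditions, "low")    # 1 = low flow, 0 = normal or high
y_high = binary_labels(conditions, "high")  # 1 = high flow, 0 = normal or low
print(y_low, y_high)
```

The same sample groups therefore produce two different result vectors, one for each diagnostic prediction model.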

S22: setting an arbitrary initial value for the EGR characteristic weight to substitute the initial value into the cost function, and solving the optimal solution of the cost function by adopting a gradient descent method, wherein the EGR characteristic weight corresponding to the optimal solution is determined as a diagnostic EGR characteristic weight value.

In some optional embodiments, solving the optimal solution of the cost function by using a gradient descent method specifically includes:

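according to the formula $\theta_{t+1} = \theta_t - \alpha\,\frac{\partial\,\mathrm{cost}(h(x), y)}{\partial \theta}$, where α is the learning rate, $\theta_t$ is the value of the EGR characteristic weight at the t-th iteration and the partial derivative is evaluated at $\theta_t$, the EGR characteristic weight is updated in a loop.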

In this embodiment, the EGR characteristic weight corresponding to the optimal solution of the solution cost function is determined as the diagnostic EGR characteristic weight.

The gradient descent method is an iterative method and can be used to solve least squares problems (both linear and non-linear). Gradient descent is one of the most commonly used methods for solving the model parameters of machine learning algorithms, i.e., unconstrained optimization problems; another commonly used method is the least squares method. When the minimum of the loss function is sought, gradient descent iterates step by step to obtain the minimized loss function and the corresponding model parameter values.

The principle of solving the optimal solution of the cost function by adopting the gradient descent method is as follows:

The cost function value represents the difference between the initial prediction model and the real sample results, so the minimum of the cost function is sought. The weights of the EGR characteristic quantities at which the cost function reaches its minimum are obtained by the gradient descent method.

According to the above formula, the EGR characteristic weight θ is set to an arbitrary initial value and substituted into the cost function $\mathrm{cost}(h(x), y) = \frac{1}{m}\sum\left[-y\,\lg(h(x)) - (1-y)\,\lg(1-h(x))\right]$. The partial derivative of the cost function with respect to the EGR characteristic weight θ is then computed to obtain the current gradient.

The product of the current gradient value and the learning rate is subtracted from the current EGR characteristic weight θ, and it is then judged whether the cost function value is converging.

If the cost function value shows a convergence trend, the EGR characteristic weight θ continues to be updated according to the formula, and this is repeated until the gradient is close to zero. The learning rate controls how much θ changes at each update. If the learning rate α is too large, θ changes by a large amount and the cost function may fail to converge; if α is too small, θ changes little and reaching the optimal solution takes longer.

The learning rate α is therefore usually chosen by starting from an empirical value and trying different values; a value with which the cost function converges within an acceptable computing time is taken as the final learning rate.

Finally, a local optimal solution of the cost function is obtained, at which the value of the EGR characteristic weight no longer changes; in this example, the local optimal solution is the minimum of the cost function.
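The update procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the toy data, the constant first column acting as a bias term, the learning rate, the iteration cap and the gradient tolerance are all hypothetical. The gradient expression follows from differentiating the base-10 cost function, which introduces the factor 1/ln(10).

```python
import math


def sigmoid(z):
    """Sigmoid function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost(theta, X, y):
    """cost = (1/m) * sum(-y*lg(h) - (1-y)*lg(1-h)), with h = g(theta * x)."""
    eps = 1e-12  # assumption: clip h so the logarithm is always defined
    total = 0.0
    for row, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, row)))
        h = min(max(h, eps), 1.0 - eps)
        total += -yi * math.log10(h) - (1 - yi) * math.log10(1.0 - h)
    return total / len(y)


def gradient(theta, X, y):
    """Partial derivatives of the cost with respect to each weight."""
    grad = [0.0] * len(theta)
    for row, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, row)))
        for j, v in enumerate(row):
            grad[j] += (h - yi) * v
    # Dividing by ln(10) accounts for the base-10 logarithm in the cost.
    return [g / (len(y) * math.log(10)) for g in grad]


def gradient_descent(X, y, alpha=0.5, max_iters=5000, tol=1e-6):
    """Repeat theta <- theta - alpha * gradient until the gradient is near zero."""
    theta = [0.0] * len(X[0])  # arbitrary initial value
    for _ in range(max_iters):
        grad = gradient(theta, X, y)
        if max(abs(g) for g in grad) < tol:
            break
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data: column 0 is a constant 1 (assumed bias), column 1 a scaled feature.
X = [[1.0, 0.1], [1.0, 0.3], [1.0, 0.7], [1.0, 0.9]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))
```

In practice the learning rate `alpha` would be tuned as the text describes, by starting from an empirical value and checking that the cost converges in acceptable time.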

In this example, the EGR characteristic weight for low EGR flow is obtained from the sample data and sample results for low EGR flow, and the EGR characteristic weight for high EGR flow is obtained from the sample data and sample results for high EGR flow. Substituting each set of weights into the prediction model allows binary classification predictions to be made on new input data.

S3: and substituting the diagnostic EGR characteristic weight value into the initial prediction model to obtain a diagnostic EGR prediction model.

In this embodiment, when solving the diagnostic prediction model with low EGR flow, sample data and sample results under the known low EGR flow operating condition are brought into the cost function to obtain a diagnostic EGR characteristic weight value corresponding to low EGR flow, and the diagnostic EGR characteristic weight value corresponding to low EGR flow is brought into the initial prediction model to obtain the diagnostic EGR prediction model with low EGR flow.

When the diagnosis prediction model with high EGR flow is solved, sample data and sample results under the known conditions with high EGR flow are brought into the cost function to obtain a diagnosis EGR characteristic weight value corresponding to the high EGR flow, and the diagnosis EGR characteristic weight value corresponding to the high EGR flow is brought into the initial prediction model to obtain the diagnosis EGR prediction model with the high EGR flow.

S4: and substituting the collected data of the EGR characteristic quantity into a diagnosis and prediction model to obtain a prediction result of the EGR flow.

In some optional embodiments, the bringing the collected data of the EGR characteristic quantity into the diagnosis prediction model to obtain the prediction result of the EGR flow rate includes:

bringing the collected data of the EGR characteristic quantity into a diagnosis prediction model with low EGR flow, and when the probability value of the diagnosis prediction model is more than or equal to 50%, determining that the EGR flow is low; when the predicted probability value is less than 50%, the EGR flow is normal or the EGR flow is high. The collected data of the EGR characteristic quantity is brought into a diagnosis and prediction model with high EGR flow, and when the probability value of the diagnosis and prediction model is more than or equal to 50%, the EGR flow is high; when the predicted probability value is less than 50%, the EGR flow rate is normal or the EGR flow rate is low.

In this embodiment, a prediction model for high flow and a prediction model for low flow are established from the sample data and sample results for high flow and for low flow, respectively. A group of collected data is evaluated with both models to obtain two prediction results, and the multi-class result is determined on the basis of these two binary classifications, which makes the prediction more accurate.

In this example, the two binary prediction models above distinguish "low EGR flow" from "non-low EGR flow", where the non-low set is {normal EGR flow, high EGR flow}, and "high EGR flow" from "non-high EGR flow", where the non-high set is {normal EGR flow, low EGR flow}. To obtain a single final prediction result, the multi-class decision is made by comparing the predicted probability values produced by the high-flow and low-flow diagnostic prediction models.

The same input data are fed into the diagnostic prediction model for low EGR flow and into the diagnostic prediction model for high EGR flow. The predicted probability value of the low-flow model is compared with that of the high-flow model, the higher of the two is taken as the predicted probability value, and the diagnosis result corresponding to it is the final prediction result. When the probability values output by both models are less than 50%, the flow is normal.
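The final decision rule can be sketched as follows. The function name and the returned strings are hypothetical, and resolving an exact tie in favor of "low" is an assumption of this sketch, since the source does not say how equal probabilities are handled.

```python
def diagnose(p_low: float, p_high: float, threshold: float = 0.5) -> str:
    """Combine the two binary model outputs into one EGR flow diagnosis."""
    # Both models below 50%: neither low nor high flow is indicated.
    if p_low < threshold and p_high < threshold:
        return "normal"
    # Otherwise the diagnosis with the larger predicted probability wins.
    # Assumption: an exact tie is resolved as "low".
    return "low" if p_low >= p_high else "high"


print(diagnose(0.8, 0.3))
print(diagnose(0.2, 0.7))
print(diagnose(0.3, 0.4))
```

Here `p_low` and `p_high` stand for the probability values produced by the low-flow and high-flow diagnostic prediction models for the same collected data.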

In addition, when a test vehicle is fitted with an engine undergoing reliability and durability testing, the vehicle remote monitoring platform can be used for data acquisition over a long test period. Because the test runs for a long time and the engine operates in its actual application environment, the data are large in volume, diverse, and realistic. A large amount of EGR-flow-related data is collected by the test vehicle remote monitoring platform during real vehicle operation, and a multi-classification logistic regression algorithm learns from this mass of data, so that abnormal EGR flow can be diagnosed.

In conclusion, an initial prediction model relating the EGR characteristic weights and the EGR characteristic quantities is established, and a cost function for the initial prediction model is established on that basis; multiple groups of sample data of multiple EGR characteristic quantities and the corresponding sample results are substituted into the cost function, the optimal solution of the cost function is calculated, and the diagnostic EGR characteristic weight values corresponding to the optimal solution are obtained; the diagnostic EGR characteristic weight values are substituted into the initial prediction model to obtain a diagnostic EGR prediction model; and the collected data of the EGR characteristic quantities are substituted into the diagnostic prediction model to obtain a prediction result for the EGR flow. Because the collected EGR characteristic data are large in volume, realistic, and diverse, a more accurate diagnostic prediction model can be established, overcoming the drawback that traditional model development is limited by test conditions and cannot truly reflect the application environment. Characteristic quantity extraction and vectorization are performed on the collected big data. The relevant characteristic quantities are extracted by analyzing the factors that influence the EGR flow. Vectorizing the data allows the software to operate directly on matrices; matrix operations are more efficient than code that loops over individual values, which saves system resources of the computing unit. Prediction models for low EGR flow and high EGR flow are designed using a logistic regression algorithm. A high-flow prediction model and a low-flow prediction model are established from sample data with high-flow sample results and sample data with low-flow sample results, and the multi-class classification result is determined on the basis of two binary classifications, so that the prediction result is more accurate.
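
The remark on vectorization can be illustrated with a short comparison. This is a sketch only: it assumes the sigmoid form of the prediction model, uses NumPy as the matrix library (the application names no library), and fills an m × n matrix with random placeholder values rather than real EGR data. The names `predict_loop` and `predict_vectorized` are hypothetical.

```python
import numpy as np


def predict_loop(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Single-value style: loop over every sample and every characteristic quantity."""
    m, n = X.shape
    out = np.empty(m)
    for i in range(m):
        z = 0.0
        for j in range(n):
            z += X[i, j] * theta[j]
        out[i] = 1.0 / (1.0 + np.exp(-z))
    return out


def predict_vectorized(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Matrix style: one matrix-vector product covers all m samples at once."""
    return 1.0 / (1.0 + np.exp(-(X @ theta)))


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))  # m = 1000 sample groups, n = 7 quantities (placeholder values)
theta = rng.normal(size=7)      # placeholder weights, not trained values

# Both styles give the same probabilities; only the amount of interpreted looping differs.
print(np.allclose(predict_loop(X, theta), predict_vectorized(X, theta)))
```

The matrix form pushes the inner loops into the library's compiled routines, which is the efficiency gain the paragraph refers to.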

In the description of the present application, it should be noted that the terms "upper", "lower", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present application. Unless expressly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be understood broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the circumstances.

It is noted that, in the present application, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An EGR flow diagnosis method based on a multi-classification logistic regression algorithm is characterized by comprising the following steps:

2. The EGR flow diagnostic method based on multi-classification logistic regression algorithm as claimed in claim 1, wherein: the establishing of the initial prediction model about the EGR characteristic weight and the EGR characteristic quantity specifically comprises the following steps:

3. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 2, wherein the establishing of the cost function with respect to the initial prediction model based on the initial prediction model specifically comprises:

4. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 3, wherein the step of bringing multiple groups of sample data of multiple EGR characteristic quantities and corresponding sample results into the cost function, calculating an optimal solution of the cost function, and obtaining a diagnostic EGR characteristic weight value corresponding to the optimal solution specifically comprises:

5. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 4, wherein the solving of the optimal solution of the cost function by the gradient descent method specifically comprises:

according to the formula

6. The EGR flow diagnostic method based on multi-classification logistic regression algorithm as claimed in claim 4, wherein:

when the data brought into the cost function are multiple groups of sample data of multiple EGR characteristic quantities, together with the corresponding sample results, for low EGR flow, a diagnostic prediction model for low EGR flow is obtained;

7. The method for diagnosing the EGR flow based on the multi-classification logistic regression algorithm according to claim 6, wherein the step of bringing the collected data of the EGR characteristic quantity into the diagnosis and prediction model to obtain the prediction result of the EGR flow comprises the following steps:

8. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 7, wherein the same collected data of the EGR characteristic quantities are simultaneously brought into the diagnostic prediction model for low EGR flow and the diagnostic prediction model for high EGR flow to obtain the prediction probability value of each model, and the diagnosis result corresponding to the larger of the two prediction probability values is the final prediction result.

9. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 1, wherein the EGR characteristic quantities include at least three of the following parameters: EGR valve opening degree, EGR valve inlet temperature, EGR valve inlet pressure, engine speed, engine torque, intake air flow rate, and exhaust gas flow rate.

10. The EGR flow diagnosis method based on a multi-classification logistic regression algorithm according to claim 9, wherein the EGR characteristic quantities form an m × n matrix, where m is the number of groups of EGR characteristic quantities and n is the number of EGR characteristic quantities.