CN114580086B - Vehicle component modeling method based on supervised machine learning - Google Patents

Vehicle component modeling method based on supervised machine learning

Info

Publication number
CN114580086B
CN114580086B (application CN202210478749.1A)
Authority
CN
China
Prior art keywords
machine learning
data
modeling
random forest
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210478749.1A
Other languages
Chinese (zh)
Other versions
CN114580086A (en)
Inventor
李文博
王伟
曲辅凡
王长青
颜燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Technology and Research Center Co Ltd
CATARC Automotive Test Center Tianjin Co Ltd
Original Assignee
China Automotive Technology and Research Center Co Ltd
CATARC Automotive Test Center Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Technology and Research Center Co Ltd, CATARC Automotive Test Center Tianjin Co Ltd filed Critical China Automotive Technology and Research Center Co Ltd
Priority to CN202210478749.1A priority Critical patent/CN114580086B/en
Publication of CN114580086A publication Critical patent/CN114580086A/en
Application granted granted Critical
Publication of CN114580086B publication Critical patent/CN114580086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention provides a vehicle component modeling method based on supervised machine learning, which comprises the following steps: collecting test data of the modeled component under different working conditions; preprocessing the test data; selecting a classification or regression learning algorithm according to the characteristics of the modeled component; extracting relevant features using feature selection and feature transformation; constructing and training the model; and exporting the trained model and applying it to the complete vehicle model. By training the selected supervised machine learning algorithm on test data, the method establishes a high-precision vehicle component model and improves the overall simulation accuracy of the complete vehicle.

Description

Vehicle component modeling method based on supervised machine learning
Technical Field
The invention belongs to the technical field of automobile simulation, and particularly relates to a vehicle component modeling method based on supervised machine learning.
Background
With the rapid development of new energy vehicles, virtual simulation has become an important development tool and is widely applied; simulation software is generally adopted in the development stage of new energy vehicles, which greatly shortens the research and development cycle and reduces cost. However, the current simulation practice for new energy vehicles has obvious shortcomings. The vehicle component test items that serve as the basis for modeling are being subdivided ever further and their number is growing, yet the working conditions remain limited and cannot cover all application scenarios, and the test conditions and parameters are relatively isolated from one another; meanwhile, although the evaluation data are real and reliable, errors and fluctuations are unavoidable, so the data are scattered and fuzzy to some degree. Vehicle component modeling involves many variables with complex logical relations among them; traditional modeling relies on engineering experience and its precision is hard to improve, and when the control strategy part of a vehicle component model is built, the influence of each parameter on the result is hard to determine and the relevant key thresholds are hard to obtain directly.
Therefore, applying the original evaluation data to virtual simulation through machine learning, obtaining information such as high-precision key performance curves, surfaces, key thresholds and classification boundaries that ordinary tests can hardly provide, and combining this with the strengths of traditional vehicle component modeling to establish a vehicle component modeling method based on supervised machine learning can greatly improve model precision, solve the problem of modeling vehicle components with many variables at high precision and efficiency, and promote the development of simulation technology.
Disclosure of Invention
In view of this, the present invention aims to provide a vehicle component modeling method based on supervised machine learning that solves the problem of modeling vehicle components with many variables at high precision and high efficiency.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a vehicle component modeling method based on supervised machine learning comprises the following steps:
S1, determining the target signal to be output by the part of the vehicle component model that is to be built with the machine learning modeling technique;
s2, designing a test scheme according to the target signal required to be output in the step S1;
s3, testing the real vehicle parts according to the test scheme, and collecting test data of the modeling parts under different working conditions;
s4, preprocessing the test data collected in the step S3 to obtain a preprocessed target signal;
s5, dividing the target signal preprocessed in the step S4 to obtain a training set T and a test set D, and extracting n training subsets from the training set T;
S6, extracting the relevant features of the n training subsets using feature selection and feature transformation, randomly selecting attributes from the relevant features of the n training subsets as node splitting attributes to form a decision tree; repeating the steps of extracting relevant features, randomly selecting attributes and forming a decision tree n times to generate n decision trees, and combining the n decision trees to obtain a random forest machine learning model;
s7, obtaining a classification boundary, a key threshold, a key performance curve and a key performance curved surface by using the random forest machine learning model in the step S6;
S8, exporting the random forest machine learning model to the vehicle component model and applying it there;
and S9, completing the modeling with the traditional modeling technique according to the classification boundary, the key threshold, the key performance curve and the key performance surface obtained in step S7 and the random forest machine learning model exported in step S8.
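Read end to end, steps S1 to S9 amount to the pipeline sketched below on synthetic data. Python with scikit-learn, the variable names and the synthetic signals are assumptions of this sketch only, since the claimed method is platform-agnostic; each stage is detailed further below.

```python
# Illustrative end-to-end sketch of steps S1-S9 on synthetic data; library
# choice, names and the synthetic signals are assumptions, not part of the method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# S3: stand-in for logged component signals (e.g. temperature and current)
X = rng.uniform(-30, 30, size=(5000, 2))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 5000)  # target signal

# S4: preprocessing, here only the 3-sigma outlier rule
mask = np.abs(y - y.mean()) <= 3 * y.std()
X, y = X[mask], y[mask]

# S5: split test/training 2:8 (sample size is below 200,000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# S6: random forest of n decision trees (bootstrap subsets drawn internally)
model = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=0)
model.fit(X_tr, y_tr)

# S7: query the trained forest on a finely spaced input grid to read off a
# key performance curve (second input held at 0)
grid = np.column_stack([np.linspace(-30, 30, 600), np.zeros(600)])
curve = model.predict(grid)

# S8/S9: the exported model and the extracted curve feed the component model
print("test MSE:", mean_squared_error(y_te, model.predict(X_te)))
```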
Further, the vehicle component model in step S1 includes a part built using a machine learning modeling technique and a part built using a conventional modeling technique; the target signal that needs to be output refers to a signal that is difficult to process by conventional modeling techniques used in modeling vehicle components and needs to be processed by machine learning modeling techniques.
Further, the test scheme in step S2 is a scheme for ensuring that the maximum amount of information is obtained in each test, and includes setting up different initial states and operating states.
Further, the test data preprocessing in step S4 includes the following steps:
A1, loading and reading the test data;
A2, treating data points in the test data that deviate from the mean by more than plus or minus three standard deviations as abnormal data points and eliminating them with the normal distribution diagram method, which leaves test data with missing values;
A3, filling the missing values left in step A2 to obtain complete test data;
a4, filtering signal noise of the complete test data in the step A3.
Further, in step S5, dividing the target signal preprocessed in step S4 into a training set T and a test set D and extracting n training subsets from the training set T includes the following steps:
B1, judging whether the data sample size of the preprocessed target signal is less than 200,000;
B2, if yes, dividing the preprocessed target signal into the test set D and the training set T at a ratio of 2:8, and proceeding to step B4;
B3, if not, dividing the preprocessed target signal into the test set D, a verification set and the training set T at a ratio of 2:2:6, and proceeding to step B4;
B4, drawing N samples with replacement (bootstrap sampling) from the training set T of size N to form a training subset;
B5, repeating step B1 to step B4 n times to obtain n training subsets;
the feature selection in step S6 includes stepwise regression, sequential feature selection, regularization and neighbor analysis, and the feature transformation includes principal component analysis, non-negative matrix factorization and factor analysis.
Further, the obtaining of the random forest machine learning model in step S6 includes the following steps:
C1, importing the relevant features obtained in step S6, calculating the information gain of each relevant feature with the information gain formula, selecting the relevant feature with the largest information gain as the response variable, and using the response variable as the output of the random forest machine learning model obtained in step C2;
C2, using the relevant features from step C1 as node splitting attributes to form a decision tree; repeating the steps of extracting relevant features, selecting attributes and forming a decision tree n times to generate n decision trees, and combining the n decision trees to obtain the random forest machine learning model;
C3, configuring the following parameters of the random forest machine learning model under the training set T: the number of decision trees, the maximum depth of the trees, the minimum number of samples for splitting an internal node, and the maximum number of features used by each tree;
c4, training the model obtained by the configuration in the step C3;
c5, judging whether the model in the step C4 meets the requirement, if yes, executing the step S7, if no, returning to the step C1.
Further, obtaining the classification boundary, the key threshold, the key performance curve and the key performance surface in step S7 with the random forest machine learning model from step S6 includes the following steps:
d1, determining the minimum step length to be 5% of the minimum time interval according to the amplitude and the span of the real value by using the test set D;
d2, dividing the real value of the input signal at equal intervals according to the minimum step length through linear interpolation to obtain an expanded input signal;
d3, transmitting the expanded input signals in the step D2 to a trained random forest machine learning model to obtain an output result calculated by the random forest machine learning model;
d4, analyzing the output result in the step D3 to obtain a classification boundary, a key threshold, a key performance curve and a key performance curved surface.
Further, exporting the random forest machine learning model and applying it to the vehicle component model in step S8 includes:
e1, configuring an input interface and an output interface of the random forest machine learning model;
e2, outputting the random forest machine learning model as a C code file;
e3, compiling the C code file into a vehicle component modeling platform interface file;
e4, linking the compiled interface file in the step E3 to the vehicle component model in the vehicle component modeling platform.
Compared with the prior art, the vehicle component modeling method based on supervised machine learning has the following advantages:
(1) The vehicle component modeling method based on supervised machine learning is reasonably designed: original evaluation data are applied to virtual simulation through machine learning, yielding information such as high-precision key performance curves, surfaces, key thresholds and classification boundaries that ordinary tests can hardly provide, and, combined with the strengths of the traditional vehicle component modeling technique, this solves the problem of building vehicle component models with many variables at high precision and efficiency.
(2) The method uses the generalization ability of machine learning to improve modeling precision while greatly reducing the number of tests and saving cost, and it also supplies the basic modeling data (key performance curves, surfaces, key thresholds, classification boundaries) that are difficult to obtain with the traditional modeling technique.
(3) The method exploits the generalization ability of machine learning and its capacity for resolving complex multivariable relations, and combines them with the robustness and interpretability of the traditional modeling technique, so that the two reinforce each other and the precision of the overall vehicle component model is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of the overall structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating test data preprocessing according to an embodiment of the present invention;
fig. 3 is a schematic diagram of cross-platform application of a machine learning model to a complete vehicle model according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Explanation of terms:
Normal distribution diagram method:
A normal distribution is the probability distribution of a continuous random variable with two parameters, μ and σ². The first parameter μ is the mean of the normally distributed random variable and the second parameter σ² is its variance, so a normal distribution is recorded as N(μ, σ²). For a random variable following a normal distribution, values near μ occur with high probability while values farther from μ occur with lower probability; the smaller σ is, the more concentrated the distribution is around μ, and the larger σ is, the more dispersed the distribution. The normal distribution diagram method removes abnormal data by exploiting this property of the normal distribution.
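As a quick numerical check of the plus or minus three standard deviation band used for outlier removal in the steps above, the share of a normally distributed population inside that band can be computed directly; the use of Python and SciPy here is an illustrative assumption.

```python
# Fraction of a normally distributed signal inside mu +/- 3*sigma; values
# outside this band are the "abnormal data points" removed in step A2.
from scipy.stats import norm

inside = norm.cdf(3) - norm.cdf(-3)
print(f"fraction within 3 sigma: {inside:.4%}")   # about 99.73 %
```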
Information gain formula:
the information gain formula is a formula for calculating a difference value of entropies before and after dividing a data set by a certain characteristic, wherein the entropy can represent the uncertainty of a sample set, and the larger the entropy is, the larger the uncertainty of the sample is.
As shown in fig. 1 to 3, a vehicle component modeling method based on supervised machine learning includes the following steps:
S1, selecting, as the target signals to be output by the part of the vehicle component model built with the machine learning modeling technique, the multivariable complex logical relations that are difficult to realize with the traditional modeling technique, or information such as high-precision key performance curves, surfaces, key thresholds and classification boundaries that the traditional modeling technique can hardly obtain;
s2, designing a test scheme according to the target signal to be output;
S3, testing the real vehicle components under different working conditions according to the test scheme and collecting the test data; in this embodiment the real vehicle components are tested under combined WLTC and NEDC working conditions and the test data are collected and summarized; the modeled component refers to the component corresponding to the vehicle component model to be created;
in this embodiment, step S3 performs 5 working-condition tests on the real vehicle component at each of -30, -10, 0, 10 and 30 degrees Celsius, giving 25 sets of test data;
s4, preprocessing the test data collected in the step S3 to obtain a preprocessed target signal;
s5, dividing the target signal processed in the step S4 to obtain a training set and a test set, and extracting n training subsets from the training set;
S6, extracting relevant features using feature selection and feature transformation, randomly selecting attributes from the features as node splitting attributes to form a complete decision tree; repeating the steps of extracting features, randomly selecting attributes and forming a decision tree n times to generate n decision trees, and combining them to obtain a random forest;
s7, obtaining a classification boundary, a key threshold, a key performance curve and a key performance curved surface by using the random forest machine learning model in the step S6;
S8, exporting the random forest machine learning model to the vehicle component model and applying it there;
and S9, completing the modeling with the traditional modeling technique according to the classification boundary, the key threshold, the key performance curve and the key performance surface obtained in step S7 and the random forest machine learning model exported in step S8. The method is reasonably designed: original evaluation data are applied to virtual simulation through machine learning, yielding information such as high-precision key performance curves, surfaces, key thresholds and classification boundaries that ordinary tests can hardly provide, and, combined with the strengths of traditional vehicle component modeling, this solves the problem of building vehicle component models with many variables at high precision and efficiency; applying machine learning greatly reduces the number of tests and saves cost, while also providing the basic modeling data, such as key performance curves, surfaces, key thresholds and classification boundaries, that traditional modeling can hardly obtain; and the generalization ability of machine learning and its capacity for resolving complex multivariable relations are combined with the robustness and interpretability of traditional modeling, so that the two reinforce each other and the precision of the overall vehicle component model is improved.
The vehicle component model in step S1 includes a part built with the machine learning modeling technique and a part built with the traditional modeling technique, covering both the actual mechanical components of the vehicle and the control strategy components; the target signals to be output are signals that are difficult to handle with the traditional modeling technique used in vehicle component modeling and therefore need the machine learning modeling technique; such signals typically require a large amount of engineering experience or follow unknown principles, yet strongly influence the result. The machine learning modeling technique refers to modeling realized by means of machine learning algorithms; the traditional modeling technique refers to modeling realized by means of general physical laws, empirical formulas and the like, without machine learning algorithms.
The test scheme in step S2 is designed so that each test yields the maximum amount of information; in this embodiment this includes setting up different initial states and operating states and avoiding the data duplication caused by too many similar tests.
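One way to lay out such a test scheme is to enumerate the full grid of initial and operating states and drop duplicates up front, as sketched below; the particular factors (temperature, drive cycle, initial SOC) are assumptions loosely based on the embodiments, not values prescribed by the method.

```python
# Sketch of a test matrix covering different initial and operating states
# without duplicated runs; the factor values are illustrative assumptions.
from itertools import product

temperatures_degc = [-30, -10, 0, 10, 30]
drive_cycles = ["NEDC", "WLTC"]
initial_socs = [0.3, 0.6, 0.9]

test_plan = sorted(set(product(temperatures_degc, drive_cycles, initial_socs)))
for run_id, (temp, cycle, soc0) in enumerate(test_plan, start=1):
    print(f"run {run_id:02d}: {temp:+d} degC, {cycle}, initial SOC {soc0}")
```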
The test data preprocessing in step S4 includes the following steps:
A1, selecting a suitable storage format and loading and reading the test data. Test data used for machine learning are generally large in volume and may come from several collected and sorted sources with different storage formats, so a suitable loading and reading approach must be chosen, weighing memory footprint, access speed, processing means and the like, in order to standardize and unify the data.
A2, treating data points that deviate from the mean by more than plus or minus three standard deviations as abnormal data points and eliminating them with the normal distribution diagram method. Specifically, for the 5 NEDC working-condition tests carried out in step S3 at each of -30, -10, 0, 10 and 30 degrees Celsius, the data mean and standard deviation of each test are calculated, the points of that test deviating from its mean by more than three standard deviations are marked as abnormal data points, and they are eliminated. The reason is that, the approach being data-driven, the training result of machine learning depends directly on the quality of the test data; abnormal values bias the training result, and the effect is especially severe when an abnormal value far exceeds the normal range, so abnormal data points must be eliminated to avoid the influence of unreasonable outliers.
In step A2, abnormal data points can also be identified from known conditions, such as data points exceeding theoretical limit values (e.g., SOC values greater than 1, sample counts well below the sampling time, voltages above the maximum voltage) or data points violating physical reality (e.g., negative distance traveled, negative mass).
A3, filling the data gaps that appear after the abnormal data points are eliminated in step A2; the general principle is to fill them using the information in the existing variables. If the data sample is large, a missing value can be filled with the average of the values at the same time in the two preceding and following tests; if the data sample is small, it can be filled with the average of the values at that time over all tests of the NEDC working condition. The reason is that missing data both lose information from the test data set and cause errors or blanks that keep machine learning from running normally or efficiently, so missing values must be processed and filled; when the data volume is large enough, the missing entries can also simply be deleted.
A4, filtering signal noise from the data processed in A3, in the following three steps:
A41, binning the data from the working-condition tests performed five times at each of -30, -10, 0, 10 and 30 degrees Celsius, putting the five groups of working-condition data obtained at each temperature into one box, giving five boxes A, B, C, D and E in total;
A42, mean-filtering the data in box A: the mean of the data at each time instant is calculated and replaces the data at that time, so the five data sets of the box are filtered into one data set whose value at each time is the mean; mean filtering is applied to each of the five boxes in the same way, yielding five data sets in total; mean filtering removes irrelevant detail;
A43, median-filtering the five data sets obtained from the five boxes: the median of the data at each time instant across the five data sets is calculated, and these medians form a new data set; median filtering at this stage removes isolated noise, improves data smoothness and protects the data edges;
The reason for this step is that the test data come from real-time logging by specific test equipment under the test working conditions; errors and noise from the test system are unavoidable, and the resulting random irregular fluctuations may mask the characteristics of the data and reduce training speed and accuracy, so the test data must be noise-filtered. A minimal sketch of this preprocessing chain is given below.
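The sketch assumes pandas, NumPy and SciPy; the column names, the per-run grouping and the equal-length, time-aligned runs are assumptions of the sketch rather than requirements of the method.

```python
# Sketch of steps A2-A43: 3-sigma outlier removal per test run (A2), filling
# the resulting gaps from neighbouring samples (A3), then, per temperature box,
# a per-time mean over the repeated runs (A42) and a median filter over time (A43).
import numpy as np
import pandas as pd
from scipy.signal import medfilt

def remove_3sigma_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """A2: mark values beyond mean +/- 3*std of their own test run as missing."""
    out = df.copy()
    mu = out.groupby("run_id")[col].transform("mean")
    sigma = out.groupby("run_id")[col].transform("std")
    out.loc[(out[col] - mu).abs() > 3 * sigma, col] = np.nan
    return out

def fill_missing(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """A3: fill the gaps left by A2 from neighbouring samples."""
    out = df.copy()
    out[col] = out[col].interpolate(method="linear", limit_direction="both")
    return out

def filter_box(runs: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """A42/A43: per-time mean over the runs of one box, then a median filter."""
    mean_over_runs = runs.mean(axis=0)
    return medfilt(mean_over_runs, kernel_size)

# Toy usage: one SOC run with a spike, then one five-run temperature box
df = pd.DataFrame({"run_id": 1, "soc": [0.58] * 20 + [22.3]})
clean = fill_missing(remove_3sigma_outliers(df, "soc"), "soc")

rng = np.random.default_rng(1)
box_minus30 = np.sin(np.linspace(0, 6, 1800)) + rng.normal(0, 0.05, (5, 1800))
smooth = filter_box(box_minus30)
```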
Further, in step S5, dividing the target signal processed in step S4 into a training set T and a test set D and extracting n training subsets from the training set includes the following steps:
B1, when the data sample size is small (less than 200,000), dividing the data into a test set and a training set at a ratio of 2:8; when the data sample size is large (more than 200,000), dividing the data into a test set, a verification set and a training set at a ratio of 2:2:6; then drawing, from the training set T of size N, N samples with replacement (bootstrap sampling) to form a training subset T1;
B2, repeating step B1 n times to obtain n training subsets T1, T2, ..., Tn.
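A minimal sketch of B1 and B2, assuming Python with scikit-learn for the split and NumPy for the bootstrap draws; the 200,000 threshold and the 2:8 / 2:2:6 ratios follow the text, while the names and the random data are illustrative.

```python
# Sketch of steps B1-B2: ratio chosen by sample size, then n bootstrap subsets
# of size N drawn with replacement from the training set.
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_bootstrap(X, y, n_subsets=10, seed=0):
    rng = np.random.default_rng(seed)
    if len(y) < 200_000:                      # small sample: test/train 2:8
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        X_val = y_val = None
    else:                                     # large sample: test/val/train 2:2:6
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, test_size=0.4, random_state=seed)
        X_val, X_te, y_val, y_te = train_test_split(
            X_rest, y_rest, test_size=0.5, random_state=seed)
    n_train = len(y_tr)                       # bootstrap subsets of size N
    subsets = [rng.integers(0, n_train, size=n_train) for _ in range(n_subsets)]
    return X_tr, y_tr, X_te, y_te, (X_val, y_val), subsets

X, y = np.random.rand(1000, 3), np.random.rand(1000)
X_tr, y_tr, X_te, y_te, _, subsets = split_and_bootstrap(X, y)   # 10 index sets
```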
The feature selection in step S6 includes stepwise regression, sequential feature selection, regularization and neighbor analysis, and the feature transformation includes principal component analysis, non-negative matrix factorization and factor analysis.
The obtaining of the random forest machine learning model comprises the following steps:
C1, importing the relevant features obtained in step S6, calculating the information gain of each relevant feature with the information gain formula, selecting the relevant feature with the largest information gain as the response variable, and using the response variable as the output of the random forest machine learning model obtained in step C2;
the importance of a characteristic variable can be obtained by the information gain method: the larger the information gain, the more important the feature. The information gain is calculated by the following formulas:
suppose there are k types of features:
C1, C2, C3, ..., Ck
the probability of each feature occurring is:
P(C1), P(C2), P(C3), ..., P(Ck)
the information entropy for each feature is calculated as follows:
H(C) = -Σ P(Ci)·log2 P(Ci), summed over i = 1, ..., k
the probability of each datum occurring in all features is:
P(data)
the probability of each datum not occurring is:
P(¬data) = 1 - P(data)
the formula for calculating conditional entropy is as follows:
H(C|T) = P(data)·H(C|data) + P(¬data)·H(C|¬data)
wherein H(C|data) is the information entropy when the datum appears, and H(C|¬data) is the information entropy when the datum does not appear;
the overall information gain formula is:
IG(T) = H(C) - H(C|T)
the larger the information gain, the larger the change in entropy and the more the split favors classification; the relevant feature with the largest information gain is selected as the response variable;
C2, using the relevant features from step C1 as node splitting attributes to form a complete decision tree; repeating the steps of extracting features, selecting attributes and forming a decision tree n times to generate n decision trees, and combining them to obtain the random forest machine learning model;
C3, configuring the following parameters of the random forest machine learning model under the training set: the number of decision trees, the maximum depth of the trees, the minimum number of samples for splitting an internal node, and the maximum number of features used by each tree;
number of decision trees: too many trees make the computation excessive, while too few lead to underfitting; 100 is generally chosen;
maximum depth of a tree: the maximum depth parameter max_depth is set to None, so that nodes are split until the information gain is 0, which gives higher accuracy;
minimum number of samples for splitting an internal node: the minimum sample number min_sample_leaf is set to 1, meaning that when the number of samples at a leaf node is smaller than min_sample_leaf, the leaf node is pruned and only its parent node is kept;
maximum number of features per tree: set to None, i.e. the number of features is not limited;
the parameter configuration of the random forest machine learning model directly affects the training result, and there are usually many parameters, so common optimization techniques such as Bayesian optimization, grid search and gradient-based optimization can be used to find the best parameter combination, and a suitable experiment-design method such as the orthogonal test method can quickly narrow the parameter range (a tuning-and-evaluation sketch follows step C5);
c4, training the model obtained by the step C3;
C5, judging whether the model meets the requirements; if yes, executing step S7, and if not, returning to step C1. The judgement uses an evaluation function of the key model output indexes, so that the accuracy of the model can be assessed by jointly considering error metrics such as ROC (receiver operating characteristic) curves, confusion matrices and MSE (mean square error);
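A compact sketch of the C3 to C5 loop (configuring the four named parameters, searching over them, training, and checking an error metric) is given below; the scikit-learn parameter spellings (n_estimators, max_depth, min_samples_leaf, max_features), the candidate grids and the acceptance threshold are assumptions of this sketch, not values fixed by the method.

```python
# Sketch of C3-C5: grid search over the four parameters named in C3, training
# the forest, then accepting the model only if the held-out MSE is small enough.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],     # number of decision trees
    "max_depth": [None, 10, 20],        # maximum depth of each tree
    "min_samples_leaf": [1, 2, 5],      # minimum samples kept at a leaf
    "max_features": [None, "sqrt"],     # maximum features tried per split
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      scoring="neg_mean_squared_error", cv=3, n_jobs=-1)
search.fit(X_tr, y_tr)                  # C3 + C4: configure and train

mse = mean_squared_error(y_te, search.best_estimator_.predict(X_te))
accepted = mse <= 0.05 * np.var(y_te)   # C5: accept, or go back to C1
print(search.best_params_, mse, accepted)
```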
in the present embodiment, the feature selection in step S7 includes, but is not limited to, stepwise regression, sequential feature selection, regularization, neighbor analysis (NCA), and the like; characteristic variations include, but are not limited to, principal component analysis, non-negative matrix factorization, and the like.
Obtaining the classification boundary, the key threshold, the key performance curve and the key performance surface with the random forest machine learning model in step S7 includes the following steps:
D1, determining the minimum step size, using the test set D, from information such as the amplitude and span of the measured values; whether for cost control or to protect the vehicle components, the measured values of the input signal are necessarily obtained at certain intervals, so the minimum step size is determined from the amplitude and span and can be taken as 5% of the minimum interval;
D2, subdividing the measured values of the input signal at equal intervals of the minimum step size by linear interpolation; high-precision modeling needs ever more data, so each interval between measured values of the input signal is subdivided at equal intervals of the minimum step size obtained in step D1 using linear interpolation;
d3, transmitting the input signal obtained after the expansion in the step D2 to a random forest machine learning model to obtain an output result obtained by model calculation;
d4, analyzing the output result in the step D3 to obtain a classification boundary, a key threshold, a key performance curve and a key performance curved surface.
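Assuming the forest has already been trained (here on a toy temperature-to-voltage relation), steps D1 to D4 can be sketched as follows; the 5% step follows the text, while the data, the model and the threshold query are illustrative.

```python
# Sketch of steps D1-D4: refine the measured inputs on a grid whose step is
# 5 % of the smallest measured interval, then read the key performance curve
# and a key threshold off the trained forest. Data and model are toy stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

temp = np.array([-30.0, -10.0, 0.0, 10.0, 30.0])      # measured input values
volt = np.array([352.0, 366.0, 371.0, 375.0, 378.0])  # measured output values
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(temp.reshape(-1, 1), volt)                  # "trained" forest

step = 0.05 * np.diff(np.sort(temp)).min()            # D1: 5 % of min interval
grid = np.arange(temp.min(), temp.max() + step, step) # D2: equally spaced inputs
curve = model.predict(grid.reshape(-1, 1))            # D3: model output

# D4: (grid, curve) is the key performance curve; a key threshold can be read
# off it, e.g. the lowest temperature at which the voltage reaches 370 V
threshold_temp = grid[np.argmax(curve >= 370.0)]
```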
Exporting the random forest machine learning model and applying it to the vehicle component model in step S8 includes the following steps:
e1, configuring an input interface and an output interface of the random forest machine learning model;
e2, outputting the random forest machine learning model as a C code file;
e3, compiling the C code file into a vehicle component modeling platform interface file; such as MEX files in MATLAB/SIMULINK, dll files in CRUISE;
e4, linking the compiled interface file in the step E3 to the vehicle component model in the vehicle component modeling platform, and completing modeling.
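When the forest is trained in Python rather than MATLAB, one possible route for steps E2 and E3 is the m2cgen package, which transpiles a fitted scikit-learn model into a standalone C source file; the use of m2cgen and the output file name are assumptions of this sketch, and the embodiment below instead exports MATLAB m-code and compiles it into a MEX file.

```python
# Possible E2/E3 route from Python: transpile the fitted forest to plain C with
# m2cgen (an assumption of this sketch); the generated score() function can then
# be compiled into the simulation platform's interface file (e.g. MEX or DLL).
import m2cgen as m2c
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

c_source = m2c.export_to_c(model)          # E2: model as C source code
with open("rf_component_model.c", "w") as f:
    f.write(c_source)                      # E3: compile this file into the platform
```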
The modeling completed with the traditional modeling technique in step S9 concerns the parts that, under traditional modeling, would require a large amount of engineering experience and parameters that are difficult to obtain. For these parts, the machine learning modeling technique of step S7 supplies the classification boundaries, key thresholds, key performance curves and key performance surfaces that traditional modeling can hardly obtain, and with them the parts can be modeled conventionally through physical laws, logical algorithms, standards and specifications. For the parts whose principle is unknown but which strongly influence the result, the random forest machine learning model exported in step S8 is used directly. All remaining parts of the vehicle component, apart from those requiring extensive engineering experience and hard-to-obtain parameters and those with unknown principles but important influence on the result, are modeled with the traditional modeling technique, which completes the modeling of the whole vehicle component.
Example 1
The invention is explained with an embodiment that uses a supervised machine learning algorithm to build a battery model in the MATLAB environment; note that the method of the invention is not limited to the MATLAB platform. The steps are as follows:
F1, determining the target signal. The final output signal of the machine-learning-modeled part of the battery model is determined to be the SOC value; the final output signal of the traditionally modeled part is the battery output current.
F2, determining the test scheme. The test conditions of this example are -30, -10, 0, 10 and 30 degrees Celsius, at each of which 5 NEDC working-condition tests are performed.
F3, testing and collecting the battery test data. Test data of the output voltage, output current, battery temperature, open-circuit voltage and SOC value of the battery to be modeled are collected under the key-signal NEDC working condition. Collection can use a bench test or real-vehicle sensors, but the data must be of sufficient volume (in this embodiment each signal has more than 150,000 and fewer than 200,000 test data points), and the covered working conditions should be as comprehensive as possible, ideally including the battery's operating states over its full life cycle, to ensure that the trained model generalizes.
F4, preprocessing the test data. The battery test data are preprocessed to obtain the preprocessed SOC value.
F5, dividing the target signal preprocessed in step F4 at a ratio of 2:8 to obtain a training set T and a test set D, and extracting 10 training subsets from the training set T.
F6, extracting relevant features and constructing and training the battery random forest machine learning model. All relevant features are imported with the SOC value as the response; the model is trained without cross-validation, with the confusion matrix and the MSE value as indexes, and the number of decision trees, the maximum tree depth, the minimum number of samples for splitting an internal node and the maximum number of features per tree are adjusted iteratively according to these indexes until the model meets the accuracy requirement.
F7, obtaining a key performance curve with the battery random forest machine learning model. The performance curve of battery output voltage versus battery temperature is acquired with the model trained in F6.
F8, exporting the battery random forest machine learning model. The trained battery model is exported as MATLAB m-script code, the m-script is compiled with a C compiler to generate a MATLAB MEX file, and the generated MEX file is packaged into a sub-module with an S-function and imported into the battery SIMULINK model.
F9, completing the battery model with the key performance curve and the traditional modeling technique. Using the performance curve of battery output voltage versus battery temperature obtained in F7 and the physical law of the battery that the product of output current and output voltage equals the battery output power, the battery model is completed.
Preprocessing the data in step F4 includes the following detailed steps:
F41, because the battery test data volume is large and the data formats and limit values of the key parameters are not uniform (ordinary numeric, array or string storage formats occupy much memory and are cumbersome to handle; as shown in the figure, the test time stamps are in string format, the battery control signal is a Boolean quantity, and the significant digits of the remaining data differ), a table data storage format is chosen to import the data into MATLAB.
F42, marking outliers with the normal distribution diagram method and removing the abnormal outlier data. When eliminating outliers, however, the removed data should be checked to determine whether they are in fact rare but meaningful working-condition test data.
F43, treating the obvious abnormal values of each signal's test data individually. As shown in Table 1, output voltage and VOC data greater than 400 are deleted; output current data greater than 1.5 are deleted; temperature data greater than 50 are deleted; and SOC data greater than 1 or less than 0 are deleted, for example the SOC values from 9:03:08 to 9:03:10 fall outside 0 to 1 and are deleted (a sketch of this limit-based filtering follows Table 1).
TABLE 1
Time Output voltage Output current Temperature VOC SOC
9:03:05 378.03 1.11 20.96 378.11 0.58159832
9:03:06 378.03 1.11 20.96 378.11 0.581597853
9:03:07 378.03 1.11 20.96 378.11 0.581597386
9:03:08 378.03 1.11 20.96 378.11 1.581596918
9:03:09 378.03 1.11 20.96 378.11 1.581596451
9:03:10 378.03 1.11 20.96 378.11 22.27625333
9:03:11 378.03 1.11 20.96 378.11 0.581595516
9:03:12 378.03 1.11 20.96 378.11 0.581595049
9:03:13 378.03 1.11 20.96 378.11 0.581594582
9:03:14 378.03 1.11 20.96 378.11 0.581594114
9:03:15 378.03 1.11 20.96 378.11 0.581593647
9:03:16 378.03 1.11 20.96 378.11 0.581593179
9:03:17 378.03 1.11 20.96 378.11 0.581592712
9:03:18 378.03 1.11 20.96 378.11 0.581592245
9:03:19 378.03 1.11 20.96 378.11 0.581591777
9:03:20 378.03 1.11 20.96 378.11 0.58159131
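A sketch of the limit checks of step F43 using the thresholds listed for Table 1 is shown below; the pandas column names are assumptions about how the logged signals might be labelled.

```python
# Sketch of the F43 limit checks (cf. Table 1): drop rows whose values exceed
# the theoretical limits. Column names are illustrative assumptions.
import pandas as pd

def drop_out_of_range(df: pd.DataFrame) -> pd.DataFrame:
    ok = (
        (df["output_voltage"] <= 400) & (df["voc"] <= 400)
        & (df["output_current"] <= 1.5)
        & (df["temperature"] <= 50)
        & df["soc"].between(0, 1)
    )
    return df[ok]

df = pd.DataFrame({
    "output_voltage": [378.03, 378.03, 378.03],
    "output_current": [1.11, 1.11, 1.11],
    "temperature":    [20.96, 20.96, 20.96],
    "voc":            [378.11, 378.11, 378.11],
    "soc":            [0.5816, 1.5816, 22.2763],   # last two rows violate 0..1
})
clean = drop_out_of_range(df)    # keeps only the first row
```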
F44, processing the missing data. The number of missing entries in the test data is counted; if the proportion is small (less than 5% of the total number of data entries), the missing entries are deleted directly, otherwise they are filled with the average of neighbouring points. As shown in Table 2, the valid values before and after the gaps at time points 4:03:37 to 4:03:39 are retrieved and the missing data in that range are filled with the average of the neighbouring points (a sketch of this filling step follows Table 2).
TABLE 2
Time Output voltage Output current Temperature VOC SOC
4:03:20 382.17 0.00 20.00 382.18 0.7
4:03:21 382.17 0.00 20.00 382.18 0.7
4:03:22 382.17 0.00 20.00 382.18 0.7
4:03:23 382.17 1.10 20.00 382.18 0.699999584
4:03:24 382.09 1.10 20.00 382.18 0.699999122
4:03:25 382.09 1.10 20.00 382.18 0.699998659
4:03:26 382.09 1.10 20.00 382.18 0.699998197
4:03:29 382.09 1.10 20.00 382.18 0.69999681
4:03:30 382.09 1.10 20.00 382.18 0.699996347
4:03:31 382.09 1.10 20.00 382.18 0.699995885
4:03:32 382.09 1.10 20.00 382.18 0.699995423
4:03:33 382.09 1.10 20.00 382.18 0.69999496
4:03:34 382.09 1.10 20.00 382.18 0.699994498
4:03:35 382.09 1.10 20.00 382.18 0.699994035
4:03:36 382.09 1.10 20.00 382.18 0.699993573
4:03:37 1.10 20.00 382.18 0.699993111
4:03:38 382.09 1.10 20.00 382.18
4:03:39 382.09 1.10 20.00 382.18
4:03:40 382.09 1.10 20.00 382.18 0.699991724
4:03:41 382.09 1.10 20.00 382.18 0.699991261
4:03:42 382.09 1.10 20.00 382.18 0.699990799
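A sketch of the missing-data handling of step F44, illustrated by Table 2, is shown below; the 5% threshold follows the text, while the column name and the use of a forward/backward fill to realize the neighbour average are assumptions of this sketch.

```python
# Sketch of step F44 (cf. Table 2): a column with only a small share of gaps
# has its missing rows dropped, otherwise each gap is filled with the average
# of its neighbouring valid points.
import numpy as np
import pandas as pd

def handle_missing(df: pd.DataFrame, col: str, max_drop_ratio: float = 0.05):
    missing_ratio = df[col].isna().mean()
    if missing_ratio < max_drop_ratio:
        return df.dropna(subset=[col])                 # few gaps: delete rows
    filled = df.copy()                                 # many gaps: neighbour average
    filled[col] = (filled[col].ffill() + filled[col].bfill()) / 2
    return filled

df = pd.DataFrame({"soc": [0.70, np.nan, np.nan, 0.699992]})
print(handle_missing(df, "soc"))   # 50 % missing here, so the gaps are filled
```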
F45, removing isolated noise by using mean value filtering.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A vehicle component modeling method based on supervised machine learning is characterized in that: the method comprises the following steps:
S1, determining the target signal to be output by the part of the vehicle component model that is to be built with the machine learning modeling technique;
the vehicle component model in step S1 includes a part built using a machine learning modeling technique and a part built using a conventional modeling technique; the target signal to be output refers to a signal which is difficult to process by a traditional modeling technology used in vehicle component modeling and needs to be processed by a machine learning modeling technology;
s2, designing a test scheme according to the target signal required to be output in the step S1;
s3, testing the real vehicle parts according to the test scheme, and collecting test data of the modeling parts under different working conditions;
s4, preprocessing the test data collected in the step S3 to obtain a preprocessed target signal;
the test data preprocessing in step S4 includes the steps of:
a1, loading and reading test data;
A2, treating data points in the test data that deviate from the mean by more than plus or minus three standard deviations as abnormal data points and eliminating them with the normal distribution diagram method, which leaves test data with missing values;
A3, filling the missing values left in step A2 to obtain complete test data;
a4, filtering signal noise of the complete test data in the step A3;
s5, dividing the target signal preprocessed in the step S4 to obtain a training set T and a test set D, and extracting n training subsets from the training set T;
in step S5, the target signal preprocessed in step S4 is divided into a training set T and a test set D, and the extraction of n training subsets from the training set T includes the following steps:
B1, judging whether the data sample size of the preprocessed target signal is less than 200,000;
B2, if yes, dividing the preprocessed target signal into the test set D and the training set T at a ratio of 2:8, and proceeding to step B4;
B3, if not, dividing the preprocessed target signal into the test set D, the verification set and the training set T at a ratio of 2:2:6, and proceeding to step B4;
B4, drawing N samples with replacement (bootstrap sampling) from the training set T of size N to form a training subset;
b5, repeating the step B1-the step B4 for n times to obtain n training subsets;
S6, extracting the relevant features of the n training subsets using feature selection and feature transformation, randomly selecting attributes from the relevant features of the n training subsets as node splitting attributes to form a decision tree; repeating the steps of extracting relevant features, randomly selecting attributes and forming a decision tree n times to generate n decision trees, and combining the n decision trees to obtain the random forest machine learning model;
the obtaining of the random forest machine learning model in step S6 includes the steps of:
C1, importing the relevant features obtained in step S6, calculating the information gain of each relevant feature with the information gain formula, selecting the relevant feature with the largest information gain as the response variable, and using the response variable as the output of the random forest machine learning model obtained in step C2;
the information gain is calculated by the following formula:
suppose there are k types of features:
C1, C2, C3, ..., Ck
the probability of each feature occurring is:
P(C1), P(C2), P(C3), ..., P(Ck)
the information entropy for each feature is calculated as follows:
H(C) = -Σ P(Ci)·log2 P(Ci), summed over i = 1, ..., k
the probability of each datum occurring in all features is:
P(data)
the probability of each datum not occurring is:
P(¬data) = 1 - P(data)
the formula for calculating conditional entropy is as follows:
H(C|T) = P(data)·H(C|data) + P(¬data)·H(C|¬data)
wherein H(C|data) is the information entropy when the datum appears, and H(C|¬data) is the information entropy when the datum does not appear;
the overall information gain formula is:
IG(T)=H(C)-H(C|T);
C2, using the relevant features from step C1 as node splitting attributes to form a decision tree; repeating the steps of extracting relevant features, selecting attributes and forming a decision tree n times to generate n decision trees, and combining the n decision trees to obtain the random forest machine learning model;
C3, configuring the following parameters of the random forest machine learning model under the training set T: the number of decision trees, the maximum depth of the trees, the minimum number of samples for splitting an internal node, and the maximum number of features used by each tree;
the number of decision trees: set to 100;
maximum depth of a tree: the maximum depth parameter max_depth is set to None, and nodes are split until the information gain is 0;
minimum number of samples for splitting an internal node: the minimum sample number min_sample_leaf is set to 1, meaning that when the number of samples at a leaf node is smaller than min_sample_leaf, the leaf node is pruned and only its parent node is kept;
maximum number of features per tree: set to None;
c4, training the model obtained by the configuration in the step C3;
c5, judging whether the model in the step C4 meets the requirement, if yes, executing the step S7, and if not, returning to the step C1;
s7, obtaining a classification boundary, a key threshold, a key performance curve and a key performance curved surface by using the random forest machine learning model in the step S6;
obtaining classification boundaries, key thresholds, key performance curves and key performance surfaces using the random forest machine learning model in step S6 in step S7 includes the steps of:
d1, determining the minimum step length to be 5% of the minimum time interval according to the amplitude and the span of the real value by using the test set D;
d2, dividing the real value of the input signal at equal intervals according to the minimum step length through linear interpolation to obtain an expanded input signal;
d3, transmitting the expanded input signals in the step D2 to a trained random forest machine learning model to obtain an output result calculated by the random forest machine learning model;
d4, analyzing the output result in the step D3 to obtain a classification boundary, a key threshold, a key performance curve and a key performance curved surface;
S8, exporting the random forest machine learning model to the vehicle component model and applying it there;
exporting the random forest machine learning model and applying it to the vehicle component model in step S8 includes the following steps:
e1, configuring an input interface and an output interface of the random forest machine learning model;
e2, outputting the random forest machine learning model as a C code file;
e3, compiling the C code file into a vehicle component modeling platform interface file;
e4, linking the compiled interface file in the step E3 to the vehicle component model in the vehicle component modeling platform;
and S9, completing the modeling with the traditional modeling technique according to the classification boundary, the key threshold, the key performance curve and the key performance surface obtained in step S7 and the random forest machine learning model exported in step S8.
2. The supervised machine learning-based vehicle component modeling method of claim 1, wherein: the test scenario in step S2 refers to a scenario for ensuring the maximum amount of information obtained in each test, and includes setting up different initial states and operating states.
3. The supervised machine learning-based vehicle component modeling method of claim 1, wherein: the feature selection in step S6 includes stepwise regression, sequential feature selection, regularization and neighbor analysis, and the feature transformation includes principal component analysis, non-negative matrix factorization and factor analysis.
CN202210478749.1A 2022-05-05 2022-05-05 Vehicle component modeling method based on supervised machine learning Active CN114580086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210478749.1A CN114580086B (en) 2022-05-05 2022-05-05 Vehicle component modeling method based on supervised machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210478749.1A CN114580086B (en) 2022-05-05 2022-05-05 Vehicle component modeling method based on supervised machine learning

Publications (2)

Publication Number Publication Date
CN114580086A CN114580086A (en) 2022-06-03
CN114580086B true CN114580086B (en) 2022-08-09

Family

ID=81778782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210478749.1A Active CN114580086B (en) 2022-05-05 2022-05-05 Vehicle component modeling method based on supervised machine learning

Country Status (1)

Country Link
CN (1) CN114580086B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308183A (en) * 2023-03-23 2023-06-23 黄河勘测规划设计研究院有限公司 Intelligent design method for key indexes of hydraulic and hydroelectric engineering artificial sand stone processing system


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210174257A1 (en) * 2019-12-04 2021-06-10 Cerebri AI Inc. Federated machine-Learning platform leveraging engineered features based on statistical tests
CN113125960A (en) * 2019-12-31 2021-07-16 河北工业大学 Vehicle-mounted lithium ion battery charge state prediction method based on random forest model
CN113516173B (en) * 2021-05-27 2022-09-09 江西五十铃汽车有限公司 Evaluation method for static and dynamic interference of whole vehicle based on random forest and decision tree

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110266528A (en) * 2019-06-12 2019-09-20 南京理工大学 The method for predicting of car networking communication based on machine learning

Also Published As

Publication number Publication date
CN114580086A (en) 2022-06-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant