CN114971675A - Second-hand car price evaluation method based on deep FM model
- Publication number
- CN114971675A (application number CN202210357696.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- vehicle
- layer
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0278—Product appraisal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Finance (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Evolutionary Biology (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a used-vehicle price evaluation method based on a DeepFM model, comprising the following steps: taking used-vehicle transaction data as input data; segmenting the attribute features in the used-vehicle data into three feature types; preprocessing each of the three feature types separately; arranging the preprocessed attribute features of each used vehicle into a row vector; stacking the row vectors of all used vehicles row by row to form a used-vehicle data matrix; reducing the dimensionality of the numerical features in the data matrix; appending the transaction price as a label to the end of the corresponding row; constructing a DeepFM network; inputting the resulting data matrix into the DeepFM model for training to obtain the model parameters; and inputting the data matrix of the vehicles to be evaluated into the DeepFM model for price estimation. The advantages of the invention are that the accuracy of used-vehicle price evaluation is improved, the feature-engineering workload is reduced, the feature dimension is reduced, and memory and computation time are saved.
Description
Technical Field
The invention relates to the technical field of used-vehicle price evaluation, and in particular to a used-vehicle price evaluation method based on a DeepFM model.
Background
As automobile ownership becomes more widespread, the trading volume of used vehicles keeps rising and the market has broad development prospects. In this growing market, evaluating the value of a used vehicle is particularly important. Traditional price evaluation methods depend on the market and on the evaluator's experience, are influenced by subjective factors, and suffer from high evaluation cost and low evaluation efficiency. These traditional methods still dominate the existing used-vehicle market, so evaluation results depend heavily on personal judgment. Providing an accurate and scientific used-vehicle price prediction method that improves the accuracy of value prediction is therefore of great significance to the development of the used-vehicle industry. In recent years, machine learning methods have been tried for evaluating used-vehicle prices, with the result serving as a reference price for transactions.
Deep learning is a machine learning approach that combines low-level features into more abstract high-level representations (attribute classes or features) in order to discover distributed feature representations of the data. Deep learning architectures such as deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) have been successfully applied in computer vision, speech recognition, natural language processing, and other fields. Compared with a shallow neural network, a deep neural network has more layers, which give the model higher levels of abstraction and improve its predictive capability. Using deep learning to obtain the evaluated price of a used vehicle across complex vehicle types and regional conditions can address the dependence on market experience, the dependence on subjective judgment, and the low evaluation efficiency of existing price evaluation.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a used-vehicle price evaluation method based on a DeepFM model.
To achieve this purpose, the technical solution adopted by the invention is as follows:
A used-vehicle price evaluation method based on a DeepFM model comprises the following steps:
1) taking historical used-vehicle transaction data as input data; the historical used-vehicle transaction data comprises used-vehicle attribute feature data $x_{origin}$ and a transaction price $y$, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle type, fuel type, transmission type, engine power, vehicle mileage, and vehicle region;
2) performing feature segmentation on the attribute features in the used-vehicle data, dividing them into three feature types: numerical features, high-cardinality category features, and low-cardinality category features;
3) preprocessing each of the three feature types separately; the preprocessing comprises data cleaning, missing value filling, feature encoding, and data standardization;
4) arranging the preprocessed attribute features of the same used vehicle into a row to form a row vector $x$;
5) stacking the row vectors $x_i$ of all used vehicles row by row to form a used-vehicle data matrix $X$;
6) performing data dimensionality reduction on the numerical features in the used-vehicle data matrix $X$ to obtain a data matrix $X'$;
7) appending the used-vehicle transaction price as a label to the end of the corresponding used-vehicle row vector;
8) constructing a DeepFM model for evaluating the used-vehicle price;
9) applying steps 1) to 7) to the used-vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating used-vehicle prices;
10) applying steps 1) to 6) to the used-vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price.
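Steps 4), 5), and 7) amount to column-wise concatenation of arrays. The following is a minimal NumPy sketch, assuming each feature group has already been preprocessed into a 2-D array with one row per vehicle; the function names and the toy values are illustrative and are not taken from the patent.

```python
import numpy as np

def build_data_matrix(numeric, low_card, high_card):
    """Steps 4)-5): one row vector per vehicle, stacked into a matrix X."""
    return np.hstack([numeric, low_card, high_card])

def append_price_label(X, prices):
    """Step 7): append the transaction price as the last column of each row."""
    labels = np.asarray(prices, dtype=float).reshape(-1, 1)
    return np.hstack([X, labels])

# Toy data for two vehicles (values are made up for illustration).
numeric = np.array([[0.3, 1.2], [-0.5, 0.1]])   # standardized numerical features
low_card = np.array([[1, 0, 0], [0, 1, 0]])     # one-hot encoded feature
high_card = np.array([[8.4], [6.1]])            # mean-encoded feature
X = build_data_matrix(numeric, low_card, high_card)
train = append_price_label(X, [7.9, 5.5])
```

Keeping the label as the last column means the same matrix-building code can serve both training (steps 1 to 7) and estimation (steps 1 to 6), with only the final append skipped.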
Further, in step 2), feature segmentation divides the original used-vehicle attribute features into numerical features and category features, and the category features are further divided by cardinality: a feature with cardinality greater than 10 is a high-cardinality category feature, and a feature with cardinality less than or equal to 10 is a low-cardinality category feature.
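The cardinality rule can be sketched in a few lines of plain Python. This is only an illustration: the column layout (a dict mapping column names to value lists), the column names, and the rule that a column is numerical when all its present values are numbers are assumptions; only the threshold of 10 comes from the text.

```python
from numbers import Number

CARDINALITY_THRESHOLD = 10  # from the text: cardinality > 10 is "high"

def segment_features(columns):
    """Split column names into numerical, high- and low-cardinality groups."""
    groups = {"numerical": [], "high_cardinality": [], "low_cardinality": []}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        is_numeric = bool(present) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in present
        )
        if is_numeric:
            groups["numerical"].append(name)
        elif len(set(present)) > CARDINALITY_THRESHOLD:
            groups["high_cardinality"].append(name)
        else:
            groups["low_cardinality"].append(name)
    return groups

# Hypothetical columns with 12 vehicles each.
columns = {
    "mileage_km": [1000.0 * i for i in range(12)],
    "fuel_type": ["petrol", "diesel", "electric"] * 4,
    "brand": [f"brand_{i}" for i in range(12)],
}
groups = segment_features(columns)
```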
Further, in step 3):
Data cleaning uses a box plot to identify and remove outliers, so that extreme maximum and minimum values are removed from the data.
Missing value filling means that a missing value of a category feature is filled with the mode of that feature over all data, and a missing value of a numerical feature is filled with the mean of that feature.
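The cleaning and filling rules might look as follows in NumPy. The text does not give the box-plot whisker length, so the conventional 1.5 times the interquartile range is assumed here; the function names and sample values are hypothetical.

```python
from collections import Counter

import numpy as np

def boxplot_mask(values, whisker=1.5):
    """True where a value lies inside the box-plot whiskers (assumed 1.5 * IQR)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - whisker * iqr) & (values <= q3 + whisker * iqr)

def fill_mean(values):
    """Replace NaN entries of a numerical feature with the mean of observed values."""
    values = np.asarray(values, dtype=float)
    return np.where(np.isnan(values), np.nanmean(values), values)

def fill_mode(values):
    """Replace None entries of a category feature with the most frequent value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

power = np.array([90.0, 110.0, 100.0, 105.0, 95.0, 2000.0])
kept = power[boxplot_mask(power)]           # the 2000.0 entry is dropped
mileage = fill_mean([10.0, np.nan, 30.0])
fuel = fill_mode(["petrol", None, "diesel", "petrol"])
```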
Feature encoding refers to mean encoding and one-hot encoding. Mean encoding is applied to the high-cardinality category features, with the following formula:
$$g(y, x_i) = \lambda(n_i)\,\bar{y}_{x=x_i} + \bigl(1-\lambda(n_i)\bigr)\,\bar{y}$$
where $g(y, x_i)$ is the encoded feature value, $y$ is the used-vehicle price, $\lambda(n_i)\in[0,1]$ is the reliability weight between the two means (default value 0.5), $n_i$ is the number of samples whose feature value is $x_i$, $N$ is the total number of samples, $\bar{y}_{x=x_i}$ is the mean of $y$ over the samples with $x = x_i$, and $\bar{y}$ is the mean of $y$ over the entire training set.
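A small sketch of mean encoding, assuming the encoded value is a $\lambda$-weighted blend of the per-category mean price and the global mean price, which is how the symbol definitions above read. Here $\lambda$ is held constant at the default 0.5 rather than varying with $n_i$; the function name and toy data are illustrative.

```python
import numpy as np

def mean_encode(categories, prices, lam=0.5):
    """Map each category value to lam * category mean + (1 - lam) * global mean."""
    categories = np.asarray(categories)
    prices = np.asarray(prices, dtype=float)
    global_mean = prices.mean()
    encoding = {}
    for value in np.unique(categories):
        category_mean = prices[categories == value].mean()
        encoding[str(value)] = lam * category_mean + (1.0 - lam) * global_mean
    return encoding

brands = ["audi", "audi", "bmw", "bmw"]
prices = [10.0, 20.0, 30.0, 40.0]
encoding = mean_encode(brands, prices)
encoded_column = [encoding[b] for b in brands]
```

With these toy prices the global mean is 25, so each brand's code is pulled halfway from its own mean (15 and 35) toward 25. A high-cardinality feature becomes a single numerical column instead of one column per value.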
One-hot encoding is applied to the low-cardinality category features as follows:
suppose the used-vehicle data has a category feature $x$ with cardinality $m$; an $n \times m$ sparse matrix $A$ is constructed by dummy coding, where $n$ is the number of samples, each column of the matrix corresponds to one value of feature $x$, and each entry indicates whether the sample takes that value; the original feature $x$ is then replaced by the encoded sparse numerical matrix $A$.
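The one-hot step can be illustrated with a dense NumPy array; a real implementation would more likely use a sparse matrix type. The function name and the choice to order columns alphabetically are assumptions.

```python
import numpy as np

def one_hot(values):
    """Return the sorted category list and an n x m 0/1 indicator matrix."""
    categories = sorted(set(values))
    column_of = {c: j for j, c in enumerate(categories)}
    matrix = np.zeros((len(values), len(categories)), dtype=int)
    rows = np.arange(len(values))
    cols = [column_of[v] for v in values]
    matrix[rows, cols] = 1  # exactly one 1 per row
    return categories, matrix

categories, A = one_hot(["manual", "auto", "manual"])
```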
Data standardization uses a normalization method, with the following formula:
where $x_i$ is the value before normalization, $x'$ is the value after normalization, and $n$ is the number of samples.
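The normalization formula itself is not reproduced in this text. Because the definitions mention the number of samples $n$, one plausible reading is z-score standardization, in which the mean and standard deviation are computed over the $n$ samples. The sketch below shows that reading purely as an assumption; min-max scaling would be another common choice.

```python
import numpy as np

def standardize(column):
    """Z-score a numerical column (assumed method; the source formula is missing)."""
    column = np.asarray(column, dtype=float)
    mean = column.mean()   # (1/n) * sum of x_i
    std = column.std()     # population standard deviation, divides by n
    return (column - mean) / std

z = standardize([1.0, 2.0, 3.0])
```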
Further, in step 6), principal component analysis is used for data dimensionality reduction, and the principal components accounting for 99% of the total are retained. The specific steps are as follows:
first, the covariance matrix of the original data $X$ is computed:
$$C_{m\times m} = \frac{1}{n} X X^{T}$$
where $X$ is the original data matrix, $n$ is the number of columns of $X$, and $m$ is the number of rows of $X$;
the eigenvalues $(\lambda_i)_{i=1,\dots,m}$ and eigenvectors $(p_i)_{i=1,\dots,m}$ of the covariance matrix $C_{m\times m}$ are calculated;
the eigenvalues are arranged in descending order as $\{\lambda_1, \lambda_2, \dots, \lambda_m\}$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$; the first $k$ eigenvalues, whose sum accounts for 99% of the sum of all eigenvalues, are taken, and the corresponding eigenvectors $\{p_1, p_2, \dots, p_k\}$ are combined into a transformation matrix
$$P_{k\times m} = [p_1, p_2, \dots, p_k]^{T}$$
the transformation matrix $P_{k\times m}$ is multiplied by the original used-vehicle data $X_{m\times n}$ to obtain the dimensionality-reduced data
$$Y_{k\times n} = P_{k\times m} X_{m\times n}$$
where $Y_{k\times n}$ is the dimensionality-reduced matrix, $P_{k\times m}$ is the transformation matrix, and $X_{m\times n}$ is the original data matrix.
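The PCA steps can be traced in NumPy as below. The sketch assumes $X$ holds features in rows and samples in columns, centres each feature before forming the covariance matrix, and divides by $n$; the function name, the centring step, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, keep=0.99):
    """Project X (m features x n samples) onto the top-k principal directions."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature (assumed)
    C = Xc @ Xc.T / n                        # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.argmax(ratio >= keep)) + 1    # smallest k reaching the threshold
    P = eigvecs[:, :k].T                     # k x m transformation matrix
    return P @ Xc, P                         # Y is k x n

# Synthetic data: the third feature is the sum of the first two,
# so two components carry all of the variance.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200))
X = np.vstack([a, b, a + b])
Y, P = pca_reduce(X)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.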
Further, the DeepFM model constructed in step 8) comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer, and an output layer. The DeepFM model input consists of a plurality of input fields, divided into category feature fields and numerical feature fields. The category feature fields correspond to the low-cardinality category features preprocessed in step 3), and the numerical feature fields correspond to the high-cardinality category features and the numerical features.
Each input field is connected to one embedding unit of the embedding layer and, after passing through the embedding layer, is converted into an embedding vector of dimension $k$, where the default value of $k$ is 8.
The FM layer of the model is a factorization machine, whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer. Its output formula is
$$y_{FM} = w_0 + \langle w, x\rangle + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle E_i, E_j\rangle\, x_i x_j$$
where $x$ is the input value, $y_{FM}$ is the FM layer output, $w_0$ is the global bias, $w$ is the weight parameter, the $\langle w, x\rangle$ term represents the contribution of the first-order features in the model, the $\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle E_i, E_j\rangle x_i x_j$ term represents the contribution of the second-order feature crossings in the model, $n$ is the number of input fields, and $E_i$ is the $k$-dimensional embedding vector of input field $x_i$.
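To make the FM layer concrete, the sketch below computes a factorization-machine output as a global bias, a first-order term, and pairwise embedding inner products weighted by the inputs. This is the standard factorization-machine form and is assumed to match the patent's formula; the function names and toy numbers are illustrative. The second function uses the well-known algebraic identity that avoids the double loop.

```python
import numpy as np

def fm_output(x, w0, w, E):
    """w0 + <w, x> + sum over i < j of <E_i, E_j> * x_i * x_j (direct double loop)."""
    x, w, E = (np.asarray(v, dtype=float) for v in (x, w, E))
    second_order = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            second_order += (E[i] @ E[j]) * x[i] * x[j]
    return float(w0 + w @ x + second_order)

def fm_output_fast(x, w0, w, E):
    """Same value via 0.5 * sum_f [(sum_i E_if x_i)^2 - sum_i (E_if x_i)^2]."""
    x, w, E = (np.asarray(v, dtype=float) for v in (x, w, E))
    xe = E * x[:, None]                      # scale each embedding by its input
    second_order = 0.5 * np.sum(xe.sum(axis=0) ** 2 - (xe ** 2).sum(axis=0))
    return float(w0 + w @ x + second_order)

# Two input fields with 2-dimensional embeddings (toy values).
x, w0, w = [1.0, 2.0], 1.0, [0.5, 0.5]
E = [[1.0, 0.0], [1.0, 1.0]]
y_fm = fm_output(x, w0, w, E)
y_fm_fast = fm_output_fast(x, w0, w, E)
```

The fast form costs time linear in the number of fields rather than quadratic, which matters once one-hot fields make the input wide.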
The DNN layer of the model is a feedforward neural network whose input is the dense embedding vector output by the embedding layer. Denoting the output of the embedding layer as $a^{(0)} = [e_1, e_2, \dots, e_k]$, the operation of the DNN part is
$$a^{(l+1)} = \sigma\bigl(W^{(l)} a^{(l)} + b^{(l)}\bigr)$$
where $l$ denotes the current layer of the DNN model, $a^{(l+1)}$ is the output of the current layer, $W^{(l)}$ is the weight, $b^{(l)}$ is the bias, and $\sigma$ is the activation function. The output of the DNN part is
$$y_{DNN} = \sigma\bigl(W^{(|H|)} a^{(|H|)} + b^{(|H|)}\bigr)$$
where $y_{DNN}$ is the output of the DNN part and $|H|$ is the number of hidden layers. By default, the DNN network structure has 4 hidden layers with 100, 64, 32, and 8 neurons respectively, and the activation function is the ReLU function.
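A forward pass through the default DNN structure (hidden layers of 100, 64, 32, and 8 neurons with ReLU) might be sketched as follows. The random initialisation, the input width of 24, and the single output neuron with the same activation are assumptions made so that the example runs; they are not specified in the text.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_dnn(input_dim, hidden=(100, 64, 32, 8), seed=0):
    """Random weights for the hidden layers plus one output neuron (illustrative)."""
    rng = np.random.default_rng(seed)
    sizes = (input_dim, *hidden, 1)
    weights = [rng.normal(0.0, 0.1, size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    return weights, biases

def dnn_forward(a0, weights, biases):
    """Apply a = relu(W a + b) layer by layer; the last layer has one neuron."""
    a = np.asarray(a0, dtype=float)
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return float(a[0])

# Hypothetical input: 3 fields, each embedded into k = 8 dimensions.
weights, biases = init_dnn(input_dim=24)
y_dnn = dnn_forward(np.ones(24), weights, biases)
```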
The output layer of the model is connected to the FM layer and the DNN layer, and a single neuron outputs the result, with the formula
$$y = \sigma\bigl(y_{FM} + y_{DNN}\bigr)$$
where $y$ is the model evaluation result, $y_{FM}$ is the output of the FM layer, $y_{DNN}$ is the output of the DNN layer, and $\sigma$ is the ReLU activation function. The model loss function is the mean absolute error.
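The output layer and the loss can be illustrated together. The sketch assumes the output neuron applies ReLU to the sum of the two branch outputs, and uses the usual definition of mean absolute error; the function names and sample values are hypothetical.

```python
import numpy as np

def combine_outputs(y_fm, y_dnn):
    """ReLU of the summed FM and DNN outputs (assumed form of the output neuron)."""
    return np.maximum(np.asarray(y_fm, dtype=float) + np.asarray(y_dnn, dtype=float), 0.0)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and estimated prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

y_positive = float(combine_outputs(1.5, 2.0))   # sum is positive, passed through
y_clipped = float(combine_outputs(1.5, -2.0))   # sum is negative, clipped to 0
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

A ReLU output keeps the estimated price non-negative, which suits a price regression target.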
Compared with the prior art, the invention has the advantages that:
1) the invention improves the accuracy of used-vehicle price evaluation;
2) the invention uses the DeepFM model to evaluate used-vehicle prices, which reduces the feature-engineering workload, better captures low-order and high-order feature crossings of the input data, and learns the mapping between the price and the input features;
3) the invention uses mean encoding and principal component analysis for data preprocessing, which reduces the feature dimension and saves memory and computation time.
Drawings
Fig. 1 is a flowchart of a used-vehicle price evaluation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a used-vehicle price evaluation method according to an embodiment of the present invention;
Fig. 3 is a schematic network structure diagram of a used-vehicle price evaluation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
As shown in Figs. 1 to 3, a used-vehicle price evaluation method based on a DeepFM model comprises the following specific steps:
1) Taking historical used-vehicle transaction data as input data;
the historical used-vehicle transaction data comprises used-vehicle attribute feature data $x_{origin}$ and a transaction price $y$, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle type, fuel type, transmission type, engine power, vehicle mileage, and vehicle region.
2) Performing feature segmentation on the attribute features in the used-vehicle data, dividing them into numerical features, high-cardinality category features, and low-cardinality category features;
the attribute features in the historical used-vehicle data are divided into numerical features $x_{num}$ and category features $x_{cate}$;
the category features are divided by cardinality: a feature with cardinality greater than 10 is a high-cardinality category feature $x_{\text{high-cate}}$, and a feature with cardinality less than or equal to 10 is a low-cardinality category feature $x_{\text{low-cate}}$.
3) Preprocessing each of the three feature types of the used vehicle separately;
preprocessing comprises data cleaning, missing value filling, feature encoding, data standardization, and data dimensionality reduction;
data cleaning uses a box plot to identify and remove outliers, so that extreme maximum and minimum values are removed from the data.
Missing value filling means that a missing value of a category feature is filled with the mode of that feature over all data, and a missing value of a numerical feature is filled with the mean of that feature.
Mean encoding is applied to the high-cardinality category features $x_{\text{high-cate}}$, with the following formula:
$$g(y, x_i) = \lambda(n_i)\,\bar{y}_{x=x_i} + \bigl(1-\lambda(n_i)\bigr)\,\bar{y}$$
where $g(y, x_i)$ is the encoded feature value, $y$ is the used-vehicle price, $\lambda(n_i)\in[0,1]$ is the reliability weight between the two means (default value 0.5), $n_i$ is the number of samples whose feature value is $x_i$, $N$ is the total number of samples, $\bar{y}_{x=x_i}$ is the mean of $y$ over the samples with $x = x_i$, and $\bar{y}$ is the mean of $y$ over the entire training set.
One-hot encoding is applied to the low-cardinality category features $x_{\text{low-cate}}$ as follows:
suppose the used-vehicle data has a category feature $x_i$ with cardinality $m$; an $m$-dimensional vector $x' = [x'_1, x'_2, \dots, x'_m]$ is constructed by dummy coding, where each element of the vector corresponds to one value of feature $x_i$ and indicates whether the feature takes that value; exactly one element of the vector is 1 and the others are 0. The original feature $x_i$ is replaced by the encoded vector $x'$.
Data standardization uses a normalization method, with the following formula:
where $x_i$ is the value before normalization, $x'$ is the value after normalization, and $n$ is the number of samples. The original value $x_i$ is replaced by the normalized value $x'$.
4) Arranging the preprocessed attribute features $x_{num}$, $x_{\text{low-cate}}$, and $x_{\text{high-cate}}$ of the same used vehicle into a row to form a row vector $x = [x_{num}, x_{\text{low-cate}}, x_{\text{high-cate}}]$;
5) Stacking the row vectors $x$ of all used vehicles row by row to form a used-vehicle data matrix $X$;
6) Performing data dimensionality reduction on the numerical features in the used-vehicle data matrix to obtain a data matrix $X'$;
the numerical feature part of the used-vehicle data matrix $X$ is $A_{n\times m}$. Principal component analysis is used for dimensionality reduction, retaining the principal components accounting for 99% of the total. The specific steps are as follows: the covariance matrix of the numerical feature matrix $A_{n\times m}$ is computed:
$$C_{m\times m} = \frac{1}{n} A^{T} A$$
where $A$ is the original numerical feature matrix, $n$ is the number of rows of $A$, and $m$ is the number of columns of $A$;
the eigenvalues $(\lambda_i)_{i=1,\dots,m}$ and eigenvectors $(p_i)_{i=1,\dots,m}$ of the covariance matrix $C_{m\times m}$ are calculated; the eigenvalues are arranged in descending order as $\{\lambda_1, \lambda_2, \dots, \lambda_m\}$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$. The first $k$ eigenvalues, whose sum accounts for 99% of the sum of all eigenvalues, are selected, and the corresponding eigenvectors $\{p_1, p_2, \dots, p_k\}$ are combined into a transformation matrix $P_{k\times m} = [p_1, p_2, \dots, p_k]^{T}$. The transformation matrix $P_{k\times m}$ is multiplied by the original numerical feature data to obtain the dimensionality-reduced data $A'_{n\times k} = [P_{k\times m} A_{n\times m}^{T}]^{T}$.
The numerical feature part $A_{n\times m}$ of the data is replaced by the dimensionality-reduced matrix $A'_{n\times k}$ to obtain the dimensionality-reduced used-vehicle data matrix $X'$.
7) Appending the used-vehicle transaction price as a label to the end of the corresponding used-vehicle row vector;
8) Constructing a DeepFM model for evaluating the used-vehicle price;
the constructed DeepFM model comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer, and an output layer. Fig. 3 is a structural diagram of the constructed model.
The model input consists of multiple input fields, denoted $X = [F_1, F_2, \dots, F_m]$. The category feature fields correspond to the low-cardinality category features preprocessed in step 3), $F_{cate} = [f_1, f_2, \dots, f_i]$, and the numerical feature fields correspond to the high-cardinality category features and the numerical features, $F_{num} = [f]$. Each input field is connected to one embedding unit of the embedding layer and, after passing through the embedding layer, is converted into an embedding vector of dimension $k$, $e = [e_1, e_2, \dots, e_k]$, with $k = 8$. The embedding layer output is $E = [E_1, E_2, \dots, E_k]$, where $E_i$ is the $k$-dimensional embedding vector $E_i = [e_{1,i}, e_{2,i}, \dots, e_{k,i}]$.
The FM layer of the model is a factorization machine, whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer. Its output formula is
$$y_{FM} = w_0 + \langle w, x\rangle + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle E_i, E_j\rangle\, x_i x_j$$
where $y_{FM}$ is the FM layer output, $w_0$ is the global bias, $w$ is the weight parameter, the $\langle w, x\rangle$ term represents the contribution of the first-order features in the model, the $\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle E_i, E_j\rangle x_i x_j$ term represents the contribution of the second-order feature crossings in the model, $n$ is the number of input fields, and $E_i$ is the $k$-dimensional embedding vector of input field $F_i$.
The DNN layer of the model is a feedforward neural network whose input is the dense embedding vector output by the embedding layer. The DNN network structure has 4 hidden layers with 100, 64, 32, and 8 neurons respectively, and the activation function is the ReLU function. Denoting the output of the embedding layer as $a^{(0)} = [e_1, e_2, \dots, e_k]$, the operation of the DNN part is
$$a^{(l+1)} = \sigma\bigl(W^{(l)} a^{(l)} + b^{(l)}\bigr)$$
where $l$ is the current layer of the DNN model, $a^{(l+1)}$ is the output of the current layer, $W^{(l)}$ is the weight, $b^{(l)}$ is the bias, and $\sigma$ is the activation function. The output of the DNN part is
$$y_{DNN} = \sigma\bigl(W^{(|H|)} a^{(|H|)} + b^{(|H|)}\bigr)$$
where $y_{DNN}$ is the DNN model output and $|H|$ is the number of hidden layers.
The output layer of the model is connected to the FM layer and the DNN layer, and a single neuron outputs the result, with the formula
$$y = \sigma\bigl(y_{FM} + y_{DNN}\bigr)$$
where $y$ is the model evaluation result, $y_{FM}$ is the output of the FM layer, $y_{DNN}$ is the output of the DNN layer, and $\sigma$ is the ReLU activation function. The model loss function is the mean absolute error (MAE), given by
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i^{pred} - y_i^{true}\right|$$
where $N$ is the number of samples, $y_i^{pred}$ is the model evaluation result for sample $i$, and $y_i^{true}$ is its actual transaction price.
9) Applying steps 1) to 7) to the used-vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating used-vehicle prices;
model training parameters: the network parameters are adjusted by back propagation (BP) with the Adam optimizer; the learning rate is set to 0.015 and decreases gradually with the number of iterations; the L2 regularization coefficient is set to 0.08; batch_size is set to 2000 and the number of epochs is set to 500.
10) Applying steps 1) to 6) to the used-vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price; the output of the DeepFM model's output layer is the estimated used-vehicle price.
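The training parameters of step 9) can be collected in a small configuration object. The text says only that the learning rate decreases gradually with the iterations, so the exponential schedule and its `decay_rate` below are hypothetical placeholders; the other values are those stated in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 0.015   # initial learning rate (from the text)
    l2_coefficient: float = 0.08   # L2 regularization coefficient (from the text)
    batch_size: int = 2000         # from the text
    epochs: int = 500              # from the text
    decay_rate: float = 0.99       # hypothetical: per-epoch decay factor

def learning_rate_at(epoch, cfg):
    """Exponentially decayed learning rate (assumed schedule, not from the text)."""
    return cfg.learning_rate * cfg.decay_rate ** epoch

cfg = TrainConfig()
schedule = [learning_rate_at(epoch, cfg) for epoch in range(cfg.epochs)]
```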
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto and changes may be made without departing from the scope of the invention in its aspects.
Claims (5)
1. A used-vehicle price evaluation method based on a DeepFM model, characterized by comprising the following steps:
1) taking historical used-vehicle transaction data as input data; the historical used-vehicle transaction data comprises used-vehicle attribute feature data $x_{origin}$ and a transaction price $y$, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle type, fuel type, transmission type, engine power, vehicle mileage, and vehicle region;
2) performing feature segmentation on the attribute features in the used-vehicle data, dividing them into three feature types: numerical features, high-cardinality category features, and low-cardinality category features;
3) preprocessing each of the three feature types separately; the preprocessing comprises data cleaning, missing value filling, feature encoding, and data standardization;
4) arranging the preprocessed attribute features of the same used vehicle into a row to form a row vector $x$;
5) stacking the row vectors $x_i$ of all used vehicles row by row to form a used-vehicle data matrix $X$;
6) performing data dimensionality reduction on the numerical features in the used-vehicle data matrix $X$ to obtain a data matrix $X'$;
7) appending the used-vehicle transaction price as a label to the end of the corresponding used-vehicle row vector;
8) constructing a DeepFM model for evaluating the used-vehicle price;
9) applying steps 1) to 7) to the used-vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating used-vehicle prices;
10) applying steps 1) to 6) to the used-vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price.
2. The used-vehicle price evaluation method based on a DeepFM model according to claim 1, characterized in that: in step 2), feature segmentation divides the original used-vehicle attribute features into numerical features and category features, and the category features are further divided by cardinality, wherein a feature with cardinality greater than 10 is a high-cardinality category feature and a feature with cardinality less than or equal to 10 is a low-cardinality category feature.
3. The used-vehicle price evaluation method based on a DeepFM model according to claim 1, characterized in that: in step 3), data cleaning uses a box plot to identify and remove outliers, so that extreme maximum and minimum values are removed from the data;
missing value filling means that a missing value of a category feature is filled with the mode of that feature over all data, and a missing value of a numerical feature is filled with the mean of that feature;
feature encoding refers to mean encoding and one-hot encoding; mean encoding is applied to the high-cardinality category features, with the following formula:
$$g(y, x_i) = \lambda(n_i)\,\bar{y}_{x=x_i} + \bigl(1-\lambda(n_i)\bigr)\,\bar{y}$$
where $g(y, x_i)$ is the encoded feature value, $y$ is the used-vehicle price, $\lambda(n_i)\in[0,1]$ is the reliability weight between the two means (default value 0.5), $n_i$ is the number of samples whose feature value is $x_i$, $N$ is the total number of samples, $\bar{y}_{x=x_i}$ is the mean of $y$ over the samples with $x = x_i$, and $\bar{y}$ is the mean of $y$ over the entire training set;
one-hot encoding is applied to the low-cardinality category features as follows:
suppose the used-vehicle data has a category feature $x$ with cardinality $m$; an $n \times m$ sparse matrix $A$ is constructed by dummy coding, where $n$ is the number of samples, each column of the matrix corresponds to one value of feature $x$, and each entry indicates whether the sample takes that value; the original feature $x$ is replaced by the encoded sparse numerical matrix $A$;
data standardization uses a normalization method, with the following formula:
where $x_i$ is the value before normalization, $x'$ is the value after normalization, and $n$ is the number of samples.
4. The used-vehicle price evaluation method based on a DeepFM model according to claim 1, characterized in that: in step 6), principal component analysis is used for data dimensionality reduction, and the principal components accounting for 99% of the total are retained, with the following specific steps:
first, the covariance matrix of the original data $X$ is computed:
$$C_{m\times m} = \frac{1}{n} X X^{T}$$
where $X$ is the original data matrix, $n$ is the number of columns of $X$, and $m$ is the number of rows of $X$;
the eigenvalues $(\lambda_i)_{i=1,\dots,m}$ and eigenvectors $(p_i)_{i=1,\dots,m}$ of the covariance matrix $C_{m\times m}$ are calculated;
the eigenvalues are arranged in descending order as $\{\lambda_1, \lambda_2, \dots, \lambda_m\}$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$; the first $k$ eigenvalues, whose sum accounts for 99% of the sum of all eigenvalues, are taken, and the corresponding eigenvectors $\{p_1, p_2, \dots, p_k\}$ are combined into a transformation matrix
$$P_{k\times m} = [p_1, p_2, \dots, p_k]^{T}$$
the transformation matrix $P_{k\times m}$ is multiplied by the original used-vehicle data $X_{m\times n}$ to obtain the dimensionality-reduced data
$$Y_{k\times n} = P_{k\times m} X_{m\times n}$$
where $Y_{k\times n}$ is the dimensionality-reduced matrix, $P_{k\times m}$ is the transformation matrix, and $X_{m\times n}$ is the original data matrix.
5. The second-hand car price evaluation method based on the DeepFM model according to claim 1, characterized in that: the DeepFM model constructed in step 8 comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer, and an output layer; the input of the DeepFM model consists of a plurality of input fields, divided into categorical feature fields and numerical feature fields; the categorical feature fields correspond to the low-cardinality categorical features preprocessed in step 3, and the numerical feature fields correspond to the high-cardinality categorical features and the numerical features;
each input field is connected to one embedding unit of the embedding layer and, after passing through the embedding layer, is converted into an embedding vector of dimension k, where k defaults to 8;
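As a minimal sketch of the embedding step, assuming a lookup table for a categorical field and a scaled learned vector for a numerical field (the claim does not specify either mechanism; all names below are hypothetical):

```python
import numpy as np

K = 8  # default embedding dimension stated in the claim

rng = np.random.default_rng(0)
# Hypothetical categorical field with 4 levels: one embedding row per level.
table = rng.normal(scale=0.01, size=(4, K))
# Hypothetical numerical field: one learned vector scaled by the field value.
v = rng.normal(scale=0.01, size=K)

def embed_categorical(index, table):
    """Look up the embedding row of a category index."""
    return table[index]

def embed_numeric(value, vector):
    """Scale the field's embedding vector by the numeric value (an assumption)."""
    return value * vector

e_cat = embed_categorical(2, table)   # shape (8,)
e_num = embed_numeric(2.0, v)         # shape (8,)
```

Either way, every field ends up as a vector of the same dimension, which is what lets the FM and DNN layers share the embeddings.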
the FM layer of the model is a factorization machine whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer, and its output formula is
where $x$ is the input value, $y_{FM}$ is the FM layer output, $w_0$ is the global bias, $w$ is the weight parameter, the $\langle w, x\rangle$ term represents the contribution of the first-order features in the model, the pairwise cross term represents the contribution of second-order feature crosses in the model, $n$ is the number of input fields, and $E_i$ is the $k$-dimensional embedding vector of input field $x_i$;
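The FM output formula is not reproduced in this text. A minimal sketch, assuming the standard factorization-machine form $y_{FM} = w_0 + \langle w, x\rangle + \sum_{i<j} \langle E_i, E_j\rangle x_i x_j$, which matches the symbols defined above; the function names are hypothetical:

```python
import numpy as np

def fm_output(x, w0, w, E):
    """Assumed FM form: w0 + <w, x> + sum_{i<j} <E_i, E_j> x_i x_j.

    x: (n,) field values, w: (n,) first-order weights, E: (n, k) embeddings.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    E = np.asarray(E, dtype=float)
    first_order = w0 + w @ x
    # Pairwise term via the identity
    # sum_{i<j} <a_i, a_j> = 0.5 * sum_f ((sum_i a_if)^2 - sum_i a_if^2),
    # with a_i = x_i * E_i. This is O(n k) instead of O(n^2 k).
    xe = E * x[:, None]
    second_order = 0.5 * np.sum(xe.sum(axis=0) ** 2 - (xe ** 2).sum(axis=0))
    return float(first_order + second_order)

def fm_output_naive(x, w0, w, E):
    """Same quantity with an explicit double loop, for checking."""
    x, w, E = (np.asarray(a, dtype=float) for a in (x, w, E))
    total = w0 + w @ x
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += (E[i] @ E[j]) * x[i] * x[j]
    return float(total)

# Two fields: first order 0.5 + 1 - 2 = -0.5, cross term 2 * 1 * 2 = 4.
print(fm_output([1.0, 2.0], 0.5, [1.0, -1.0], [[1.0, 0.0], [2.0, 3.0]]))  # 3.5
```

The vectorized form and the double loop compute the same value; the vectorized one is the usual choice when the number of fields is large.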
the DNN layer of the model is a feedforward neural network whose input is the dense embedding vectors output by the embedding layer, expressed as $a^{(0)} = [e_1, e_2, \dots, e_k]$; the computation of the DNN part is
where $l$ denotes the current layer index of the DNN, $a^{(l+1)}$ is the output of the current layer, $W^{(l)}$ is the weight, $b^{(l)}$ is the bias, and an activation function is applied at each layer; the output of the DNN part is
where $y_{DNN}$ is the output of the DNN part and $|H|$ is the number of hidden layers; by default, the DNN has 4 hidden layers with 100, 64, 32, and 8 neurons, respectively; the activation function is the ReLU function;
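The layer recurrence is not reproduced in this text. A minimal sketch, assuming the usual feedforward form $a^{(l+1)} = \mathrm{ReLU}(W^{(l)} a^{(l)} + b^{(l)})$ with the default hidden sizes 100, 64, 32, 8; the initialization scheme and the function names are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_dnn(input_dim, hidden=(100, 64, 32, 8), seed=0):
    """Create (W, b) pairs for each hidden layer (He-style init, an assumption)."""
    rng = np.random.default_rng(seed)
    sizes = (input_dim, *hidden)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def dnn_forward(a0, params):
    """Apply a^(l+1) = ReLU(W^(l) a^(l) + b^(l)) through all hidden layers."""
    a = np.asarray(a0, dtype=float)
    for W, b in params:
        a = relu(W @ a + b)
    return a  # activations of the last hidden layer

# Usage: 5 hypothetical fields with 8-dimensional embeddings, flattened to 40 values.
params = init_dnn(input_dim=5 * 8)
a_last = dnn_forward(np.ones(40), params)
```

The sketch stops at the last hidden layer; how those activations are mapped to $y_{DNN}$ is left out because that formula is not available here.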
the output layer of the model is connected to the FM layer and the DNN layer and uses a single neuron to output the result, wherein the formula is
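The output formula is not reproduced in this text. One plausible reading, offered only as an assumption, is a single linear neuron over the concatenation of the FM output and the DNN activations; the function `output_neuron` and its parameters are hypothetical:

```python
import numpy as np

def output_neuron(y_fm, a_dnn, w_out, b_out=0.0):
    """Single linear neuron over [y_FM, DNN activations] (assumed form).

    No sigmoid is applied because the target is a price, i.e. a regression value.
    """
    z = np.concatenate(([y_fm], np.asarray(a_dnn, dtype=float)))
    return float(np.asarray(w_out, dtype=float) @ z + b_out)

# 1.0 * 3.5 + 0.5 * 1.0 - 1.0 * 2.0 + 0.25 = 2.25
price = output_neuron(3.5, [1.0, 2.0], [1.0, 0.5, -1.0], b_out=0.25)
```

Other combinations, such as a plain sum of the two component outputs, would be equally consistent with the wording of the claim.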
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210357696.8A CN114971675A (en) | 2022-04-06 | 2022-04-06 | Second-hand car price evaluation method based on deep FM model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210357696.8A CN114971675A (en) | 2022-04-06 | 2022-04-06 | Second-hand car price evaluation method based on deep FM model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114971675A (en) | 2022-08-30 |
Family
ID=82976718
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210357696.8A Pending CN114971675A (en) | 2022-04-06 | 2022-04-06 | Second-hand car price evaluation method based on deep FM model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114971675A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116167872A (en) * | 2023-04-20 | 2023-05-26 | 湖南工商大学 | Abnormal medical data detection method, device and equipment |
CN116451125A (en) * | 2023-06-02 | 2023-07-18 | 平安科技(深圳)有限公司 | New energy vehicle owner identification method, device, equipment and storage medium |
- 2022-04-06: CN application CN202210357696.8A (CN114971675A), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109060001B (en) | Multi-working-condition process soft measurement modeling method based on feature transfer learning | |
CN114971675A (en) | Second-hand car price evaluation method based on deep FM model | |
CN109584161A (en) | The Remote sensed image super-resolution reconstruction method of convolutional neural networks based on channel attention | |
CN113869208B (en) | Rolling bearing fault diagnosis method based on SA-ACWGAN-GP | |
CN114581560B (en) | Multi-scale neural network infrared image colorization method based on attention mechanism | |
Wu et al. | Optimized deep learning framework for water distribution data-driven modeling | |
CN113920379B (en) | Zero sample image classification method based on knowledge assistance | |
CN114912666A (en) | Short-time passenger flow volume prediction method based on CEEMDAN algorithm and attention mechanism | |
CN113762967A (en) | Risk information determination method, model training method, device, and program product | |
Zhang et al. | Kalman Filter-Based CNN-BiLSTM-ATT Model for Traffic Flow Prediction. | |
CN117392450A (en) | Steel material quality analysis method based on evolutionary multi-scale feature learning | |
CN117237663A (en) | Point cloud restoration method for large receptive field | |
CN116757255A (en) | Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model | |
CN116341723A (en) | Stock trend prediction method, system, equipment and medium based on deep learning and multi-source data fusion | |
CN114036947B (en) | Small sample text classification method and system for semi-supervised learning | |
CN114611665A (en) | Multi-precision hierarchical quantization method and device based on weight oscillation influence degree | |
Yang et al. | SDiT: Spiking Diffusion Model with Transformer | |
CN114595890A (en) | Ship spare part demand prediction method and system based on BP-SVR combined model | |
Fu et al. | Learning a Model-Based Deep Hyperspectral Denoiser from a Single Noisy Hyperspectral Image | |
Nandal et al. | A Synergistic Framework Leveraging Autoencoders and Generative Adversarial Networks for the Synthesis of Computational Fluid Dynamics Results in Aerofoil Aerodynamics | |
Yin et al. | Used-Car Price Evaluation Using Mean Encoding and PCA based DeepFM | |
CN118504792B (en) | Charging station cluster load prediction method and system with exogenous variable depth fusion | |
CN116030637B (en) | Traffic state prediction integration method | |
CN117219124B (en) | Switch cabinet voiceprint fault detection method based on deep neural network | |
CN111428876B (en) | Image classification method of mixed cavity convolution neural network based on self-walking learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||