CN114971675A - Second-hand car price evaluation method based on deep FM model - Google Patents

Info

Publication number
CN114971675A
Authority
CN
China
Prior art keywords
data
model
vehicle
layer
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210357696.8A
Other languages
Chinese (zh)
Inventor
肖文栋
尹旭阳
黄越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Shunde Graduate School of USTB
Original Assignee
University of Science and Technology Beijing USTB
Shunde Graduate School of USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB and Shunde Graduate School of USTB
Priority to CN202210357696.8A
Publication of CN114971675A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0278 Product appraisal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Finance (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a second-hand vehicle price evaluation method based on a DeepFM model, comprising the following steps: taking second-hand vehicle transaction data as input data; segmenting the attribute features in the data into three types; preprocessing each of the three feature types separately; arranging the preprocessed attribute features of each vehicle into a row vector; stacking the row vectors of all vehicles row-wise to form a second-hand vehicle data matrix; reducing the dimension of the numerical features in the data matrix; appending the transaction price as a label to the end of the corresponding vehicle's row; constructing a DeepFM network; inputting the resulting data matrix into the DeepFM model for training to obtain the model parameters; and inputting the data matrix of the vehicles to be evaluated into the trained DeepFM model for price estimation. The advantages of the invention are that it improves the accuracy of second-hand vehicle price evaluation, reduces the feature-engineering workload, lowers the feature dimension, and saves memory and computation time.

Description

Second-hand car price evaluation method based on deep FM model
Technical Field
The invention relates to the technical field of second-hand vehicle price evaluation, and in particular to a second-hand vehicle price evaluation method based on a DeepFM model.
Background
As car ownership becomes more widespread, the trading volume of second-hand vehicles keeps rising, and the market has broad development prospects. Valuing second-hand vehicles is therefore particularly important in this growing market. Traditional price evaluation methods depend on the market and on the experience of the appraiser; the results are influenced by subjective factors, the evaluation cost is high, and the efficiency is low. These traditional methods still dominate the current second-hand vehicle market, so evaluation results depend heavily on personal subjective judgment. An accurate and scientific price prediction method that improves the accuracy of second-hand vehicle valuation is therefore of great significance to the development of the industry. In recent years, machine learning methods have been tried for evaluating second-hand vehicle prices, providing a reference price for transactions.
Deep learning is a machine learning approach that combines low-level features into more abstract high-level representations (attribute classes or features) in order to discover distributed feature representations of data. Deep learning architectures such as deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) have been applied successfully in computer vision, speech recognition, natural language processing, and other fields. Compared with a shallow neural network, a deep neural network has more layers, which give the model higher levels of abstraction and improve its predictive ability. For complex vehicle models and regional conditions, obtaining the evaluated price with a deep learning method can address the dependence on market experience and subjective judgment and the low evaluation efficiency of traditional price evaluation.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a second-hand vehicle price evaluation method based on a DeepFM model.
To achieve this purpose, the invention adopts the following technical scheme:
A second-hand vehicle price evaluation method based on a DeepFM model comprises the following steps:
1) taking historical second-hand vehicle transaction data as input data; the historical transaction data comprise second-hand vehicle attribute feature data $x_{origin}$ and a transaction price y, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle model, fuel type, transmission type, engine power, vehicle mileage, and vehicle region;
2) performing feature segmentation on the attribute features in the second-hand vehicle data, dividing them into three types: numerical features, high-cardinality category features, and low-cardinality category features;
3) preprocessing each of the three feature types separately; the preprocessing comprises data cleaning, missing value filling, feature encoding, and data standardization;
4) arranging the preprocessed attribute features of the same second-hand vehicle in a row to form a row vector x;
5) stacking the row vectors $x_i$ of all second-hand vehicles row-wise to form the second-hand vehicle data matrix X;
6) performing data dimension reduction on the numerical features in the second-hand vehicle data matrix X to obtain a data matrix X';
7) appending the transaction price as a label to the end of the corresponding second-hand vehicle row vector;
8) constructing a DeepFM model for evaluating the price of second-hand vehicles;
9) performing steps 1) to 7) on the second-hand vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating second-hand vehicle prices;
10) performing steps 1) to 6) on the second-hand vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price.
Further, in step 2, feature segmentation divides the original second-hand vehicle attribute features into numerical features and category features, and then divides the category features by cardinality: a category feature with cardinality greater than 10 is a high-cardinality category feature, and one with cardinality less than or equal to 10 is a low-cardinality category feature.
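The segmentation rule can be sketched in a few lines of Python. This is only an illustration: the function name `split_features`, the dictionary-per-vehicle data layout, and the rule that a column is numerical when all of its non-missing values are numbers are assumptions, not part of the described method. Only the threshold of 10 comes from the text.

```python
from typing import Any, Dict, List

CARDINALITY_THRESHOLD = 10  # from the text: cardinality > 10 is "high"


def split_features(rows: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Assign each column to numeric, high-cardinality or low-cardinality."""
    groups: Dict[str, List[str]] = {"numeric": [], "high_cate": [], "low_cate": []}
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        # Assumption: a column is numerical when every non-missing value is a number.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            groups["numeric"].append(col)
        elif len(set(values)) > CARDINALITY_THRESHOLD:
            groups["high_cate"].append(col)
        else:
            groups["low_cate"].append(col)
    return groups
```

With this split, a brand column holding dozens of distinct values would fall into the high-cardinality group, while a fuel type column with a handful of values would fall into the low-cardinality group.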
Further, in step 3:
Data cleaning uses a box plot to identify and remove outliers, discarding the extreme maximum and minimum values in the data.
Missing value filling means that missing values of a category feature are filled with the mode of that feature over all data, and missing values of a numerical feature are filled with the mean of that feature.
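A minimal sketch of these two cleaning steps follows, using only the Python standard library. The whisker factor of 1.5 times the interquartile range is the usual box-plot convention and is an assumption here, since the text does not state the fence width. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, quantiles
from typing import List, Optional, Tuple


def iqr_bounds(values: List[float], whisker: float = 1.5) -> Tuple[float, float]:
    """Box-plot fences: values outside [q1 - w*IQR, q3 + w*IQR] are outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def fill_numeric(values: List[Optional[float]]) -> List[float]:
    """Replace missing numerical values with the mean of the observed ones."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def fill_category(values: List[Optional[str]]) -> List[str]:
    """Replace missing category values with the mode of the observed ones."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

Rows whose value falls outside the fences returned by `iqr_bounds` would be dropped before the imputation functions are applied.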
Feature encoding comprises mean encoding and one-hot encoding. Mean encoding is applied to the high-cardinality category features, with the following formula:
$$g(y, x_i) = \lambda(n_i)\,\frac{1}{n_i}\sum_{k:\,x_k = x_i} y_k + \bigl(1 - \lambda(n_i)\bigr)\,\frac{1}{N}\sum_{k=1}^{N} y_k$$
where $g(y, x_i)$ is the encoded feature value, y is the second-hand vehicle price, $\lambda(n_i) \in [0, 1]$ is the reliability weight between the two means with a default value of 0.5, $n_i$ is the number of samples whose feature value is $x_i$, N is the total number of samples, $\frac{1}{n_i}\sum_{k:\,x_k = x_i} y_k$ is the mean of y over the samples with $x = x_i$, and $\frac{1}{N}\sum_{k=1}^{N} y_k$ is the mean of y over the entire training set;
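Mean encoding can be illustrated with a short Python sketch that blends the per-category mean price with the global mean price. The function name `mean_encode` and the choice to place the weight λ on the category mean (and 1 − λ on the global mean) are assumptions. The default of 0.5 is the value given in the text.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def mean_encode(categories: List[Hashable], prices: List[float],
                lam: float = 0.5) -> Dict[Hashable, float]:
    """Map each category value to lam * category mean + (1 - lam) * global mean."""
    global_mean = sum(prices) / len(prices)
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for c, y in zip(categories, prices):
        sums[c] += y
        counts[c] += 1
    return {c: lam * sums[c] / counts[c] + (1 - lam) * global_mean for c in sums}
```

Each high-cardinality column is thus replaced by a single numerical column, which avoids the large sparse matrix that one-hot encoding would produce for such a feature.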
One-hot encoding is applied to the low-cardinality category features as follows:
Suppose the second-hand vehicle data have a category feature x with cardinality m. Dummy coding constructs an n × m sparse matrix A, where n is the number of samples; each column of the matrix corresponds to one value of feature x, and each entry indicates whether the sample takes that value. The original feature x is replaced with the encoded sparse numerical matrix A.
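As a small illustration, the one-hot step can be written as follows. The function name `one_hot` and the alphabetical column order are assumptions made for the sketch.

```python
from typing import Hashable, List


def one_hot(values: List[Hashable]) -> List[List[int]]:
    """Return an n x m 0/1 matrix; column j marks the j-th distinct value."""
    levels = sorted(set(values), key=str)  # fixed column order (an assumption)
    index = {v: j for j, v in enumerate(levels)}
    return [[1 if index[v] == j else 0 for j in range(len(levels))] for v in values]
```

For a transmission column with values `manual`, `auto`, `manual`, the result has two columns (`auto`, `manual`) and exactly one 1 per row.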
Data standardization uses a normalization method, with the following formula:
Figure BDA0003582626230000034
where $x_i$ is the value before normalization, $x'$ is the value after normalization, and n is the number of samples.
Further, in step 6, principal component analysis (PCA) is used for data dimension reduction, retaining the principal components that account for 99% of the total variance. The specific steps are as follows:
First, compute the covariance matrix of the original data X:
$$C_{m \times m} = \frac{1}{n} X X^{T}$$
where X is the original data matrix, n is the number of columns of X, and m is the number of rows of X.
Compute the eigenvalues $(\lambda_i)_{i=1,\dots,m}$ and eigenvectors $(p_i)_{i=1,\dots,m}$ of the covariance matrix $C_{m \times m}$.
Sort the eigenvalues in descending order, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$, and take the first k eigenvalues whose sum accounts for 99% of the sum of all eigenvalues. The corresponding eigenvectors $\{p_1, p_2, \dots, p_k\}$ are combined into the transformation matrix
$$P_{k \times m} = [p_1, p_2, \dots, p_k]^{T}$$
Multiply the transformation matrix $P_{k \times m}$ by the original second-hand vehicle data $X_{m \times n}$ to obtain the dimension-reduced data
$$Y_{k \times n} = P_{k \times m} X_{m \times n}$$
where $Y_{k \times n}$ is the dimension-reduced matrix, $P_{k \times m}$ is the transformation matrix, and $X_{m \times n}$ is the original data matrix.
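The PCA steps can be sketched with NumPy as follows. The sketch assumes the layout stated in the text (features as rows, samples as columns). Centring each feature and scaling the covariance by the number of samples are assumptions, and `pca_reduce` is a hypothetical name.

```python
import numpy as np


def pca_reduce(X: np.ndarray, keep: float = 0.99):
    """X has shape (m, n): m features as rows, n samples as columns."""
    Xc = X - X.mean(axis=1, keepdims=True)      # centre each feature (assumption)
    C = Xc @ Xc.T / X.shape[1]                  # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]           # re-sort from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, keep) + 1)   # smallest k reaching the threshold
    k = min(k, len(eigvals))
    P = eigvecs[:, :k].T                        # k x m transformation matrix
    return P @ Xc, P                            # reduced data is k x n
```

`np.linalg.eigh` is used rather than a general eigen-solver because the covariance matrix is symmetric, which gives real eigenvalues and orthonormal eigenvectors.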
Further, the DeepFM model constructed in step 8 comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer, and an output layer. The DeepFM model input consists of multiple input fields, divided into category feature fields and a numerical feature field. The category feature fields correspond to the low-cardinality category features preprocessed in step 3, and the numerical feature field corresponds to the high-cardinality category features and the numerical features.
Each input field is connected to an embedding unit of the embedding layer, which converts it into an embedding vector of dimension k, where the default value of k is 8.
The FM layer of the model is a factorization machine whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer. Its output formula is
$$y_{FM} = w_0 + \langle w, x \rangle + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle E_i, E_j \rangle\, x_i x_j$$
where x is the input value, $y_{FM}$ is the FM layer output, $w_0$ is the global bias, w is the weight parameter, the $\langle w, x \rangle$ part represents the first-order feature component of the model, the $\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle E_i, E_j \rangle\, x_i x_j$ part represents the second-order feature-crossing component of the model, n is the number of input fields, and $E_i$ is the k-dimensional embedding vector of input field $x_i$.
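A minimal NumPy sketch of the FM layer output is given below, assuming the standard factorization-machine form: a global bias, a first-order term, and pairwise inner products of the embedding vectors. The function name `fm_output` and the array shapes are assumptions. The second-order term uses the well-known identity that reduces the pairwise sum to linear time in the number of fields.

```python
import numpy as np


def fm_output(x: np.ndarray, w0: float, w: np.ndarray, E: np.ndarray) -> float:
    """x: (n,) field values, w: (n,) weights, E: (n, k) embedding vectors."""
    first_order = float(w @ x)
    xe = E * x[:, None]                      # row i is x_i * E_i
    # Identity: sum_{i<j} <E_i, E_j> x_i x_j
    #         = 0.5 * (||sum_i x_i E_i||^2 - sum_i ||x_i E_i||^2)
    second_order = 0.5 * float((xe.sum(axis=0) ** 2).sum() - (xe ** 2).sum())
    return w0 + first_order + second_order
```

Because the same embedding vectors also feed the DNN branch, the FM branch captures low-order feature interactions without any hand-crafted cross features.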
The DNN layer of the model is a feedforward neural network whose input is the dense embedding vector output by the embedding layer, denoted $a^{(0)} = [e_1, e_2, \dots, e_k]$. The DNN computation is
$$a^{(l+1)} = \sigma\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$
where l is the current layer index of the DNN, $a^{(l+1)}$ is the output of the current layer, $W^{(l)}$ is the weight, $b^{(l)}$ is the bias, and $\sigma$ is the activation function. The output of the DNN part is
$$y_{DNN} = \sigma\left(W^{(|H|+1)} a^{(|H|)} + b^{(|H|+1)}\right)$$
where $y_{DNN}$ is the output of the DNN part and $|H|$ is the number of hidden layers. By default, the DNN has 4 hidden layers with 100, 64, 32, and 8 neurons respectively, and the activation function is ReLU.
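The forward pass of the DNN branch can be sketched as follows, using the default hidden layer sizes from the text. The random initialization, the extra scalar output layer, and the use of ReLU on that final layer are assumptions for illustration. `init_params` and `dnn_forward` are hypothetical names.

```python
import numpy as np

HIDDEN_SIZES = [100, 64, 32, 8]  # default hidden layers from the text


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def init_params(input_dim: int, seed: int = 0):
    """Random weights for the hidden layers plus one scalar output layer."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + HIDDEN_SIZES + [1]
    return [(rng.normal(0.0, 0.1, size=(o, i)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def dnn_forward(a0: np.ndarray, params) -> float:
    """Apply a <- ReLU(W a + b) layer by layer and return the scalar output."""
    a = a0
    for W, b in params:
        a = relu(W @ a + b)
    return float(a[0])
```

The input `a0` would be the concatenation of the embedding vectors of all input fields, so `input_dim` equals the number of fields times the embedding dimension.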
The output layer of the model is connected to the FM layer and the DNN layer and uses a single neuron to output the result:
$$y = \sigma\left(y_{FM} + y_{DNN}\right)$$
where y is the model evaluation result, $y_{FM}$ is the output of the FM layer, $y_{DNN}$ is the output of the DNN layer, and $\sigma$ is the ReLU activation function. The model loss function is the mean absolute error.
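The output layer and the loss can be illustrated with two short functions. Summing the two branch outputs without extra weights before the ReLU is an assumption. The function names are hypothetical.

```python
from typing import Sequence


def predict_price(y_fm: float, y_dnn: float) -> float:
    """Combine both branches and apply ReLU so the price is never negative."""
    return max(0.0, y_fm + y_dnn)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |y_i - y_hat_i| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A ReLU output suits a price target because it cannot return a negative value, and the mean absolute error reports the loss in the same unit as the price.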
Compared with the prior art, the invention has the advantages that:
1) the invention improves the accuracy of second-hand vehicle price evaluation;
2) the invention uses a DeepFM model to evaluate second-hand vehicle prices, which reduces the feature-engineering workload, better captures both low-order and high-order feature interactions in the input data, and learns the mapping between the price and the input features;
3) the invention uses mean encoding and principal component analysis for data preprocessing, which reduces the feature dimension and saves memory and computation time.
Drawings
Fig. 1 is a flowchart of a second-hand vehicle price evaluation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a second-hand vehicle price evaluation method according to an embodiment of the present invention;
Fig. 3 is a schematic network structure diagram of a second-hand vehicle price evaluation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
As shown in Figs. 1 to 3, a second-hand vehicle price evaluation method based on a DeepFM model comprises the following specific steps:
1) Taking historical second-hand vehicle transaction data as input data;
The historical second-hand vehicle transaction data comprise second-hand vehicle attribute feature data $x_{origin}$ and a transaction price y, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle model, fuel type, transmission type, engine power, vehicle mileage, and vehicle region.
2) Performing feature segmentation on the attribute features in the second-hand vehicle data, dividing them into numerical features, high-cardinality category features, and low-cardinality category features;
The attribute features of the historical second-hand vehicles are divided into numerical features $x_{num}$ and category features $x_{cate}$.
The category features are divided by cardinality: a feature with cardinality greater than 10 is a high-cardinality category feature $x_{high\text{-}cate}$, and a feature with cardinality less than or equal to 10 is a low-cardinality category feature $x_{low\text{-}cate}$.
3) Preprocessing each of the three feature types of the second-hand vehicle separately;
Preprocessing comprises data cleaning, missing value filling, feature encoding, data standardization, and data dimension reduction;
Data cleaning uses a box plot to identify and remove outliers, discarding the extreme maximum and minimum values in the data.
Missing value filling means that missing values of a category feature are filled with the mode of that feature over all data, and missing values of a numerical feature are filled with the mean of that feature.
Mean encoding is applied to the high-cardinality category features $x_{high\text{-}cate}$, with the following formula:
$$g(y, x_i) = \lambda(n_i)\,\frac{1}{n_i}\sum_{k:\,x_k = x_i} y_k + \bigl(1 - \lambda(n_i)\bigr)\,\frac{1}{N}\sum_{k=1}^{N} y_k$$
where $g(y, x_i)$ is the encoded feature value, y is the second-hand vehicle price, $\lambda(n_i) \in [0, 1]$ is the reliability weight between the two means with a default value of 0.5, $n_i$ is the number of samples whose feature value is $x_i$, N is the total number of samples, $\frac{1}{n_i}\sum_{k:\,x_k = x_i} y_k$ is the mean of y over the samples with $x = x_i$, and $\frac{1}{N}\sum_{k=1}^{N} y_k$ is the mean of y over the entire training set.
One-hot encoding is applied to the low-cardinality category features $x_{low\text{-}cate}$ as follows:
Suppose the second-hand vehicle data have a category feature $x_i$ with cardinality m. Dummy coding constructs an m-dimensional vector $x' = [x'_1, x'_2, \dots, x'_m]$, in which each element corresponds to one value of feature $x_i$ and indicates whether the sample takes that value; exactly one element of the vector is 1 and the others are 0. The original feature $x_i$ is replaced with the encoded vector $x'$.
Data standardization uses a normalization method, with the following formula:
Figure BDA0003582626230000074
where $x_i$ is the value before normalization, $x'$ is the value after normalization, and n is the number of samples. The original value $x_i$ is replaced with the normalized value $x'$.
4) The preprocessed attribute features $x_{num}$, $x_{low\text{-}cate}$, and $x_{high\text{-}cate}$ of the same second-hand vehicle are arranged in a row to form a row vector $x = [x_{num}, x_{low\text{-}cate}, x_{high\text{-}cate}]$;
5) The row vectors x of all second-hand vehicles are stacked row-wise to form the second-hand vehicle data matrix X;
6) Data dimension reduction is performed on the numerical features in the second-hand vehicle data matrix X to obtain a data matrix X';
Let the numerical feature part of the second-hand vehicle data matrix X be $A_{n \times m}$. Principal component analysis is used for dimension reduction, retaining the principal components that account for 99% of the total variance. The specific steps are as follows. Compute the covariance matrix of the numerical feature matrix $A_{n \times m}$:
$$C_{m \times m} = \frac{1}{n} A^{T} A$$
where A is the original numerical feature matrix, n is the number of rows of A, and m is the number of columns of A;
Compute the eigenvalues $(\lambda_i)_{i=1,\dots,m}$ and eigenvectors $(p_i)_{i=1,\dots,m}$ of the covariance matrix $C_{m \times m}$. Sort the eigenvalues in descending order, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$. Select the first k eigenvalues whose sum accounts for 99% of the sum of all eigenvalues, and combine the corresponding eigenvectors $\{p_1, p_2, \dots, p_k\}$ into the transformation matrix $P_{k \times m} = [p_1, p_2, \dots, p_k]^{T}$. Multiply the transformation matrix $P_{k \times m}$ by the transpose of the numerical feature matrix $A_{n \times m}$ and transpose the result to obtain the dimension-reduced data $A'_{n \times k} = [P_{k \times m} A_{n \times m}^{T}]^{T}$.
Replace $A_{n \times m}$ in the data with the dimension-reduced matrix $A'_{n \times k}$ to obtain the dimension-reduced second-hand vehicle data matrix X'.
7) Appending the transaction price as a label to the end of the corresponding second-hand vehicle row vector;
8) Constructing a DeepFM model for evaluating the price of second-hand vehicles;
The constructed DeepFM model comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer, and an output layer. Fig. 3 is a block diagram of the constructed model.
The model input consists of multiple input fields, denoted $X = [F_1, F_2, \dots, F_m]$. The category feature fields correspond to the low-cardinality category features preprocessed in step 3, $F_{cate} = [f_1, f_2, \dots, f_i]$, and the numerical feature field corresponds to the high-cardinality category features and the numerical features, $F_{num} = [f]$. Each input field is connected to an embedding unit of the embedding layer, which converts it into an embedding vector $e = [e_1, e_2, \dots, e_k]$ of dimension k, with k = 8. The embedding layer output is $E = [E_1, E_2, \dots, E_k]$, where $E_i$ is the k-dimensional embedding vector $E_i = [e_{1,i}, e_{2,i}, \dots, e_{k,i}]$.
The FM layer of the model is a factorization machine whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer. Its output formula is
$$y_{FM} = w_0 + \langle w, x \rangle + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle E_i, E_j \rangle\, x_i x_j$$
where $y_{FM}$ is the FM layer output, $w_0$ is the global bias, the $\langle w, x \rangle$ part represents the first-order feature component of the model, w is the weight parameter, the $\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle E_i, E_j \rangle\, x_i x_j$ part represents the second-order feature-crossing component of the model, n is the number of input fields, and $E_i$ is the k-dimensional embedding vector of input field $F_i$.
The DNN layer of the model is a feedforward neural network whose input is the dense embedding vector output by the embedding layer. The DNN has 4 hidden layers with 100, 64, 32, and 8 neurons respectively, and the activation function is ReLU. Denote the output of the embedding layer as $a^{(0)} = [e_1, e_2, \dots, e_k]$. The DNN computation is
$$a^{(l+1)} = \sigma\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$
where l is the current layer index of the DNN, $a^{(l+1)}$ is the output of the current layer, $W^{(l)}$ is the weight, $b^{(l)}$ is the bias, and $\sigma$ is the activation function. The output of the DNN part is
$$y_{DNN} = \sigma\left(W^{(|H|+1)} a^{(|H|)} + b^{(|H|+1)}\right)$$
where $y_{DNN}$ is the DNN output and $|H|$ is the number of hidden layers.
The output layer of the model is connected to the FM layer and the DNN layer and uses a single neuron to output the result:
$$y = \sigma\left(y_{FM} + y_{DNN}\right)$$
where y is the model evaluation result, $y_{FM}$ is the output of the FM layer, $y_{DNN}$ is the output of the DNN layer, and $\sigma$ is the ReLU activation function. The model loss function is the mean absolute error (MAE):
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $y_i$ is the true value, $\hat{y}_i$ is the model prediction, and n is the number of predicted samples.
9) Performing steps 1) to 7) on the second-hand vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating second-hand vehicle prices;
Model training parameters: the network parameters are adjusted by back propagation (BP) with the Adam optimizer; the learning rate is initially 0.015 and decreases gradually with the number of iterations; the L2 regularization coefficient is 0.08; batch_size is 2000 and the number of epochs is 500.
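The training settings can be collected in a small configuration object, sketched below. The values come from the text, except for `decay_rate`: the text only says the learning rate decreases with the number of iterations, so the exponential schedule and its rate of 0.99 are hypothetical, as are the names `TrainConfig` and `learning_rate_at`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 0.015   # initial value from the text
    l2_coefficient: float = 0.08
    batch_size: int = 2000
    epochs: int = 500
    decay_rate: float = 0.99       # hypothetical: the text gives no decay schedule


def learning_rate_at(epoch: int, cfg: TrainConfig = TrainConfig()) -> float:
    """Exponential decay, one possible way to lower the rate over iterations."""
    return cfg.learning_rate * cfg.decay_rate ** epoch
```

Any deep learning framework with an Adam optimizer could consume these values; the schedule function would be called once per epoch to set the current learning rate.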
10) Performing steps 1) to 6) on the second-hand vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price; the output of the DeepFM output layer is the estimated second-hand vehicle price.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto and changes may be made without departing from the scope of the invention in its aspects.

Claims (5)

1. A second-hand vehicle price evaluation method based on a DeepFM model, characterized by comprising the following steps:
1) taking historical second-hand vehicle transaction data as input data; the historical transaction data comprise second-hand vehicle attribute feature data $x_{origin}$ and a transaction price y, wherein the attribute features comprise vehicle registration date, vehicle transaction date, vehicle type, vehicle brand, vehicle model, fuel type, transmission type, engine power, vehicle mileage, and vehicle region;
2) performing feature segmentation on the attribute features in the second-hand vehicle data, dividing them into three types: numerical features, high-cardinality category features, and low-cardinality category features;
3) preprocessing each of the three feature types separately; the preprocessing comprises data cleaning, missing value filling, feature encoding, and data standardization;
4) arranging the preprocessed attribute features of the same second-hand vehicle in a row to form a row vector x;
5) stacking the row vectors $x_i$ of all second-hand vehicles row-wise to form the second-hand vehicle data matrix X;
6) performing data dimension reduction on the numerical features in the second-hand vehicle data matrix X to obtain a data matrix X';
7) appending the transaction price as a label to the end of the corresponding second-hand vehicle row vector;
8) constructing a DeepFM model for evaluating the price of second-hand vehicles;
9) performing steps 1) to 7) on the second-hand vehicle data used for model training, and inputting the resulting data matrix into the DeepFM model for training, to obtain the model parameters and a network model for estimating second-hand vehicle prices;
10) performing steps 1) to 6) on the second-hand vehicle data to be evaluated, and inputting the resulting data matrix into the DeepFM model to estimate the price.
2. The second-hand vehicle price evaluation method based on a DeepFM model according to claim 1, characterized in that: in step 2, feature segmentation divides the original second-hand vehicle attribute features into numerical features and category features, and divides the category features by cardinality, wherein a category feature with cardinality greater than 10 is a high-cardinality category feature and one with cardinality less than or equal to 10 is a low-cardinality category feature.
3. The deep FM model-based used vehicle price assessment method according to claim 1, characterized in that: in the step 3, the data is cleaned, abnormal values are removed by using a box plot, and the maximum and minimum values in the data are removed;
missing value filling means that the category feature missing value is filled by using the mode of the feature of all data, and the numerical feature is filled by using the mean value of the feature;
the characteristic coding refers to mean value coding and one-hot coding; carrying out mean value coding on the high-cardinality classification characteristics, wherein the specific formula is as follows:
Figure FDA0003582626220000021
wherein g (y, x) i ) For the coded eigenvalues, y is the price of the second-hand vehicle, λ (n) i )∈[0,1]Default value of 0.5, n for reliability of two means i Is a characteristic value of x i N is the total number of samples,
Figure FDA0003582626220000022
is x ═ x i The corresponding y-means value is then calculated,
Figure FDA0003582626220000023
is y on the whole training setA value;
carrying out one-hot coding on the low-cardinality class characteristics by the following process:
setting the second-hand vehicle data to have a class characteristic x with a base number of m, constructing a sparse matrix A with n x m through dummy coding, wherein each column of the matrix corresponds to a value of the characteristic x, the numerical value of each column represents whether the characteristic belongs to the current characteristic, and replacing the original characteristic x with the coded sparse numerical matrix A;
the data normalization uses the following formula:
Figure FDA0003582626220000024
wherein x_i is the value before normalization, x' is the value after normalization, and n is the number of samples.
4. The DeepFM model-based used vehicle price evaluation method according to claim 1, characterized in that: the data dimensionality reduction in step 6 adopts principal component analysis, retaining the principal components whose eigenvalues account for 99% of the sum of all eigenvalues, with the following specific steps:
first, the covariance matrix of the original data X is obtained
Figure FDA0003582626220000031
wherein X is the original data matrix, n is the number of columns of X, and m is the number of rows of X;
the eigenvalues (λ_i), i = 0, …, m, and the eigenvectors (p_i), i = 0, …, m, of the covariance matrix C_{m×m} are calculated;
the eigenvalues are arranged from largest to smallest as {λ_0, λ_1, …, λ_m}, where λ_0 ≥ λ_1 ≥ … ≥ λ_m; the first k eigenvalues, whose sum accounts for 99% of the sum of all eigenvalues, are taken, and the corresponding eigenvectors {p_0, p_1, …, p_k} are combined into the transformation matrix
P_{k×m} = [p_0, p_1, …, p_k]^T
the transformation matrix P_{k×m} is multiplied by the original used vehicle data X_{m×n} to obtain the dimension-reduced data
Y_{k×n} = P_{k×m} X_{m×n}
wherein Y_{k×n} is the dimension-reduced matrix, P_{k×m} is the transformation matrix, and X_{m×n} is the original data matrix.
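The PCA steps of claim 4 can be sketched with NumPy as follows, keeping the claim's layout of X as m features (rows) by n samples (columns). Two points are assumptions: the covariance formula is only available as an image, so the 1/n scaling is a guess (it does not change the eigenvalue ratios), and the data are centred before projection, which the claim does not state explicitly. The name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, variance_ratio: float = 0.99):
    """Project X (m features x n samples) onto the top-k principal components.

    Returns (Y, P) with Y of shape (k, n) and P of shape (k, m), where k is the
    smallest number of components whose eigenvalues reach `variance_ratio`.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)        # centre each feature (assumed)
    C = Xc @ Xc.T / X.shape[1]                    # m x m covariance, 1/n scaling assumed
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.argmax(cumulative >= variance_ratio)) + 1
    P = eigvecs[:, :k].T                          # rows of P are the eigenvectors
    return P @ Xc, P
```

The returned P plays the role of the transformation matrix P_{k×m}, and the product corresponds to Y_{k×n} = P_{k×m} X_{m×n} applied to the centred data.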
5. The DeepFM model-based used vehicle price evaluation method according to claim 1, characterized in that: the DeepFM model constructed in step 8 comprises, in order from input to output, an input layer, an embedding layer, an FM layer, a DNN layer and an output layer; the DeepFM model input consists of a plurality of input fields, divided into category feature fields and numerical feature fields; the category feature fields correspond to the low-cardinality category features preprocessed in step 3, and the numerical feature fields correspond to the high-cardinality category features and the numerical features;
each input field is connected to one embedding unit of the embedding layer, and is converted into an embedding vector of dimension k after passing through the embedding layer, wherein the default value of k is 8;
the FM layer of the model is a factorization machine, whose first-order unit is connected to each input field and whose second-order unit is connected to the embedding layer, and whose output formula is
Figure FDA0003582626220000041
wherein x is the input value, y_FM is the FM layer output, w_0 is the global bias, w is the weight parameter, the <w, x> term represents the contribution of the first-order features in the model, the
Figure FDA0003582626220000042
term represents the contribution of the second-order feature crosses in the model, n is the number of input fields, and E_i is the k-dimensional embedding vector of input field x_i;
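A sketch of the FM layer output under the standard factorization-machine form, y_FM = w_0 + <w, x> + Σ_{i<j} <E_i, E_j> x_i x_j. The claim's formula is an image, so this exact form is an assumption inferred from the symbol definitions; the function name `fm_output` is hypothetical. The pairwise sum is computed with the usual "square of sum minus sum of squares" identity, which avoids an explicit double loop.

```python
import numpy as np


def fm_output(x, w0: float, w, E) -> float:
    """FM output for one sample.

    x: (n,) input field values, w: (n,) first-order weights,
    E: (n, k) matrix whose row i is the embedding vector E_i.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    E = np.asarray(E, dtype=float)
    first_order = float(w @ x)                       # <w, x>
    scaled = E * x[:, None]                          # row i is x_i * E_i
    square_of_sum = np.square(scaled.sum(axis=0))    # (sum_i x_i E_i)^2, per dimension
    sum_of_squares = np.square(scaled).sum(axis=0)   # sum_i (x_i E_i)^2, per dimension
    second_order = 0.5 * float((square_of_sum - sum_of_squares).sum())
    return w0 + first_order + second_order
```

With n input fields and embedding dimension k, this costs O(n·k) per sample instead of O(n²·k) for the naive pairwise sum.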
the DNN layer of the model is a feedforward neural network whose input is the dense embedding vectors output by the embedding layer, denoted a^(0) = [e_1, e_2, …, e_k]; the computation of the DNN part is
Figure FDA0003582626220000043
wherein l denotes the current layer index of the DNN, a^(l+1) is the output of the current layer, W^(l) is the weight, b^(l) is the bias, and
Figure FDA0003582626220000044
is the activation function; the output of the DNN part is
Figure FDA0003582626220000045
wherein y_DNN is the output of the DNN part and |H| denotes the number of hidden layers; by default, the DNN has 4 hidden layers with 100, 64, 32 and 8 neurons respectively; the activation function is the ReLU function;
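The DNN part can be sketched as a plain NumPy forward pass using the default layer sizes given above (100, 64, 32, 8). The layer recurrence a^(l+1) = ReLU(W^(l) a^(l) + b^(l)) followed by a linear output unit is the usual DeepFM form and is assumed here, since the claim's formulas are images; the random initialisation and the function names are purely illustrative.

```python
import numpy as np

HIDDEN_UNITS = (100, 64, 32, 8)  # default hidden-layer sizes from claim 5


def init_dnn(input_dim: int, hidden_units=HIDDEN_UNITS, seed: int = 0):
    """Create (W, b) pairs for each hidden layer plus a final one-unit output layer."""
    rng = np.random.default_rng(seed)
    sizes = (input_dim, *hidden_units, 1)
    return [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def dnn_forward(a0, params) -> float:
    """Forward pass: ReLU hidden layers, then a linear output unit giving y_DNN."""
    a = np.asarray(a0, dtype=float)
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        a = np.maximum(0.0, W @ a + b)   # a^(l+1) = ReLU(W^(l) a^(l) + b^(l))
    return float((W_out @ a + b_out)[0])
```

The input a0 would be the concatenation of the field embeddings, so its length is the number of input fields times the embedding dimension k (8 by default).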
the model output layer is connected to the FM layer and the DNN layer, and uses a single neuron to output the result, with the formula
Figure FDA0003582626220000046
wherein y is the model evaluation result, y_FM is the output of the FM layer, y_DNN is the output of the DNN layer, and
Figure FDA0003582626220000047
is the ReLU activation function; the model loss function uses the mean absolute error.
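A minimal sketch of the output layer and loss. It assumes the output neuron simply applies ReLU to the sum y_FM + y_DNN; the claim's formula is an image, and the neuron may carry additional weights that are not recoverable from the text. Both function names are hypothetical.

```python
from typing import Sequence


def predict_price(y_fm: float, y_dnn: float) -> float:
    """Combine the FM and DNN outputs through a ReLU output unit (assumed form)."""
    return max(0.0, y_fm + y_dnn)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE loss: average absolute difference between true and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

ReLU at the output keeps the predicted price non-negative, and the mean absolute error is less sensitive to a few very expensive vehicles than a squared-error loss would be.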
CN202210357696.8A 2022-04-06 2022-04-06 Second-hand car price evaluation method based on deep FM model Pending CN114971675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210357696.8A CN114971675A (en) 2022-04-06 2022-04-06 Second-hand car price evaluation method based on deep FM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210357696.8A CN114971675A (en) 2022-04-06 2022-04-06 Second-hand car price evaluation method based on deep FM model

Publications (1)

Publication Number Publication Date
CN114971675A true CN114971675A (en) 2022-08-30

Family

ID=82976718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210357696.8A Pending CN114971675A (en) 2022-04-06 2022-04-06 Second-hand car price evaluation method based on deep FM model

Country Status (1)

Country Link
CN (1) CN114971675A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167872A (en) * 2023-04-20 2023-05-26 湖南工商大学 Abnormal medical data detection method, device and equipment
CN116451125A (en) * 2023-06-02 2023-07-18 平安科技(深圳)有限公司 New energy vehicle owner identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination