CN112506907A - Engineering machinery marketing strategy pushing method, system and device based on big data - Google Patents

Publication number
CN112506907A
Authority
CN
China
Prior art keywords
data
indexes
initial
marketing strategy
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011454527.3A
Other languages
Chinese (zh)
Inventor
张善睿
张琳
邓波
曹金飞
马笑
李维
于桂美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beigu Electronics Co ltd Shanghai Branch
North Valley Electronics Co ltd
Original Assignee
Beigu Electronics Co ltd Shanghai Branch
North Valley Electronics Co ltd
Application filed by Beigu Electronics Co ltd Shanghai Branch, North Valley Electronics Co ltd
Priority to CN202011454527.3A
Publication of CN112506907A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention relates to the technical field of big data, and provides a method, a system and a device for pushing an engineering machinery marketing strategy based on big data, wherein the method comprises the following steps: acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data; carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes; performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes; and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client. The engineering machinery marketing strategy pushing method based on big data solves the problems of low competitiveness and poor marketing quality of a traditional engineering equipment marketing mode.

Description

Engineering machinery marketing strategy pushing method, system and device based on big data
Technical Field
The invention relates to the technical field of data mining, in particular to a method, a system, a device and a storage medium for pushing an engineering machinery marketing strategy based on big data.
Background
In recent years, with growing infrastructure investment, sales of engineering equipment (such as loaders) have gradually recovered, and the market outlook is favorable. With the rise of the big data era and increasingly mature consumers, big data plays a growing role in marketing, research and development, and related areas. Data is becoming a key competitive resource worldwide, and more interaction with users is needed to improve user experience and satisfaction.
According to surveys of host factories and dealers, engineering equipment (such as loaders) is still sold through traditional marketing modes, which inevitably leads to several problems. First, marketing relies on subjective experience: the host factory and each dealer set marketing policies and activities for each region according to historical sales, the actual capacity of the local market, coverage rate and similar factors, and sales personnel develop users according to local project starts. This approach is subjective, has a low conversion rate, and lacks focus and targeting. Second, marketing is extensive rather than targeted: sales personnel judge whether a customer will purchase, or intends to purchase, from personal experience and their relationship with the customer. Each salesperson tracks between 150 and 300 customers in half a year, so judgment accuracy is low, users are annoyed, and tracking lacks focus. Third, user data is not fully used: the ability to distinguish active customers from inactive customers is insufficient, and rich big data resources such as sales data, machine data and financing data are underused. Fourth, the actual working conditions of the customer's vehicle are not taken into account, although they strongly influence purchasing. For example, a vehicle that works more than 12 hours a day suffers high fatigue loading and generally creates a replacement need after about two years.
Moreover, under different working conditions, such as coal mine, iron ore, municipal engineering and the like, the time points of purchasing machines by clients are obviously different.
As the above description shows, traditional marketing modes are losing competitiveness. The prior art cannot meet current requirements, does not support large-scale development and use of existing data resources, and cannot identify or analyze the indexes relevant to precision marketing or their importance.
Based on the above problems, a method for significantly improving the marketing quality of engineering equipment such as loaders is needed.
Disclosure of Invention
The invention provides a big data-based engineering machinery marketing strategy pushing method, a big data-based engineering machinery marketing strategy pushing system, an electronic device and a computer storage medium, and mainly aims to solve the problems of low competitiveness and poor marketing quality of a traditional engineering equipment marketing mode.
In order to achieve the above object, the present invention provides a big data based engineering machinery marketing strategy pushing method, including:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
Preferably, the process of performing data cleaning and variable screening on the initial index comprises:
processing the initial indexes through compliance analysis, univariate analysis, multivariate analysis and feature selection in sequence, so as to clean the data and screen the variables of the initial indexes and obtain filtering indexes with improved data quality.
Preferably, the process of compliance analysis comprises:
analyzing whether the data source of the initial index is accurate; if it is accurate, no processing is carried out; if not, updating the data source of the initial index;
analyzing whether the calculation rule of the initial index is compliant; if it is compliant, no processing is carried out; if not, updating the calculation rule of the initial index or rejecting the initial index.
Preferably, the process of univariate analysis comprises:
missing-index analysis: initial indexes with 50% or more missing or null values are excluded, and initial indexes with some missing values, but fewer than 50%, are supplemented by a missing-value interpolation method;
univariate outlier analysis: determining, for each initial index, the percentage of its data that lies more than 2 standard deviations from the mean, relative to the total amount of data for that index;
if the percentage exceeds 10%, the initial index is marked as abnormal and rejected.
Preferably, the process of multivariate analysis comprises:
eliminating, through a correlation analysis matrix and a PCA method, those initial indexes that belong to the same category and are highly correlated or are prone to causing multicollinearity.
Preferably, the data mining model is a Logistic regression model.
Preferably, in the process of presetting the Logistic regression model,
the Logistic regression model is constructed by a backward stepwise method and a Sigmoid function.
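As a minimal sketch of how a Sigmoid function turns a weighted sum of key indexes into a purchase probability, the following assumes hypothetical index names, coefficients and an intercept; none of these values come from the patent, and the backward stepwise selection of indexes is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable Sigmoid: maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def purchase_probability(features, coefficients, intercept):
    """Logistic regression score for one user.

    features and coefficients are dicts keyed by index name
    (hypothetical names, for illustration only).
    """
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return sigmoid(z)


# Hypothetical example values, not taken from the patent.
example_coefficients = {"avg_daily_hours": 0.15, "machine_age_years": 0.40}
example_user = {"avg_daily_hours": 12.0, "machine_age_years": 2.0}
probability = purchase_probability(example_user, example_coefficients, -3.0)
```

A larger weighted sum gives a probability closer to 1, which is what allows users to be ranked for focused marketing.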
In another aspect, the present invention further provides a big data based engineering machinery marketing strategy pushing system, where the system includes:
an initial index acquisition unit, used for acquiring initial data of a sample based on a preset big data platform and determining a corresponding initial index according to the initial data;
the filtering unit is used for carrying out data cleaning and variable screening on the initial indexes so as to improve the data quality of the initial indexes to obtain filtering indexes;
the key index mining unit is used for performing data mining on the filtering indexes through a preset data mining model so as to obtain key indexes in the filtering indexes;
and the marketing strategy pushing unit is used for setting a corresponding marketing strategy based on the key indexes and pushing the marketing strategy to a corresponding client.
In another aspect, the present invention also provides an electronic device, including: a memory, a processor, and a big data-based engineering machinery marketing strategy pushing program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the following steps:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
In another aspect, the present invention further provides a computer-readable storage medium, where a big data-based engineering machinery marketing strategy pushing program is stored in the computer-readable storage medium, and when the big data-based engineering machinery marketing strategy pushing program is executed by a processor, the steps of the big data-based engineering machinery marketing strategy pushing method are implemented.
The big data-based engineering machinery marketing strategy pushing method, electronic device and computer-readable storage medium can automatically acquire and process data according to configured rules. The data is cleaned quickly across multiple dimensions, so that it is standardized and unified and its quality is effectively improved. The key indexes for precision marketing are mined by a data mining method and ranked by importance, so that they can be read at a glance. The model is applied to precision marketing, user portraits and the delimitation of poor quality, and a precision marketing list and the diagnosed problems are pushed to the host factory and its dealers.
In addition, the invention can automatically mine the key factors that influence a user's purchase of engineering equipment, predict the user's purchase probability within a preset future period, and diagnose the marketing effectiveness of the host factory and the dealers. The technical problems to be solved are: first, automatically acquiring, processing and cleaning data; second, finding the factors that influence user purchases, analyzing their importance and correlation, and constructing an evaluation algorithm; third, predicting the user's purchase probability, carrying out focused marketing, and in turn identifying and delimiting quality problems of the host factory and the dealers. The method comprises: selecting a number of perception indexes from four categories, namely basic information, machine data, financing information and working-hour information; cleaning the data by methods such as univariate analysis, feature selection, correlation matrix analysis and PCA (principal component analysis); performing data mining with multiple algorithms such as Logistic regression, C5.0, SVM, QUEST and Naive Bayes, settling on a Logistic regression model in light of business experience, and mining the key indexes and their importance; and applying the model to purchase likelihood judgment, user portrait diagnosis, and the delimitation and repair of poor quality.
The invention realizes the centralized, rapid and efficient cleaning of big data such as basic information, machine data and the like, realizes the batch, centralized and accurate marketing and problem diagnosis of a host factory and a distributor by establishing a Logistic regression model, can effectively reduce the sales cost, improve the sales success rate, keep and gradually improve the loyalty of users, and help the host factory and the distributor to enhance the core competitiveness.
Drawings
FIG. 1 is an overall flowchart of a big data-based engineering machinery marketing strategy pushing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a big data-based engineering machinery marketing strategy pushing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an embodiment of data cleaning and variable screening according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of user portrait diagnosis according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the internal logic of a big data-based engineering machinery marketing strategy pushing program according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details.
Specific embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Example 1
In order to explain the big data-based engineering machinery marketing strategy pushing method provided by the invention, FIG. 1 shows the overall flow of the method, and FIG. 2 shows the flow of an embodiment of the method.
As shown in FIG. 1 and FIG. 2 together, the big data-based engineering machinery marketing strategy pushing method provided by the invention comprises the following steps:
S110: acquiring initial data of loader samples from a preset big data platform, and determining corresponding data indexes according to the initial data; wherein the initial data comprises basic information, machine data, financing information and working-hour information of loader users; in determining the data indexes, the four types of initial data serve as the basic data sources, and four corresponding types of data indexes are generated from them.
It should be noted that, of the four types of initial data, machine data and working-hour information are collected by on-board hardware such as the T-BOX and the instrument panel of the loader, returned to the big data platform, parsed and stored there, so they only need to be read from the big data platform when used. The other two types, basic information and financing information, must first be transferred to the preset big data platform from other platforms (such as a financial system or a contract entry system) and are then acquired from the big data platform.
Specifically, in the process of storing the various types of initial data into the big data platform:
for machine data and working-hour information, the Internet of Things data related to the controller and the ECU (electronic control unit) is acquired by hardware such as the T-BOX and the instrument panel, and is then returned over the CAN (controller area network) bus. Finally, the Internet of Things data is parsed by Spark stream processing and stored in the preset big data platform. Basic information and financing information can be collected directly from other platforms into the preset big data platform.
The process of generating data indexes from these types of initial data is further described below, taking as the reference the data returned to the big data platform by all loaders of a company in July 2017. Whether the users of these loaders made a purchase between August 2017 and July 2018 must also be observed, and on this basis the data is aggregated by month, by year, by quantile and so on to generate further data indexes. In this way a user information wide table with 80 data indexes can be generated, as shown below (the user information wide table may be stored in the big data platform, or in a blockchain to protect the data in the big data platform).
[Table: User information wide table (table images not reproduced)]
In the above table, the field name column lists each specific data index, the field definition column gives the calculation rule of each data index, and the index type column gives the type of initial data used in calculating the index. In addition, the observation period runs from August 2017 to the following July, and the repurchase observation period is the six months before the repurchase month.
As can be seen from the above table, the initial data basically includes all data related to the loader samples, and based on the index calculation rules included in the above table, almost all data indexes related to the loader samples can be determined, and the related customer purchase rules can be predicted at a later stage through the data indexes.
S120: after the data indexes are determined, data preprocessing needs to be performed on the acquired data indexes to improve the data quality of the data indexes to obtain filtering indexes, and further improve the precision of a subsequently established model.
The data preprocessing is mainly used for data cleaning and variable screening of data indexes, and mainly comprises compliance analysis, univariate analysis, multivariate analysis and feature selection, and fig. 3 shows an embodiment process of the data cleaning and variable screening according to an embodiment of the invention.
As can be seen from FIG. 3, compliance analysis has two main objectives. First, it analyzes whether the data source of each data index is accurate: an accuracy analysis rule engine is built from the characteristics of the engineering machinery industry and of loader machine data, and it checks the data source of each data index; if a source is not accurate, the data acquisition source is updated or the related data indexes or the corresponding loader samples are eliminated; if it is accurate, no processing is performed. Second, it analyzes whether the calculation rule of each data index in the user information wide table (the field definition column) is compliant: a compliance analysis rule engine is built from the engineering machinery industry standards, the loader industry standards and the data characteristics, and it checks each calculation rule; if a rule is not compliant, the index calculation rule is updated or the related indexes and samples are eliminated; if it is compliant, no processing is performed.
Compliance analysis improves the compliance and standardization of the data at the index source. For example, in the July 2017 data, some loaders had a negative number of days since sale yet still reported working-time data; on inspection, these loaders were test vehicles at the host factory that had not yet been sold, so these loader samples were deleted.
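The example rule above can be sketched as a simple sample filter. This is a minimal illustration only: the record type and its field names (`loader_id`, `days_since_sale`, `working_hours`) are hypothetical, and a real rule engine would hold many such rules.

```python
from dataclasses import dataclass


@dataclass
class LoaderSample:
    # Hypothetical field names, for illustration only.
    loader_id: str
    days_since_sale: int
    working_hours: float


def drop_unsold_test_vehicles(samples):
    """Keep only samples whose days-since-sale value is non-negative.

    A negative value means the loader had not been sold in the reference
    month, so any working hours it reports come from factory testing.
    """
    return [s for s in samples if s.days_since_sale >= 0]


samples = [
    LoaderSample("A001", 120, 310.5),
    LoaderSample("T900", -15, 42.0),   # test vehicle, not yet sold
    LoaderSample("A002", 0, 8.0),
]
kept = drop_unsold_test_vehicles(samples)
```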
Univariate analysis covers three aspects: the first is missing-index analysis and concentration analysis; the second is univariate outlier analysis; the third is data normalization.
Specifically, for the missing-index analysis of the first aspect, the present invention adopts the following two processing modes:
First, input fields of the user information wide table are screened out: fields with too many missing values are excluded, with a default maximum missing percentage of 50% (adjustable by the user as required); fields with too many distinct categories are excluded, with a default maximum of 95 categories (adjustable as required); categorical fields in which a single category holds too many values are excluded, with a default maximum single-category share of 95% (adjustable as required).
Second, the input data is prepared: the outlier threshold defaults to 3σ (adjustable as required); the outlier handling algorithm replaces outliers by default, with deletion as an option; the standardization method defaults to the z-score method, with the max/min method as an option.
It should be noted that the missing-value replacement algorithm may replace a missing value with the mean, the mode, the median, a fixed value, or a value obtained by C&RT. The replacement algorithm is selectable and can be chosen according to actual requirements and data types.
In general, data indexes with 50% or more missing or null values are considered unrepresentative and are excluded; indexes with fewer missing values are supplemented according to business experience and configured rules. The specific missing-value interpolation rules are as follows:
for average man-hours (a data index), a missing value generally means that the vehicle did not work in that month, so the missing value is replaced with 0; for engine speed and GSM signal (two data indexes), the values are arranged in time order and a missing value is replaced by the median of the preceding and following non-missing values; for oil temperature (a data index), the C&RT (Classification and Regression Tree) algorithm is used to replace the missing value: if the features of the record with the missing oil temperature match those of the left branch, the record is assigned to the left subtree and takes the left data value; otherwise it is assigned to the right subtree and takes the right data value.
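The first two interpolation rules and the 50% exclusion threshold can be sketched as follows. This is a minimal illustration under the assumption that a missing value is represented as `None`; the function names are hypothetical and the C&RT-based replacement is not shown.

```python
from statistics import median


def missing_fraction(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def should_exclude(values, max_missing=0.5):
    """Exclude an index when 50% or more of its values are missing."""
    return missing_fraction(values) >= max_missing


def fill_with_zero(values):
    """Rule for average man-hours: a missing month means no work."""
    return [0.0 if v is None else v for v in values]


def fill_with_neighbor_median(values):
    """Rule for time-ordered series such as engine speed or GSM signal.

    Each missing entry is replaced by the median of the nearest
    non-missing value before it and the nearest one after it.
    """
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = next((values[j] for j in range(i - 1, -1, -1)
                       if values[j] is not None), None)
        after = next((values[j] for j in range(i + 1, len(values))
                      if values[j] is not None), None)
        neighbors = [x for x in (before, after) if x is not None]
        if neighbors:
            filled[i] = median(neighbors)
    return filled


engine_speed = fill_with_neighbor_median([1500, None, 1700])
```

With only two neighbors the median equals their average, so the sketch behaves like linear interpolation for isolated gaps.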
For the concentration analysis of the first aspect, the main requirement is that the values of each data index be reasonably distributed across categories, to avoid the loss of representativeness caused by excessive concentration. For example, if a single category accounts for more than 40% of a data index's values, or two categories together account for more than 60%, the data index is considered too concentrated and too weak in discriminating power for predicting user purchase likelihood and user portraits, and therefore needs to be excluded; the maximum number of categories per index is 95 (adjustable as required), and data indexes exceeding the maximum number of categories are deleted.
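One possible reading of the concentration rule is sketched below: an index is flagged when its most frequent category exceeds 40% of the values, when its two most frequent categories together exceed 60%, or when it has more than 95 categories. The function name and the sample working-condition labels are assumptions made for illustration.

```python
from collections import Counter


def is_too_concentrated(values, top1_limit=0.40, top2_limit=0.60,
                        max_categories=95):
    """Return True when a categorical index should be excluded."""
    if not values:
        raise ValueError("values must not be empty")
    counts = sorted(Counter(values).values(), reverse=True)
    total = len(values)
    top1_share = counts[0] / total
    top2_share = sum(counts[:2]) / total
    return (top1_share > top1_limit
            or top2_share > top2_limit
            or len(counts) > max_categories)


# Hypothetical working-condition labels: half of the loaders are in coal mines.
conditions = ["coal"] * 5 + ["iron"] * 3 + ["municipal"] * 2
concentrated = is_too_concentrated(conditions)
```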
Univariate outlier analysis of the second aspect: outlier criteria are generally based on experience and on the properties of the normal distribution. If the data lying more than 2 standard deviations from the mean makes up more than 10% of an index's data, the index is not representative and is excluded. In addition, values beyond 3 standard deviations are treated as outliers and values beyond 5 standard deviations as extreme values, and the loader sample data corresponding to these abnormal values is eliminated directly.
Specifically, a reasonable range is set for each index according to business rules, and values outside the range are regarded as abnormal. For example, under the σ rule, an index is abnormal if the data beyond 2σ exceeds 10% of the sample data, and an individual value is abnormal if its absolute distance from the mean is greater than 3σ. The box plot method displays the upper and lower bounds, the upper and lower quartiles, the median and the mean of the data; values beyond the upper or lower bound are called outliers, where upper bound = upper quartile + k × (upper quartile - lower quartile) and lower bound = lower quartile - k × (upper quartile - lower quartile); k = 1.5 gives moderate outliers and k = 3 gives extreme outliers.
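The σ rule and the box plot bounds can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the population standard deviation is used, and the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which is one of several common quartile conventions.

```python
from statistics import fmean, pstdev, quantiles


def fraction_beyond(values, n_sigma):
    """Share of values farther than n_sigma standard deviations from the mean."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    return sum(abs(v - mu) > n_sigma * sigma for v in values) / len(values)


def is_abnormal_index(values, n_sigma=2.0, max_fraction=0.10):
    """Sigma rule for a whole index: too much data beyond 2 sigma."""
    return fraction_beyond(values, n_sigma) > max_fraction


def box_plot_bounds(values, k=1.5):
    """Lower and upper bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_bounds(list(range(1, 10)))          # k = 1.5
wide_lower, wide_upper = box_plot_bounds(list(range(1, 10)), k=3)
```

Values outside `(lower, upper)` would be treated as moderate outliers and values outside `(wide_lower, wide_upper)` as extreme outliers.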
For the data normalization of the third aspect, each column of the feature matrix is standardized by computing its z-score, which converts the feature values of the samples to a common scale and yields dimensionless data. The normalization formula is as follows:
$$z = \frac{x - \mu}{\sigma}$$
where μ is the mean and σ is the standard deviation of the column. The normalized data has a mean of 0 and a standard deviation of 1, and outliers are replaced with the 3σ value.
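A minimal sketch of the column-wise z-score with 3σ replacement follows. It assumes that "replaced with the 3σ value" means clipping each standardized value to the range [-3, 3], which is equivalent to clipping the raw value to μ ± 3σ; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def z_score_column(values, clip=3.0):
    """Standardize one column and clip standardized values to +/- clip."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [max(-clip, min(clip, (v - mu) / sigma)) for v in values]


# Mean is 5 and population standard deviation is 2 for this column.
scaled = z_score_column([2, 4, 4, 4, 5, 5, 7, 9])
```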
Multivariate analysis analyzes and screens the interrelations among multiple variables; the invention mainly uses a correlation analysis matrix and the PCA method for this analysis. The specific steps are as follows:
for the correlation analysis matrix, if the final model contains several variables that have high correlation coefficients and belong to the same category, problems such as multicollinearity and overfitting arise, making the model unstable and less robust; in this case the correlation analysis matrix can be used to analyze the data. It should be noted that, for the process data (i.e. each data index) in the present invention, the model needs to be re-established with the selected indexes based on the automatic operation rules of the machine, to ensure the stability, accuracy and reliability of the model.
The analysis using the correlation analysis matrix is further explained below, taking the correlation comparison table of the initial indexes of the man-hour information category as an example.
[Table: Correlation comparison table of initial indexes of the man-hour information category (table images not reproduced)]
When the correlation analysis matrix is analyzed with the above table, only one index is selected, because all of these indexes belong to the same man-hour information class. The specific analysis process is as follows:
(1) select the indexes whose correlation with the other indexes in the table is greater than 0.6; no index in the table qualifies; if any index qualifies, go to step (3);
(2) if step (1) finds none, select the indexes whose correlation with the other indexes in the table is greater than 0.5; these are the six indexes month_4, month_5, month_6, month_8, month_9 and month_10;
(3) among them, select the index that has the largest number of correlations of 0.8 or more with other indexes; these are month_4 (correlation 0.8 with month_3 and 0.8 with month_5) and month_5 (correlation 0.8 with month_4 and 0.83 with month_6);
(4) because two indexes qualify, the sum of the correlations of each with the other indexes is calculated, and the index with the higher sum is included in the subsequent preset data mining model; the sum for month_4 is 7.07 and the sum for month_5 is 6.93, so the invention retains the month_4 index.
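The selection rule of steps (3) and (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pick_representative` and the toy correlation values are hypothetical and are not the values of the patent's table, and "0.8 or more" is used as the counting rule because the worked example counts correlations of exactly 0.8.

```python
def pick_representative(corr, count_threshold):
    """Pick one index from a group of mutually correlated indexes.

    corr maps each index name to its correlations with the other indexes.
    Sketch of the rule in the text: prefer the index with the most
    correlations at or above count_threshold, then break ties by the
    largest sum of correlations.
    """
    def strong_links(name):
        return sum(1 for r in corr[name].values() if r >= count_threshold)

    def total_corr(name):
        return sum(corr[name].values())

    return max(corr, key=lambda name: (strong_links(name), total_corr(name)))


# Toy, made-up correlations (hypothetical, not the patent's table).
toy_corr = {
    "month_3": {"month_4": 0.80, "month_5": 0.60, "month_6": 0.50},
    "month_4": {"month_3": 0.80, "month_5": 0.80, "month_6": 0.70},
    "month_5": {"month_3": 0.60, "month_4": 0.80, "month_6": 0.83},
    "month_6": {"month_3": 0.50, "month_4": 0.70, "month_5": 0.83},
}

kept = pick_representative(toy_corr, count_threshold=0.8)
print(kept)
```

In this toy data month_4 and month_5 both have two strong correlations, so the tie is broken by the correlation sum, which mirrors step (4).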
For the PCA algorithm (Principal Component Analysis): PCA reduces dimensionality and thereby improves algorithm performance and accuracy. The main process is as follows:
calculate the covariance matrix of the user information wide table, with the projection $Z_{k\times m} = W_{k\times n} X_{n\times m}$ (written $Z = WX$);
adopt the maximum variance method: maximize the variance along the new coordinate axes, and calculate the eigenvalues and eigenvectors of the covariance matrix,
Figure BDA0002828085950000112
where the constraint is s.t. $\|W\|_2 = 1$ and $\mathrm{Cov}(x)$ is the covariance matrix of the matrix $x$; sort the calculated eigenvalues; select the eigenvectors corresponding to the N largest eigenvalues (such that the cumulative variance ratio exceeds 80% or the scree-plot eigenvalues are greater than 1); and project the data into the new space spanned by these N vectors.
The PCA method is a data set preprocessing technology, is often used before other algorithms, can remove some redundant information and noise of data, enables the data to be simpler and more efficient, and improves the calculation efficiency of other machine learning tasks; meanwhile, the main features can be identified from the data, the number of the principal components needing to be reserved is determined through characteristic value analysis, and other principal components are abandoned, so that the dimension reduction of the data is realized.
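The two retention criteria mentioned above (cumulative variance ratio above 80%, or eigenvalue greater than 1) can be sketched as follows. The function names and the eigenvalues are hypothetical; the sketch assumes the eigenvalues of the covariance matrix have already been computed by some other means.

```python
def components_by_cumulative_ratio(eigenvalues, min_ratio=0.80):
    """Smallest number of leading components whose cumulative
    share of the total variance exceeds min_ratio."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > min_ratio:
            return count
    return len(ordered)


def components_by_eigenvalue(eigenvalues, min_eigenvalue=1.0):
    """Number of components whose eigenvalue is greater than min_eigenvalue."""
    return sum(1 for value in eigenvalues if value > min_eigenvalue)


# Made-up eigenvalues for illustration only.
toy_eigenvalues = [0.3, 2.6, 0.2, 1.6, 0.3]
n_by_ratio = components_by_cumulative_ratio(toy_eigenvalues)
n_by_eigenvalue = components_by_eigenvalue(toy_eigenvalues)
print(n_by_ratio, n_by_eigenvalue)
```

The two criteria need not agree in general; in this toy case both keep the two largest components.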
For feature selection: after compliance analysis, univariate analysis and multivariate analysis, the data quality requirement is met from the perspective of each single index. From the perspective of model construction, however, the most representative data indexes still need to be selected. The invention uses a feature selection method for this purpose: it identifies the fields that matter most for predicting a specific outcome and ranks the variables by importance, which helps produce a faster and more effective model. The invention uses the Pearson method to compute the p-value of each variable for variable selection. The specific steps are as follows:
In the first step, the hypotheses are stated: H0 (null hypothesis): μ = μ0; H1 (alternative hypothesis): μ ≠ μ0.
For example, if the user considers that the number of vehicles purchased before the observation period has no influence on whether the customer buys again, the null hypothesis μ = μ0 is assumed;
in the second step, the p-value under H0 (μ = μ0) is calculated by the Pearson method;
for example, this example yields p-value = 0.01;
in the third step, α (the significance level) is set, usually to 0.05 or 0.01; in this example 0.05 is chosen as the criterion;
with α = 0.05, the user accepts at most a 5% risk of wrongly rejecting H0 when it is in fact true, so a p-value below α is treated as strong evidence that H0 is incorrect;
in the fourth step, the judgment is made: in this example p-value = 0.01 < α, which means that, if μ = μ0 were true, a result at least as extreme as the one observed would occur with a probability of only 1%.
Therefore, the user rejects the null hypothesis and concludes that μ ≠ μ0, i.e., the number of vehicles purchased before the observation period has a significant influence on whether the customer repurchases.
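The four steps above can be illustrated with a short standard-library sketch. Note the substitution: the p-value here comes from a permutation test on the Pearson correlation coefficient, which stands in for whatever analytic p-value the original tooling reports. The data, the variable names and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for H0: no linear association.

    Assumption: this replaces the analytic p-value of the Pearson test.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Made-up data: purchases before the observation period vs. a repurchase score.
history_purchases = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8]
repurchase_score = [0.1, 0.2, 0.15, 0.3, 0.35, 0.4, 0.5, 0.55, 0.7, 0.75, 0.9, 0.95]

alpha = 0.05
p_value = permutation_p_value(history_purchases, repurchase_score)
reject_h0 = p_value < alpha  # step four: reject H0 when p-value < alpha
print(p_value, reject_h0)
```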
S130: Perform data mining on the filtering indexes through a preset data mining model to obtain the key indexes among the filtering indexes.
It should be noted that the preset data mining model may be a Logistic regression model. The cleaned data (the filtering indexes) are subsequently processed by the Logistic regression model to mine the key indexes and rank their importance; the training data and the test data are divided, and the positive and negative samples balanced, using a data partitioning and balancing method; the model effect is evaluated by the model accuracy, the confusion matrix and the AUC value; and the purchase likelihood (probability) is predicted for new sample data. The specific process is as follows:
Constructing the model:
The model construction process of the invention uses Python as the development language, with the IDEA community edition as the development tool. The model is built with the Logistic regression algorithm. It should be noted that the Logistic regression model is a generalized linear regression model that is widely used in classification and regression scenarios; its main application fields include personal (or enterprise) credit assessment, satisfaction prediction, precision marketing prediction and the like.
The Logistic function is a widely used activation function with an S-shaped curve defined through the exponential function. It is defined as:
Equation 1: $f(t) = \dfrac{1}{1 + e^{-t}}$
where f(t) is the Sigmoid function and t is the fitted linear expression given in Equation 3.
The function is differentiable everywhere in its domain, its derivative gradually approaches 0 at both ends, and it has the following characteristics: when t approaches negative infinity, f(t) approaches 0; when t approaches positive infinity, f(t) approaches 1; when t is 0, f(t) is 1/2, i.e.:
Equation 2: $\lim_{t\to-\infty} f(t) = 0,\quad \lim_{t\to+\infty} f(t) = 1,\quad f(0) = \tfrac{1}{2}$
As t goes to positive or negative infinity, the limit of the derivative of f(t) equals 0.
Assuming that t is a linear function of an explanatory variable x, t is expressed as follows:
Equation 3: $t = \beta_0 + \beta_1 x$, which is the fitted equation of a simple linear regression.
The Logistic function can then be written as:
Equation 4: $f(x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$
thus, the Logistic regression model can be defined as:
equation 5:
Figure BDA0002828085950000134
Logistic regression seeks the most appropriate values of β.
Equation 6:
Figure BDA0002828085950000135
if $\beta_0 + \beta_1 x > 0$, y is 1; if $\beta_0 + \beta_1 x < 0$, y is 0.
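The Sigmoid function and the decision rule above can be sketched as follows. The function names `sigmoid` and `predict_label` and the coefficient values used in the calls are hypothetical; thresholding the probability at 0.5 is equivalent to checking the sign of β0 + β1·x.

```python
import math


def sigmoid(t):
    """Logistic (Sigmoid) function: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_label(x, beta0, beta1):
    """Return (probability, label) for a single explanatory variable x."""
    t = beta0 + beta1 * x          # linear part, as in Equation 3
    probability = sigmoid(t)
    label = 1 if probability > 0.5 else 0   # same as t > 0
    return probability, label


print(sigmoid(0.0))                      # 0.5
print(predict_label(2.0, -1.0, 1.5))     # t = 2.0, so the label is 1
print(predict_label(0.0, -1.0, 1.5))     # t = -1.0, so the label is 0
```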
The invention constructs the Logistic regression model using the backward stepwise method and the Sigmoid function.
The Logistic regression model construction process specifically comprises the following steps:
Import the data of the user information wide table preprocessed in step S120 and first partition the samples into 70% training data and 30% test data; then balance the samples so that the ratio of positive samples (users who repurchased a vehicle) to negative samples (users who did not repurchase) is about 1:1.
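The partitioning and balancing step can be sketched as below. Several things are assumptions, since the text does not specify them: balancing is done by randomly undersampling the larger class, it is applied to the training portion only, and the field name `repurchased`, the function name and the toy records are hypothetical.

```python
import random


def split_and_balance(records, label_key="repurchased", train_ratio=0.7, seed=42):
    """Shuffle, split 70/30, then undersample the training set to about 1:1."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    cut = round(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]

    positives = [r for r in train if r[label_key] == 1]
    negatives = [r for r in train if r[label_key] == 0]
    smaller = min(len(positives), len(negatives))

    # Assumption: random undersampling of the majority class.
    balanced = rng.sample(positives, smaller) + rng.sample(negatives, smaller)
    rng.shuffle(balanced)
    return balanced, test


# Toy records: 20% positive samples (made-up data).
records = [{"id": i, "repurchased": 1 if i % 5 == 0 else 0} for i in range(100)]
train, test = split_and_balance(records)
print(len(train), len(test))
```

Oversampling the minority class would be an equally plausible reading of "balanced"; undersampling is used here only because it needs no synthetic records.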
Select the dependent variable and the independent variables to train the Logistic regression model; specifically, the yes/no field of the user information wide table (whether the user repurchased) is selected as the dependent variable, and the 25 preprocessed data indexes (listed in the following table) are selected as the independent variables:
No.   Index           No.   Index               No.   Index
1     gong_kuang      10    Month_4             19    Gp50
2     sale_days       11    Buy_count           20    Ip50
3     total_term      12    Invoice_price       21    Iip50
4     sum_overdued    13    First_payment       22    Engine_rotate
5     f_5             14    F_total_hour        23    Oil_temperature
6     f_buy_count     15    F_avg_hour          24    Gps_speed
7     f_6hour         16    Starts_times        25    Imitate1
8     P75             17    F_avg_start_hour
9     tp50            18    Month_8hour
Model construction and index importance description:
The Logistic regression model is constructed with the Sigmoid function using the backward stepwise method, in which all 25 preprocessed indexes are first included in the model and the indexes that do not meet the criterion for staying in the model are then eliminated one by one. According to Equation 5, the output form is:
y = 0.08523 - 0.7324*[gong_kuang=1] - 0.393*[gong_kuang=2] - 0.3481*sale_days + 0.1282*total_term - 0.1211*sum_overdued + 0.9583*f_5 + 0.4405*f_buy_count - 0.3524*f_6hour - 0.2973*P75 + 0.1025*tp50.
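The output form above can be evaluated for a single sample as in the following sketch. The coefficients are copied from the formula; everything else is an assumption: that y is the linear score passed through the Sigmoid function to obtain a probability, that gong_kuang categories other than 1 and 2 contribute 0, and that the sample values are in whatever scale the model was trained on. The sample values themselves are made up.

```python
import math

INTERCEPT = 0.08523
GONG_KUANG_EFFECT = {1: -0.7324, 2: -0.393}   # other categories: 0 (assumption)
COEFFICIENTS = {
    "sale_days": -0.3481,
    "total_term": 0.1282,
    "sum_overdued": -0.1211,
    "f_5": 0.9583,
    "f_buy_count": 0.4405,
    "f_6hour": -0.3524,
    "P75": -0.2973,
    "tp50": 0.1025,
}


def linear_score(sample):
    """Evaluate y from the output form for one sample (a dict of index values)."""
    score = INTERCEPT + GONG_KUANG_EFFECT.get(sample["gong_kuang"], 0.0)
    for name, weight in COEFFICIENTS.items():
        score += weight * sample[name]
    return score


def repurchase_probability(sample):
    """Assumption: y is a log-odds score mapped to a probability by the Sigmoid."""
    return 1.0 / (1.0 + math.exp(-linear_score(sample)))


# Hypothetical sample: every numeric index at 0 except f_5.
sample = {name: 0.0 for name in COEFFICIENTS}
sample["f_5"] = 1.0
sample["gong_kuang"] = 1

print(linear_score(sample), repurchase_probability(sample))
```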
All indexes appearing in the output form are key indexes of the model. The evaluation of the model effect based on the likelihood ratio test is described below.
A likelihood ratio test uses the likelihood function to test whether a certain hypothesis (or restriction) is valid. The basic idea is as follows: let a random sample of n observations X1, X2, …, Xn be drawn from a population with density function f(X; θ), where θ is an unknown parameter. The null hypothesis to be tested is H0: θ = θ0, the alternative hypothesis is H1: θ ≠ θ0, and the test level is α. The ratio of the value of the likelihood function at θ = θ0 to its value at θ = θ̂ (the maximum point, i.e., the maximum of the likelihood function) is calculated and denoted by λ. It is known that:
the ratio λ of the two likelihood function values is a function of the sample observations only and does not contain any unknown parameters;
0 ≤ λ ≤ 1, because the likelihood function value is never negative and the denominator of λ is the maximum of the likelihood function, which is never less than the numerator;
the closer θ̂ is to θ0, the larger λ is; conversely, the greater the difference between θ̂ and θ0, the smaller λ is. Therefore, if the critical value λ0 can be determined from a given α, statistical inference can be made according to the following rule:
when λ ≤ λ0, reject H0 and accept H1; when λ > λ0, do not reject H0,
where P(λ ≤ λ0) = α.
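A small worked instance of the rule above, for a Bernoulli sample, is sketched below. Two points are assumptions not stated in the text: the toy counts are made up, and the critical value is obtained from the large-sample approximation that -2·ln λ follows a chi-square distribution with 1 degree of freedom (critical value 3.841 at α = 0.05), which corresponds to λ0 = exp(-3.841/2).

```python
import math


def bernoulli_log_likelihood(successes, trials, theta):
    """Log-likelihood of `successes` out of `trials` with success rate theta."""
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def likelihood_ratio(successes, trials, theta0):
    """lambda = L(theta0) / L(theta_hat); requires 0 < successes < trials."""
    theta_hat = successes / trials   # maximum-likelihood estimate
    log_lambda = (bernoulli_log_likelihood(successes, trials, theta0)
                  - bernoulli_log_likelihood(successes, trials, theta_hat))
    return math.exp(log_lambda)


CHI2_CRITICAL_DF1_ALPHA_005 = 3.841   # chi-square, 1 degree of freedom, alpha = 0.05

lam = likelihood_ratio(successes=30, trials=100, theta0=0.5)
statistic = -2.0 * math.log(lam)
reject_h0 = statistic > CHI2_CRITICAL_DF1_ALPHA_005   # same as lam <= lambda0
print(lam, statistic, reject_h0)
```

When the observed rate equals θ0 exactly, the two likelihoods coincide and λ is 1, the largest value it can take.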
In this example, the test level α = 0.05 is selected to check whether the model indexes are valid. The following table shows the Likelihood Ratio Tests of the Logistic regression model. All values in the table (the Sig. values, which are used to judge the quality of the Logistic regression model) are below 0.05, so the 9 selected indexes distinguish positive samples from negative samples significantly, and the Logistic regression model performs well.
Figure BDA0002828085950000151
After the model is constructed, the model training and testing effects need to be evaluated.
Specifically, the model is evaluated by its accuracy: as shown in the following table, the overall accuracy is 71.62% on the training data and 74.57% on the test data; the model is stable and performs well.
Figure BDA0002828085950000152
The model was evaluated by a confusion matrix: a confusion matrix (as shown in the following table) is used to judge the classification capability of the model, wherein each column represents a prediction class, and the total number of each column represents the number of data predicted as the class; each row represents a true attribution category of data, and the total number of data in each row represents the number of data instances for that category.
Figure BDA0002828085950000153
On both the training data and the test data, the model achieves good recall. The recall of negative samples (no repurchase) is 76.01% and 76.05%, respectively, and the recall of positive samples (repurchase) is 67.36% and 67.96%, respectively. For example, the negative-sample recall on the training data is 1274/(1274+402) = 76.01%.
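The recall arithmetic in the example above can be reproduced with a few lines of Python. The two counts are those quoted in the text for the negative class of the training data; the function and parameter names are hypothetical.

```python
def recall(correct, missed):
    """Recall of one class: correctly predicted / all true members of the class."""
    return correct / (correct + missed)


# Counts quoted in the text for the negative class of the training data.
negative_recall_train = recall(correct=1274, missed=402)
print(f"{negative_recall_train:.2%}")  # 76.01%
```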
The model is evaluated by the AUC value: the AUC value (see the table below) indicates the classification ability of the model; it lies between 0.5 and 1, and a higher value means better classification ability. In this example, the AUC values for the training data and the test data are 0.772 and 0.777, respectively, indicating that the model discriminates well.
Index        Training data    Test data
AUC value    0.772            0.777
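For readers unfamiliar with the AUC value, the following sketch computes it directly from its pairwise definition: the share of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are made up and the function name is hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = repurchase) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_score(labels, scores)
print(auc)
```

Here 8 of the 9 pairs are ordered correctly, so the AUC is 8/9.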
The reason for selecting the Logistic regression model is explained below by comparison with a C5.0 model built on the same batch of data.
Figure BDA0002828085950000161
Model accuracy table of C5.0 model
Index        Training data    Test data
AUC value    0.979            0.65
AUC value table of C5.0 model
The two tables above show that the accuracy of the C5.0 model drops sharply from 94.68% on the training data to 70.74% on the test data, and the AUC value drops sharply from 0.979 to 0.65, which indicates that the model is unstable and overfits. Historical experience suggests that the effect would be even worse in practical application. Other models, such as SVM and neural network models, show similar behavior.
Meanwhile, from experience of other industries such as banking industry, insurance industry, communication industry and the like, Logistic regression is a very classical and practical method.
In conclusion, the Logistic regression model is selected to perform accurate marketing data mining, and the effect is remarkable.
S140: To overcome the inherent shortcomings of traditional marketing, the rich big data resources (key indexes) obtained through the above steps are analyzed in depth with data mining algorithms, purchasing patterns and tendencies are predicted, a corresponding marketing strategy is formulated, and the marketing strategy is pushed to the corresponding client.
The data mining model, or the key indexes it outputs, may be applied to user purchase likelihood prediction, user portrait diagnosis, and the generation and pushing of diagnostic marketing strategies based on the user portrait.
Specifically, for user purchase likelihood prediction:
The key index data are substituted into the data mining model; taking the data of September 2018 as an example, the prediction of whether each user will repurchase and the corresponding prediction accuracy probability are obtained (see the following table), thereby predicting the purchase likelihood of the user. In the table, the higher the prediction accuracy probability, the more likely the predicted outcome (repurchase or no repurchase) is.
Figure BDA0002828085950000171
Specifically, for portrait diagnosis:
Portrait diagnosis is based on the user purchase likelihood prediction (including the prediction accuracy probability) and the user perception indexes (i.e., the 9 indexes of the model formula), and divides users into four categories: the absolutely satisfactory type, the optimistically friendly type, the satisfactory repair type and the passive care type, as shown in fig. 4.
The purchase likelihood prediction data of a single user are imported, the key indexes of the user are comprehensively scored with the analytic hierarchy process to obtain the perception type of the user, and the user portrait type is generated automatically. For example:
Vehicle number      Purchase likelihood   Perception type   User type
********0608046     0                     Good              Satisfactory repair type
********0607883     0                     Poor              Passive care type
********0607866     0                     Good              Satisfactory repair type
********0607835     0                     Poor              Passive care type
********0607821     1                     Good              Absolutely satisfactory type
********0607799     1                     Good              Absolutely satisfactory type
********0607785     1                     Poor              Optimistically friendly type
********0607771     0                     Poor              Passive care type
********0607754     0                     Poor              Passive care type
********0607740     0                     Good              Satisfactory repair type
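The table above suggests a simple two-way lookup from the purchase prediction and the perception type to the user type. The sketch below encodes that reading; the mapping is inferred from the example rows, and the names `USER_TYPES` and `portrait_type` are hypothetical.

```python
# (purchase prediction, perception type) -> user type, as read from the example rows.
USER_TYPES = {
    (1, "good"): "absolutely satisfactory type",
    (1, "poor"): "optimistically friendly type",
    (0, "good"): "satisfactory repair type",
    (0, "poor"): "passive care type",
}


def portrait_type(purchase_prediction, perception):
    """Look up the user portrait type for one vehicle/user."""
    return USER_TYPES[(purchase_prediction, perception.lower())]


print(portrait_type(1, "Poor"))   # optimistically friendly type
print(portrait_type(0, "Good"))   # satisfactory repair type
```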
A diagnostic marketing strategy is generated based on the user portrait, and the marketing strategy required for each customer is pushed to the clients of the host factory and of each dealer. For example, the vehicle whose number ends in 0607785 has recently been working at high intensity; active maintenance and service are provided for it in time, improving the working and service quality of the vehicle, so that the user changes from the optimistically friendly type to the absolutely satisfactory type. For the vehicle whose number ends in 0608046, the low purchase likelihood was found to be caused by low working intensity; after the customer was introduced to areas with high work demand, the utilization rate of the vehicle improved, and more than one year later the customer purchased a new vehicle to expand the business.
Based on this embodiment, the data quality of the host factory, including the quality of the basic information, machine data and man-hour information, is improved, which is an important basis for mining the commercial value of massive data. The purchase likelihood of users and its trend are predicted; poor-quality delimitation is performed on the user portrait, and the characteristics of users at different levels and with different features are analyzed so that a more targeted marketing strategy can be adopted.
Precision marketing focuses on key customers; the repurchase rate is raised from less than 30% under traditional marketing to more than 70%, promoting the efficient use of marketing resources. This helps strengthen the core competitiveness of the host factory and the dealers, improves economic benefit and reduces sales cost.
According to the above technical scheme, the big-data-based engineering machinery marketing strategy pushing method automatically collects and processes data according to configured rules and rapidly cleans the data along multiple dimensions, so that the data are standardized and unified and the data quality is effectively improved. The key indexes of precision marketing are mined by a data mining method and ranked by importance, so that they are clear at a glance. The model is applied to precision marketing, user portraits and poor-quality delimitation, and a precision marketing list and the diagnosed problems are pushed to the host factory and the dealers.
In addition, the invention can automatically mine the key factors that influence users' purchases of engineering equipment, predict the purchase probability of a user in a future preset time period, and diagnose the marketing effect of the host factory and the dealers. The technical problems to be solved are: first, automatically acquiring, processing and cleaning data; second, finding the factors that influence user purchases, analyzing their importance and relevance, and constructing an evaluation algorithm; third, predicting the purchase probability of users, carrying out focused marketing, and feeding back the delimited quality problems of the host factory and the dealers. The method selects a number of perception indexes from four categories: basic information, machine data, financing information and man-hour information; cleans the data by methods such as univariate analysis, feature selection, correlation matrix analysis and PCA (principal component analysis); performs data mining with multiple algorithms such as Logistic regression, C5.0, SVM, QUEST and Naive Bayes, settles on the Logistic regression model in combination with business experience, and mines the key indexes and their importance; and applies the model to purchase likelihood judgment, portrait diagnosis, and poor-quality delimitation and repair.
The invention realizes centralized, rapid and efficient cleaning of big data such as basic information and machine data and, by establishing a Logistic regression model, realizes batch, centralized and precise marketing and problem diagnosis for the host factory and the dealers; it can effectively reduce sales cost, improve the sales success rate, retain users and gradually improve their loyalty, and help the host factory and the dealers strengthen their core competitiveness.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example 2
Corresponding to the method, the application also provides an engineering machinery marketing strategy pushing system based on big data, and the system comprises:
an initial index acquisition unit, configured to acquire initial data of a sample based on a preset big data platform and to determine corresponding initial indexes according to the initial data;
a filtering unit, configured to perform data cleaning and variable screening on the initial indexes so as to improve their data quality and obtain filtering indexes;
a key index mining unit, configured to perform data mining on the filtering indexes through a preset data mining model so as to obtain the key indexes among the filtering indexes;
and a marketing strategy pushing unit, configured to set a corresponding marketing strategy based on the key indexes and to push the marketing strategy to a corresponding client.
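The way the four units hand data to one another can be sketched as a chain of four callables. This is an architectural illustration only: every function name, the record layout and the stub behaviour are hypothetical and do not describe the actual units of the system.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def run_pipeline(
    acquire: Callable[[], List[Record]],
    clean: Callable[[List[Record]], List[Record]],
    mine: Callable[[List[Record]], List[str]],
    push: Callable[[List[str]], str],
) -> str:
    """Chain the four units: acquire -> filter -> mine key indexes -> push."""
    initial_indexes = acquire()
    filtering_indexes = clean(initial_indexes)
    key_indexes = mine(filtering_indexes)
    return push(key_indexes)


# Stub units, for illustration only.
def acquire_initial_indexes() -> List[Record]:
    return [{"sale_days": 10.0, "tp50": 3.0}, {"sale_days": None, "tp50": 4.0}]


def clean_and_screen(records: List[Record]) -> List[Record]:
    # Drop records that contain missing values.
    return [r for r in records if all(v is not None for v in r.values())]


def mine_key_indexes(records: List[Record]) -> List[str]:
    return sorted(records[0].keys()) if records else []


def push_strategy(key_indexes: List[str]) -> str:
    return "strategy based on: " + ", ".join(key_indexes)


result = run_pipeline(acquire_initial_indexes, clean_and_screen,
                      mine_key_indexes, push_strategy)
print(result)
```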
Example 3
The present invention also provides an electronic device 70. Referring to fig. 5, a schematic structural diagram of an electronic device 70 according to a preferred embodiment of the invention is shown.
In the embodiment, the electronic device 70 may be a terminal device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 70 includes: a processor 71 and a memory 72.
The memory 72 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 70, such as a hard disk of the electronic device 70. In other embodiments, the readable storage medium may be an external memory of the electronic device 70, such as a plug-in hard disk provided on the electronic device 70, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like.
In the present embodiment, the readable storage medium of the memory 72 is generally used for storing a big data based engineering machinery marketing strategy pushing program 73 installed on the electronic device 70. The memory 72 may also be used to temporarily store data that has been output or is to be output.
The processor 71 may be a Central Processing Unit (CPU), a microprocessor or another data processing chip in some embodiments, and is used for running program code stored in the memory 72 or for processing data, for example executing the big data based engineering machinery marketing strategy pushing program 73.
In some embodiments, the electronic device 70 is a terminal device of a smartphone, tablet, portable computer, or the like. In other embodiments, the electronic device 70 may be a server.
Fig. 5 shows only an electronic device 70 having components 71-73, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
Optionally, the electronic device 70 may further include a user interface, which may include an input unit such as a Keyboard (Keyboard), a voice input device such as a microphone (microphone) or other devices with voice recognition function, a voice output device such as a sound box, a headset, etc., and optionally may also include a standard wired interface, a wireless interface.
Optionally, the electronic device 70 may further include a display, which may also be referred to as a display screen or a display unit. In some embodiments, the display device may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 70 and for displaying a visualized user interface.
Optionally, the electronic device 70 may further include a touch sensor. The area provided by the touch sensor for the user to perform touch operation is referred to as a touch area. Further, the touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.
The area of the display of the electronic device 70 may be the same as or different from the area of the touch sensor. Optionally, the display is stacked with the touch sensor to form a touch display screen. The device detects touch operation triggered by a user based on the touch display screen.
Optionally, the electronic device 70 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the apparatus embodiment shown in fig. 5, the memory 72, which is a kind of computer storage medium, may include therein an operating system, and a big data-based engineering machine marketing strategy pushing program 73; the processor 71 executes the big data-based engineering machine marketing strategy pushing program 73 stored in the memory 72 to realize the following steps:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
In this embodiment, fig. 6 is a schematic diagram of the internal logic of the big data based engineering machinery marketing strategy pushing program according to an embodiment of the present invention. As shown in fig. 6, the big data based engineering machinery marketing strategy pushing program 73 may be further divided into one or more modules, which are stored in the memory 72 and executed by the processor 71 to implement the present invention. A module as referred to herein is a series of computer program instruction segments capable of performing a specified function. Fig. 6 is a block diagram of a preferred embodiment of the big data based engineering machinery marketing strategy pushing program 73 of fig. 5. The program 73 can be divided into: an initial index acquisition module 74, a filtering module 75, a key index mining module 76, and a marketing strategy pushing module 77. The functions or operational steps performed by the modules 74-77 are similar to those described above and are not described in detail here; for example:
an initial index obtaining module 74, configured to obtain initial data of a sample based on a preset big data platform, and determine a corresponding initial index according to the initial data;
the filtering module 75 is configured to perform data cleaning and variable screening on the initial index to improve the data quality of the initial index to obtain a filtering index;
a key index mining module 76, configured to perform data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and the marketing strategy pushing module 77 is configured to set a corresponding marketing strategy based on the key index, and push the marketing strategy to a corresponding client.
Example 4
The present invention further provides a computer-readable storage medium, in which a big data based engineering machinery marketing strategy pushing program 73 is stored, and when executed by a processor, the big data based engineering machinery marketing strategy pushing program 73 implements the following operations:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
The specific implementation of the computer-readable storage medium provided by the present invention is substantially the same as the specific implementation of the engineering machinery marketing strategy pushing method and the electronic device based on big data, and is not repeated here.
It should be noted that the blockchain in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is further noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A big data-based engineering machinery marketing strategy pushing method is applied to an electronic device, and is characterized by comprising the following steps:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
2. The big-data-based engineering machinery marketing strategy pushing method according to claim 1, wherein the process of performing data cleaning and variable screening on the initial indexes comprises the following steps:
processing the initial indexes through compliance analysis, univariate analysis, multivariate analysis and feature selection in sequence, so as to clean the data and screen the variables of the initial indexes and obtain the filtering indexes with improved data quality.
3. The big-data-based engineering machinery marketing strategy pushing method according to claim 2, wherein the process of compliance analysis comprises:
analyzing whether the data source of the initial index is accurate; if it is accurate, no processing is performed; if it is not accurate, the data source of the initial index is updated;
analyzing whether the calculation rule of the initial index is compliant; if it is compliant, no processing is performed; if it is not compliant, the calculation rule of the initial index is updated or the initial index is rejected.
4. The big-data-based engineering machinery marketing strategy pushing method according to claim 2, wherein the process of univariate analysis comprises:
missing-value handling: initial indexes in which 50% or more of the values are missing or null are excluded, and initial indexes in which fewer than 50% of the values are missing are completed by a missing value interpolation method;
univariate outlier analysis: determining, for each initial index, the percentage of data points that exceed 2 standard deviations relative to the total number of data points of that index;
if the percentage exceeds 10%, the initial index is marked as an outlier and rejected.
5. The big-data-based engineering machinery marketing strategy pushing method according to claim 2, wherein the process of multivariate analysis comprises:
eliminating, by means of a correlation analysis matrix and a PCA method, those initial indexes that belong to the same type and are highly correlated or are likely to cause multicollinearity.
6. The big-data-based engineering machinery marketing strategy pushing method according to claim 1,
the data mining model is a Logistic regression model.
7. The big data-based engineering machinery marketing strategy pushing method according to claim 6, wherein, in the process of presetting the Logistic regression model,
the Logistic regression model is constructed by a backward stepwise method and a Sigmoid function.
8. A big data-based engineering machinery marketing strategy pushing system is characterized by comprising:
the initial index acquisition unit is used for acquiring initial data of a sample based on a preset big data platform and determining a corresponding initial index according to the initial data;
the filtering unit is used for carrying out data cleaning and variable screening on the initial indexes so as to improve the data quality of the initial indexes to obtain filtering indexes;
the key index mining unit is used for performing data mining on the filtering indexes through a preset data mining model so as to obtain key indexes in the filtering indexes;
and the marketing strategy pushing unit is used for setting a corresponding marketing strategy based on the key indexes and pushing the marketing strategy to a corresponding client.
9. An electronic device, comprising: a memory, a processor, and a big data-based engineering machinery marketing strategy pushing program stored in the memory and executable on the processor, the big data-based engineering machinery marketing strategy pushing program, when executed by the processor, implementing the following steps:
acquiring initial data of a sample based on a preset big data platform, and determining a corresponding initial index according to the initial data;
carrying out data cleaning and variable screening on the initial indexes to improve the data quality of the initial indexes to obtain filtering indexes;
performing data mining on the filtering indexes through a preset data mining model to obtain key indexes in the filtering indexes;
and setting a corresponding marketing strategy based on the key indexes, and pushing the marketing strategy to a corresponding client.
10. A computer-readable storage medium, wherein a big data-based engineering machinery marketing strategy pushing program is stored in the computer-readable storage medium, and when the big data-based engineering machinery marketing strategy pushing program is executed by a processor, the steps of the big data-based engineering machinery marketing strategy pushing method according to any one of claims 1 to 7 are implemented.
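As an illustration only, the univariate screening of claim 4 and the correlation-based part of claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the use of mean imputation in place of the unspecified "missing value interpolation method", the reading of "exceeding 2 times the standard deviation" as distance from the mean, and the 0.9 correlation cutoff are all assumptions, and the PCA step of claim 5 is omitted.

```python
import math
from statistics import mean, pstdev

MISSING_LIMIT = 0.50   # claim 4: exclude indexes with 50% or more missing values
OUTLIER_SIGMAS = 2.0   # claim 4: data points beyond 2 standard deviations
OUTLIER_LIMIT = 0.10   # claim 4: reject when more than 10% of points are outliers
CORR_LIMIT = 0.90      # assumption: the claims do not state a correlation cutoff


def univariate_screen(values):
    """Return the completed column, or None if the index should be rejected."""
    if not values:
        return None
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values)
    if missing_share >= MISSING_LIMIT:
        return None
    # Assumption: mean imputation stands in for the unspecified interpolation.
    fill = mean(present)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    if sigma == 0:
        return filled  # a constant column has no outliers
    # Assumption: "exceeding 2 standard deviations" is measured from the mean.
    outliers = sum(abs(v - mu) > OUTLIER_SIGMAS * sigma for v in filled)
    if outliers / len(filled) > OUTLIER_LIMIT:
        return None
    return filled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                     sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def screen_indexes(table):
    """table maps an index name to its values (None marks a missing value).

    An index is kept only if it passes the univariate screen and is not
    highly correlated with an index that has already been kept.
    """
    kept = {}
    for name, values in table.items():
        column = univariate_screen(values)
        if column is None:
            continue
        if any(abs(pearson(column, other)) > CORR_LIMIT
               for other in kept.values()):
            continue
        kept[name] = column
    return kept
```

Because highly correlated indexes are dropped in favour of whichever one was kept first, the order of the input table decides which member of a correlated group survives; a real system would presumably choose by business relevance or explanatory power instead.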
CN202011454527.3A 2020-12-10 2020-12-10 Engineering machinery marketing strategy pushing method, system and device based on big data Pending CN112506907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011454527.3A CN112506907A (en) 2020-12-10 2020-12-10 Engineering machinery marketing strategy pushing method, system and device based on big data

Publications (1)

Publication Number Publication Date
CN112506907A (en) 2021-03-16

Family

ID=74973472

Country Status (1)

Country Link
CN (1) CN112506907A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933920A (en) * 2016-03-31 2016-09-07 浪潮通信信息系统有限公司 Method and device for predicting user satisfaction
CN106372959A (en) * 2016-08-22 2017-02-01 广州图灵科技有限公司 Internet-based user access behavior digital marketing system and method
US20180107714A1 (en) * 2016-10-17 2018-04-19 Salesforce.Com, Inc. Automated database query tuning
CN108830649A (en) * 2018-06-05 2018-11-16 国网浙江省电力有限公司 Change of title Electricity customers localization method for power marketing
CN109377252A (en) * 2018-08-30 2019-02-22 广州崇业网络科技有限公司 A kind of customer satisfaction prediction technique based on big data frame
CN110555732A (en) * 2019-08-29 2019-12-10 深圳市云积分科技有限公司 Marketing strategy pushing method and device and marketing strategy operation platform
CN111353809A (en) * 2019-12-12 2020-06-30 合肥工业大学 Social consumer goods retail total quarterly accumulated amplification prediction method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘明芝等: "《中医药统计学与软件应用》", 31 August 2006 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210316
