CN113138977A - Transaction conversion analysis method, device, equipment and storage medium - Google Patents

Publication number
CN113138977A
Authority
CN
China
Prior art keywords
transaction
data
predicted
samples
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110433361.5A
Other languages
Chinese (zh)
Inventor
郭轶博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202110433361.5A
Publication of CN113138977A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a transaction conversion analysis method, device, equipment and storage medium. The method comprises: acquiring a historical user data set collected through tracking points ("buried points") in a client; dividing the historical user data set into a positive sample set and a negative sample set according to whether each historical user has a transaction behavior within a preset time; performing data cleaning on the positive sample set and the negative sample set and using the result as model input data; training a preset neural network on the model input data to obtain a transaction prediction model; inputting user behavior data of the users to be predicted in a user set to be predicted into the transaction prediction model to obtain predicted transaction results; and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction results. The method can improve the efficiency and accuracy of transaction conversion prediction for newly registered users. The invention also relates to blockchain technology, and the historical user data set can be stored in a blockchain.

Description

Transaction conversion analysis method, device, equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a transaction conversion analysis method, a device, equipment and a storage medium.
Background
With the rapid development of the mobile internet, users spend more and more time on various apps, generating a huge amount of user log data. These data record user behavior, including login behavior, browsing behavior, medical consultation data, search data, shopping transaction data, article reading data and the like, and analyzing them can have an important influence on enterprise decisions. For some internet companies, predicting the transaction conversion rate of newly registered users is a real need. Now that the demographic dividend is ending, traffic is increasingly difficult to acquire and the cost of acquiring new customers keeps rising. Against this background, how to guide newly registered users to transact quickly, predict transaction conversion, and rank the factors that influence transaction conversion behavior is a requirement of both theoretical and practical significance. For example, in fintech or e-commerce, strategies are adjusted by predicting the potential customers of financial products or commodities so as to improve the product conversion rate.
Traditional transaction conversion prediction for newly registered users mainly relies on a top-down expert experience method, which explores a wide variety of influencing factors from a macroscopic perspective and qualitatively analyzes the characteristics of users who have transacted. However, the expert experience method is limited by the experts' professional level and authority, their psychological state, their interest in the influencing factors and so on, all of which affect the accuracy of the conclusion, so the results have low accuracy. The process is also complex and time-consuming, so efficiency is low.
Disclosure of Invention
The invention mainly aims to solve the technical problems of low accuracy and efficiency of the existing transaction conversion prediction for users.
A first aspect of the present invention provides a transaction conversion analysis method, including:
acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
identifying the transaction behavior of each sample in the historical user data set within a preset time, and performing positive and negative sample division on the historical user data set based on the transaction behavior to obtain a positive sample set and a negative sample set, wherein the transaction behavior comprises occurrence of transactions and non-occurrence of transactions, the positive sample set is a set of all samples in which transactions occur within the preset time, and the negative sample set is a set of all samples in which transactions do not occur within the preset time;
respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into the transaction prediction model to obtain a predicted transaction result of the users to be predicted;
and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
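The positive and negative sample division in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields `registered_at` and `first_transaction_at`, the function name `split_samples`, and the 30-day window are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def split_samples(users, window_days=30):
    """Split user records into positive and negative sample lists.

    A record is positive when its first transaction falls within
    `window_days` after registration; otherwise it is negative.
    Field names are hypothetical, not taken from the source.
    """
    window = timedelta(days=window_days)
    positives, negatives = [], []
    for user in users:
        first_tx = user.get("first_transaction_at")  # None if no transaction
        converted = (
            first_tx is not None
            and timedelta(0) <= first_tx - user["registered_at"] <= window
        )
        (positives if converted else negatives).append(user)
    return positives, negatives


users = [
    {"id": "u1", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": datetime(2021, 1, 10)},
    {"id": "u2", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": None},
    {"id": "u3", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": datetime(2021, 3, 15)},
]
positives, negatives = split_samples(users, window_days=30)
```

Here `u3` did transact, but outside the assumed 30-day window, so it is counted as a negative sample.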
Optionally, in a first implementation manner of the first aspect of the present invention, before the performing data cleaning processing on the positive sample set and the negative sample set respectively, and taking the positive sample set and the negative sample set after data cleaning as model input data, the method further includes:
grouping the historical user data set according to a preset time interval to obtain at least one historical user data group;
and calculating the cosine similarity between the samples in each historical user data group.
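As an illustration of the similarity step, the sketch below computes the cosine similarity between every pair of samples in one group. It assumes each sample has already been turned into a numeric feature vector, which the source does not spell out; the function names and the convention of returning 0.0 for an all-zero vector are also assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_a * norm_b)


def pairwise_similarity(samples):
    """Full similarity matrix for the samples of one group."""
    n = len(samples)
    return [[cosine_similarity(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


group = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0], [2.0, 4.0]]
matrix = pairwise_similarity(group)
```

Vectors pointing in the same direction, such as the last two in `group`, score close to 1, while orthogonal vectors score 0.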
Optionally, in a second implementation manner of the first aspect of the present invention, the performing data cleaning processing on the positive sample set and the negative sample set respectively, and taking the positive sample set and the negative sample set after data cleaning as model input data includes:
determining a first sample in the historical user data group, and screening n samples with the largest cosine similarity of included angles between the first sample and other samples in the historical user data group, wherein n is a natural number not less than three;
when the first sample is a positive sample, determining the number of negative samples in the n samples;
if the number of the negative samples in the n samples is larger than n/2, deleting the negative samples in the n samples from the historical user data group;
when the first sample is a negative sample, then determining a number of positive samples in the n samples;
if the number of positive samples in the n samples is greater than n/2, deleting the first sample from the historical user data group;
repeating the data cleaning process for the other samples according to their cosine similarity until all samples in the historical user data group have been subjected to data cleaning;
and taking the remaining samples after data cleaning as model input data.
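The cleaning rule above can be sketched as a nearest-neighbour pass over one group. Several details are assumptions made for this example and are not stated in the source: labels are encoded as 1 (positive) and 0 (negative), deletions take effect immediately so removed samples are neither visited nor counted as neighbours, and a sample with fewer than n remaining neighbours is left untouched. The name `clean_group` is hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def clean_group(features, labels, n=3):
    """Return the sorted indices of the samples kept after cleaning."""
    alive = set(range(len(features)))
    for i in range(len(features)):
        if i not in alive:
            continue  # already deleted by an earlier step
        # The n remaining samples most similar to sample i.
        neighbours = sorted(
            (j for j in alive if j != i),
            key=lambda j: cosine(features[i], features[j]),
            reverse=True,
        )[:n]
        if len(neighbours) < n:
            continue  # assumption: too few neighbours, leave the sample alone
        if labels[i] == 1:
            # Positive sample mostly surrounded by negatives: drop those negatives.
            negatives = [j for j in neighbours if labels[j] == 0]
            if len(negatives) > n / 2:
                alive -= set(negatives)
        else:
            # Negative sample mostly surrounded by positives: drop the sample itself.
            positives = [j for j in neighbours if labels[j] == 1]
            if len(positives) > n / 2:
                alive.discard(i)
    return sorted(alive)


features = [
    [1.0, 0.0], [1.0, 0.05], [1.0, 0.10], [1.0, 0.15],   # cluster A
    [0.0, 1.0], [0.05, 1.0], [0.10, 1.0], [0.15, 1.0],   # cluster B
]
labels = [1, 1, 0, 1, 0, 0, 0, 0]  # index 2 is a negative inside the positive cluster
kept = clean_group(features, labels, n=3)
```

In this toy group only the negative sample at index 2, whose three nearest neighbours are all positive, is removed. Both branches delete negative samples only, which is consistent with the downsampling of the majority (non-transacting) class described in the embodiment.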
Optionally, in a third implementation manner of the first aspect of the present invention, the neural network is composed of a convolutional neural network and a long short-term memory (LSTM) network, and the inputting the model input data into a preset neural network for model training to obtain the transaction prediction model includes:
inputting the model training data into an embedding layer of the neural network to generate a feature vector of the model training data;
extracting a feature sequence of the feature vector through the convolutional neural network in the neural network;
inputting the feature sequence into the LSTM network in the neural network to acquire a historical time sequence of the feature sequence, and inputting the resulting sequence into a fully connected layer of the neural network to obtain a two-dimensional prediction result;
calculating a loss function of the neural network according to the two-dimensional prediction result, iterating with a gradient descent method until the loss function converges, and updating the network parameters of the neural network by backpropagation;
and adjusting the neural network based on the network parameters to obtain the transaction prediction model.
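The source does not name the loss function. As one common choice for a two-class output, and purely as an assumption for illustration, the training objective could be a softmax cross-entropy minimized by gradient descent. The symbols z_i (the two-dimensional output of the fully connected layer for sample i), y_{i,c} (the one-hot label), N (the number of training samples), θ (the network parameters) and η (the learning rate) are introduced here and do not appear in the original.

```latex
\hat{y}_{i,c} = \frac{\exp(z_{i,c})}{\sum_{k \in \{0,1\}} \exp(z_{i,k})}, \qquad
L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c \in \{0,1\}} y_{i,c} \log \hat{y}_{i,c}, \qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)
```

The update on the right is repeated until L(θ) stops decreasing, with the gradient obtained by backpropagation through the fully connected, LSTM, convolutional and embedding layers.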
Optionally, in a fourth implementation manner of the first aspect of the present invention, the inputting the model training data into an embedding layer of the neural network, and the generating the feature vector of the model training data includes:
converting each character in the model training data into a one-hot code vector;
and converting the one-hot code vector of the model training data into a low-dimensional dense feature vector through a pre-trained vector matrix.
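The two conversion steps above amount to a matrix lookup: multiplying a one-hot vector by a vector matrix selects one row of that matrix. The sketch below shows this with a toy four-character vocabulary and a hand-written 4x2 matrix standing in for the pre-trained one; both are made up for the example.

```python
def one_hot(index, vocab_size):
    """One-hot vector with a single 1.0 at position `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def embed(one_hot_vec, matrix):
    """Multiply a one-hot row vector by the matrix (vocab_size x dim)."""
    dim = len(matrix[0])
    return [
        sum(one_hot_vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(dim)
    ]


# Hypothetical vocabulary and stand-in for the pre-trained vector matrix.
vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
embedding_matrix = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

dense = [embed(one_hot(vocab[ch], len(vocab)), embedding_matrix) for ch in "cab"]
```

Each character of `"cab"` is mapped from a sparse four-dimensional one-hot vector to a dense two-dimensional vector, namely the matrix row for that character.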
Optionally, in a fifth implementation manner of the first aspect of the present invention, before the collecting, based on a buried point preset in the transaction client, historical user data of a historical user from a transaction database to form a historical user data set, the method further includes:
defining the content of a buried point, and burying the point on the transaction client according to the content of the buried point;
when a user operates the transaction client and generates buried point data, establishing a connection with a server and uploading the buried point data to the server, where the server parses the buried point data to obtain a target field and sends the target field to a Kafka message queue;
performing topology processing on the target field in the Kafka message queue by using the stream computing framework Storm, and storing the processed target field to the Hadoop Distributed File System (HDFS) at a preset time interval;
and storing the target field in HDFS into the Hive data warehouse as historical user data.
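Two small pieces of this pipeline can be illustrated without Kafka, Storm or HDFS: parsing an uploaded buried-point record into its target fields, and deriving the time-partitioned location it would be written to. The sketch uses only the standard library. The field names, the JSON encoding of the record, the one-hour interval and the `dt=/hour=` directory layout are all assumptions for the example, not details confirmed by the source.

```python
import json
from datetime import datetime, timezone

TARGET_FIELDS = ("user_id", "event", "ts")  # hypothetical target field names


def parse_event(raw):
    """Keep only the target fields of one uploaded buried-point record."""
    record = json.loads(raw)
    return {name: record.get(name) for name in TARGET_FIELDS}


def hourly_partition(ts_seconds, root="/warehouse/tracking"):
    """Directory for the hour (UTC) that contains the given Unix timestamp."""
    t = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return f"{root}/dt={t:%Y-%m-%d}/hour={t:%H}"


raw = '{"user_id": "u1", "event": "page_view", "ts": 1619000000, "debug": true}'
fields = parse_event(raw)
path = hourly_partition(fields["ts"])
```

Grouping records by a path of this kind is one way to keep each interval's data together so that it can later be loaded into the warehouse as a single partition.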
Optionally, in a sixth implementation manner of the first aspect of the present invention, the predicted transaction result includes that a transaction is performed within a preset time and a transaction is not performed within a preset time;
the calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result comprises the following steps:
acquiring a first user number, which is the number of users to be predicted in the user set to be predicted;
acquiring a second user number, which is the number of users to be predicted in the user set to be predicted whose predicted transaction result is that a transaction is performed within the preset time;
and dividing the second user number by the first user number to obtain the transaction conversion rate of the user set to be predicted.
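The calculation above is a simple ratio. A minimal sketch, assuming each predicted transaction result is represented as a boolean (True when a transaction is predicted within the preset time); the function name is hypothetical:

```python
def conversion_rate(predictions):
    """Share of users predicted to transact within the preset time."""
    first_user_number = len(predictions)                     # all users in the set
    if first_user_number == 0:
        raise ValueError("the user set to be predicted is empty")
    second_user_number = sum(1 for p in predictions if p)    # predicted to transact
    return second_user_number / first_user_number


rate = conversion_rate([True, False, False, True])
```

With two of four users predicted to transact, the conversion rate of this set is 0.5.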
A second aspect of the present invention provides a transaction conversion analysis apparatus, including:
the data acquisition module is used for acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
the sample dividing module is used for identifying the transaction behavior of each sample in the historical user data set within a preset time, and performing positive and negative sample division on the historical user data set based on the transaction behavior to obtain a positive sample set and a negative sample set, wherein the transaction behavior comprises occurrence of transactions and non-occurrence of transactions, the positive sample set is a set of all samples in which transactions occur within the preset time, and the negative sample set is a set of all samples in which transactions do not occur within the preset time;
the data cleaning module is used for respectively cleaning the positive sample set and the negative sample set and taking the positive sample set and the negative sample set after data cleaning as model input data;
the model training module is used for inputting the model input data into a preset neural network to train a model so as to obtain a transaction prediction model;
the data input module is used for receiving user behavior data of users to be predicted in the user set to be predicted, inputting the user behavior data into the transaction prediction model and obtaining a predicted transaction result of the users to be predicted;
and the conversion rate calculation module is used for calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
Optionally, in a first implementation manner of the second aspect of the present invention, the transaction conversion analysis apparatus further includes a similarity calculation module, where the similarity calculation module is specifically configured to:
grouping the historical user data set according to a preset time interval to obtain at least one historical user data group;
and calculating the cosine similarity between the samples in each historical user data group.
Optionally, in a second implementation manner of the second aspect of the present invention, the data cleansing module is specifically configured to:
determining a first sample in the historical user data group, and screening n samples with the largest cosine similarity of included angles between the first sample and other samples in the historical user data group, wherein n is a natural number not less than three;
when the first sample is a positive sample, determining the number of negative samples in the n samples;
if the number of the negative samples in the n samples is larger than n/2, deleting the negative samples in the n samples from the historical user data group;
when the first sample is a negative sample, then determining a number of positive samples in the n samples;
if the number of positive samples in the n samples is greater than n/2, deleting the first sample from the historical user data group;
repeating the data cleaning process for the other samples according to their cosine similarity until all samples in the historical user data group have been subjected to data cleaning;
and taking the remaining samples after data cleaning as model input data.
Optionally, in a third implementation manner of the second aspect of the present invention, the neural network is composed of a convolutional neural network and a long short-term memory (LSTM) network, and the model training module is specifically configured to:
inputting the model training data into an embedding layer of the neural network to generate a feature vector of the model training data;
extracting a feature sequence of the feature vector through the convolutional neural network in the neural network;
inputting the feature sequence into the LSTM network in the neural network to acquire a historical time sequence of the feature sequence, and inputting the resulting sequence into a fully connected layer of the neural network to obtain a two-dimensional prediction result;
calculating a loss function of the neural network according to the two-dimensional prediction result, iterating with a gradient descent method until the loss function converges, and updating the network parameters of the neural network by backpropagation;
and adjusting the neural network based on the network parameters to obtain the transaction prediction model.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the model training module is further specifically configured to:
converting each character in the model training data into a one-hot code vector;
and converting the one-hot code vector of the model training data into a low-dimensional dense feature vector through a pre-trained vector matrix.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the transaction conversion analysis apparatus further includes a data storage module, where the data storage module is specifically configured to:
defining the content of a buried point, and burying the point on the transaction client according to the content of the buried point;
when a user operates the transaction client and generates buried point data, establishing a connection with a server and uploading the buried point data to the server, where the server parses the buried point data to obtain a target field and sends the target field to a Kafka message queue;
performing topology processing on the target field in the Kafka message queue by using the stream computing framework Storm, and storing the processed target field to the Hadoop Distributed File System (HDFS) at a preset time interval;
and storing the target field in HDFS into the Hive data warehouse as historical user data.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the predicted transaction result includes that a transaction is performed within a preset time and a transaction is not performed within a preset time; the conversion calculation module is specifically configured to:
acquiring a first user number, which is the number of users to be predicted in the user set to be predicted;
acquiring a second user number, which is the number of users to be predicted in the user set to be predicted whose predicted transaction result is that a transaction is performed within the preset time;
and dividing the second user number by the first user number to obtain the transaction conversion rate of the user set to be predicted.
A third aspect of the present invention provides a transaction conversion analysis apparatus comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor invokes the instructions in the memory to cause the transaction conversion analysis apparatus to perform the steps of the transaction conversion analysis method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the transaction conversion analysis method described above.
In the technical solution of the invention, historical user data of historical users are collected from a transaction database based on buried points preset in a transaction client to form a historical user data set; the transaction behavior of each sample in the historical user data set within a preset time is identified, and the historical user data set is divided into a positive sample set and a negative sample set based on the transaction behavior; data cleaning is performed on the positive sample set and the negative sample set respectively, and the cleaned positive and negative sample sets are used as model input data; the model input data are input into a preset neural network for model training to obtain a transaction prediction model; user behavior data of the users to be predicted in a user set to be predicted are received and input into the transaction prediction model to obtain the predicted transaction results of the users to be predicted; and the transaction conversion rate of the user set to be predicted is calculated according to the predicted transaction results. The combined neural network model performs well on time-series samples observed over a long period, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering, improving the efficiency and accuracy of transaction conversion prediction for newly registered users.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of a transaction conversion analysis method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a second embodiment of a transaction conversion analysis method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a third embodiment of a transaction conversion analysis method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a fourth embodiment of a transaction conversion analysis method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a fifth embodiment of a transaction conversion analysis method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of a transaction conversion analysis device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another embodiment of a transaction conversion analysis device according to an embodiment of the invention;
FIG. 8 is a schematic diagram of an embodiment of the transaction conversion analysis device in the embodiment of the present invention.
Detailed Description
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a detailed flow of an embodiment of the present invention is described below, and referring to fig. 1, a first embodiment of a transaction conversion analysis method according to an embodiment of the present invention includes:
101. Acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
It is to be understood that the executing subject of the present invention may be a transaction conversion analysis device, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the executing subject.
It is emphasized that the above-described historical user data set may be stored in a node of a blockchain in order to ensure the privacy and security of the data.
In this embodiment, user raw data are collected in real time through buried points in the transaction client. Each transaction terminal sends the user raw data it collects to a server, the server stores them in a database, and the user raw data of the different users stored in the database are used as the historical user data set. The user raw data include longitude and latitude, country, time zone, network IP, mobile phone brand, APP version, mobile phone model, mobile phone operating system, user login time, the track time of the user at each specifically configured position on the APP pages, carried parameters and the like. The buried point content is defined in advance according to the data required, and the buried points are placed accordingly. When a user action triggers the generation of user raw data, the collected data are uploaded, processed by the big data framework Storm, written into Kafka in batches, and then written into HDFS with one partition per hour. Finally, the data are imported into Hive (a data warehouse tool), with the data of the previous hour inserted every hour, so that the behavior track of the user can be tracked and analyzed.
102. Identifying the transaction behavior of each sample in the historical user data set within preset time, and dividing the historical user data set into positive and negative samples based on the transaction behavior to obtain a positive sample set and a negative sample set;
in this embodiment, the transaction behavior includes occurrence of a transaction and non-occurrence of a transaction, the positive sample set is a set of all samples where a transaction occurs within a preset time, and the negative sample set is a set of all samples where a transaction does not occur within a preset time.
In this embodiment, features are extracted from the historical user data set obtained from the data warehouse tool Hive by standardized OLAP (online analytical processing) extraction to create a two-dimensional table. The extracted features specifically include: registration time, gender, nickname, user type, birthday, province, city, height, weight, age, registration channel, member level, mobile phone brand, APP version, device id, last access IP, number of access days in the last 7 days, number of access days in the last 30 days; number of inquiry days in the last 7 days, number of inquiry days in the last 30 days, accumulated number of inquiries, number of departments consulted, the department with the most inquiries, the department with the second most inquiries, inquiry duration, number of inquiry dialogue messages, number of inquiry prescription pushes, number of inquiry prescription payments, inquiry prescription transaction amount and all inquiry labels; time of the last access to an item detail page, last payment time, purchase amount in the last 7 days, purchase amount in the last 30 days, the primary category with the most purchases, the secondary category with the most purchases, total number of purchase orders, and total number of orders under the GMV caliber in the last 30 days; the first search keywords, the number of searches in the last 7 days and the last 7 days; the last time the health headlines were browsed, complied with the last 7 days; whether a transaction occurred within 30/60/90 days from the registration date; and so on. The two-dimensional table generated is shown in the following table:
Figure BDA0003032227660000071
103. respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
In practical application, most users of the transaction client do not transact within the preset time and form the majority class, while the few users who do transact within the preset time form the minority class, so the data skew (class imbalance) is serious. In essence, a machine learning algorithm derives experience from a large data set through calculation and then uses it to judge whether given data are normal. With an imbalanced data set, however, the class with too few samples is easily overwhelmed, and the model tends to favor the majority class. Therefore, segmented downsampling is adopted to balance the samples: part of the samples in the historical user data set are removed, and the remaining positive and negative samples are used as model input data.
In this embodiment, Tomek Links are mainly used. A Tomek Link is a pair of samples from different classes that are closest to each other, that is, the two samples are each other's nearest neighbors and belong to different classes. If two samples form a Tomek Link, then either one of them is noise or both lie near the class boundary. Removing Tomek Links therefore cleans away samples that overlap between classes, so that samples that are nearest neighbors of each other all belong to the same class and can be classified better.
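A minimal sketch of Tomek Link detection is given below, assuming Euclidean distance on small numeric vectors and a brute-force nearest-neighbor search. The function name `tomek_links` and the sample data are illustrative only. Dropping only the majority-class (negative) member of each link is a common downsampling variant and is an assumption here; the text above does not state which member is removed.

```python
import math

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbors and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

# Hypothetical samples: label 1 = transacted, label 0 = did not transact.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
labels = [1, 0, 0, 0, 1]

links = tomek_links(points, labels)
# Assumed policy: drop only the majority-class (label 0) member of each link.
to_drop = {i for pair in links for i in pair if labels[i] == 0}
```

Samples 0 and 1 are mutual nearest neighbors with different labels, so they form the only link. Samples 2 and 3 are mutual nearest neighbors too, but they share a label and are left alone.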
104. Inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
In this embodiment, the positive samples and negative samples remaining after data cleaning together serve as the model input data, and the neural network structure is a CNN, followed by an LSTM, followed by a fully connected layer. The model input stage uses the CNN. The model input data are processed into numerical vectors: discrete features such as registration time and gender are one-hot encoded, and a two-dimensional dense (non-sparse) feature vector is then generated through the embedding layer of the convolutional neural network. The CNN consists of an input layer that receives the user's historical behavior features, an output layer that connects to the LSTM input layer, and a plurality of hidden layers. The convolutional layers can comprehensively and accurately capture the core factors that influence user transaction conversion. The hidden layers reduce the dimension of the user behavior features by max pooling: the features are passed into a pooling layer for further dimension reduction, which also reduces overfitting. The LSTM receives the feature sequence extracted by the CNN; its forget gate, input gate and output gate can change the historical memory state and update the neural units along the historical time sequence, so that user behavior information persists over a long time and behavior over a longer period can be predicted. In practical application, historical observation data of the sample population within 30 days are used. These are standard time-series data, and LSTM is more suitable for processing time-series data than tree models such as XGBoost and GBDT. The last layer is a fully connected layer with two neurons, which output the probability values of the two-dimensional prediction results 0 and 1, respectively.
In this embodiment, the model parameters include the bias term coefficients of the CNN convolutional layers and the number and weights of the convolution kernels; ReLU is selected as the activation function to reduce vanishing gradients; and the most appropriate stride and pooling size need to be tuned for the pooling layer. The LSTM part changes the cell state through the forget gate, the input gate and the output gate, so the parameters to be adjusted include the input weights and the recurrent weights of each gate. These parameters are all learned by the LSTM itself from the output vectors of the CNN layer and are initialized from a random distribution. After model training starts, the loss function is driven to convergence by gradient descent according to the optimization function, so that the optimal parameters are finally obtained and the transaction prediction model is produced.
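For reference, the gate parameters mentioned above can be read against the standard LSTM formulation. The notation is an assumption and does not come from the original text: $x_t$ is the CNN output at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $W_\ast$ the input weights, $U_\ast$ the recurrent weights, $b_\ast$ the biases, $\sigma$ the logistic sigmoid and $\odot$ the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The cell state update shows how the forget gate controls how much of the earlier memory $c_{t-1}$ is kept, which is what allows user behavior information to persist over long sequences.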
105. Receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into a transaction prediction model to obtain a predicted transaction result of the users to be predicted;
In this embodiment, users newly registered in the transaction client within a period of time are taken as the users to be predicted, and the goal is to predict their transaction conversion rate. User behavior data of the users to be predicted are obtained through the client buried points and input into the transaction prediction model. The output is a two-dimensional vector whose two dimensions represent the probability of a transaction and the probability of no transaction; by default, 0.5 is generally taken as the threshold for translating these probabilities into a transaction or non-transaction result.
106. And calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
In this embodiment, suppose for example that the user set to be predicted contains 100 users to be predicted. The user behavior data of each user are input into the transaction prediction model to obtain that user's predicted transaction result, and each user is labeled with either a transaction tag or a non-transaction tag according to the result. If 30 users in the set carry the transaction tag, the transaction conversion rate of the user set to be predicted is 30%. If the calculated transaction conversion rate of the user set to be predicted is low, the current operation strategy needs to be adjusted.
In this embodiment, a historical user data set of historical users is acquired through the buried points of the transaction client; the historical user data set is divided into positive samples and negative samples according to whether the historical user has a transaction behavior within a preset time, where a positive sample is a sample with a transaction behavior within the preset time and a negative sample is a sample without one; data cleaning is performed on the positive and negative samples, and the positive and negative samples remaining after data cleaning are used as model input data; a transaction prediction model is trained from the model input data and a preset neural network; user behavior data of a user to be predicted are input into the transaction prediction model to obtain a predicted transaction result; and the transaction conversion rate of the users to be predicted is calculated according to the predicted transaction result. The combined neural network model is effective at processing time-series samples observed over a long period, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering and improves the efficiency and accuracy of transaction conversion prediction for newly registered users.
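The thresholding and conversion-rate calculation can be sketched as below, reproducing the 100-user example. The function name `conversion_rate`, the ordering of the output vector as (probability of no transaction, probability of transaction), and the use of `>=` at the threshold are assumptions made for illustration.

```python
def conversion_rate(predictions, threshold=0.5):
    """Compute the share of users predicted to transact.

    `predictions` is a list of (p_no_transaction, p_transaction) pairs,
    one per user to be predicted (assumed ordering).
    """
    if not predictions:
        raise ValueError("the user set to be predicted is empty")
    # A user receives the transaction tag when p_transaction reaches the threshold.
    tagged = sum(1 for _, p_tx in predictions if p_tx >= threshold)
    return tagged / len(predictions)

# Hypothetical model outputs: 30 likely buyers and 70 unlikely buyers.
predictions = [(0.2, 0.8)] * 30 + [(0.9, 0.1)] * 70
rate = conversion_rate(predictions)
```

With these inputs, 30 of the 100 users receive the transaction tag, which corresponds to the 30% conversion rate in the example above.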
Referring to fig. 2, a second embodiment of the transaction conversion analysis method according to the embodiment of the present invention includes:
201. acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
202. identifying the transaction behavior of each sample in the historical user data set within preset time, and dividing the historical user data set into positive and negative samples based on the transaction behavior to obtain a positive sample set and a negative sample set;
Steps 201-202 in this embodiment are similar to steps 101-102 in the first embodiment and are not described again here.
203. Grouping historical user data sets according to a preset time interval to obtain at least one group of historical user data sets;
204. calculating the cosine similarity of included angles among all samples in the historical user data group;
In this embodiment, because the predicted sample data decay over time, the samples in the historical user data set are grouped according to a preset time interval.
205. Determining a first sample in the historical user data group, and screening n samples with the largest cosine similarity of included angles between the first sample and other samples in the historical user data group;
206. when the first sample is a positive sample, determining the number of negative samples in the n samples;
207. if the number of the negative samples in the n samples is larger than n/2, deleting the negative samples in the n samples from the historical user data group;
208. when the first sample is a negative sample, determining the number of positive samples in the n samples;
209. if the number of positive samples in the n samples is larger than n/2, deleting the first sample from the historical user data group;
210. repeating the data cleaning process according to the cosine similarity of included angles among other samples until all samples in the historical user data set are subjected to data cleaning;
211. taking the residual sample after data cleaning as model input data;
In this embodiment, data cleaning is mainly performed using Tomek Links. A Tomek Link is a pair of samples from different classes that are closest to each other, that is, the two samples are each other's nearest neighbors and belong to different classes. If two samples form a Tomek Link, then either one of them is noise or both lie near the class boundary. Removing Tomek Links therefore cleans away samples that overlap between classes, so that samples that are nearest neighbors of each other belong to the same class and can be classified better. In this embodiment, the cosine similarity of the included angle between different samples is calculated and used to judge whether two samples are nearest neighbors, and a preset number of nearest-neighbor samples is found for each sample. The preset number is 3 in this embodiment, although the invention does not limit it. The three nearest samples are found for each sample according to the cosine similarity. If the sample has a negative label and two of its three nearest samples have positive labels, the sample is deleted; conversely, if the sample has a positive label and two of its three nearest samples have negative labels, the non-purchasing users (negative samples) among the nearest neighbors are removed. In all other cases the samples are retained.
In this embodiment, each sample is a numerical vector, and each dimension of the vector is a numerical representation of one of the above-mentioned features. A continuous feature, such as the number of inquiry dialogue messages, is already a numerical value; a discrete feature, such as registration time, is encoded numerically. Because every sample is thus represented as a numerical vector, the cosine similarity of the included angle between two vectors can be conveniently calculated.
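Steps 205 to 210 can be sketched as follows for a single historical user data group. This is an illustration under stated assumptions rather than the patented implementation: the names `cosine` and `clean_group` are hypothetical, samples are visited in index order, and samples already deleted are neither revisited nor counted as neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity of the included angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def clean_group(vectors, labels, n=3):
    """Return the indices kept after neighbor-based cleaning.

    labels: 1 = positive (transacted), 0 = negative (did not transact).
    """
    removed = set()
    for i in range(len(vectors)):
        if i in removed:
            continue
        others = [j for j in range(len(vectors)) if j != i and j not in removed]
        # The n samples with the largest cosine similarity to sample i.
        neighbors = sorted(others, key=lambda j: cosine(vectors[i], vectors[j]),
                           reverse=True)[:n]
        opposite = [j for j in neighbors if labels[j] != labels[i]]
        if len(opposite) > n / 2:
            if labels[i] == 1:
                removed.update(opposite)  # positive sample: drop neighboring negatives
            else:
                removed.add(i)            # negative sample: drop the sample itself
    return [i for i in range(len(vectors)) if i not in removed]

# Hypothetical group: one negative sample sits inside the positive cluster.
vectors = [(1.0, 0.0), (1.0, 0.1), (1.0, -0.1), (1.0, 0.2),
           (0.0, 1.0), (0.1, 1.0), (-0.1, 1.0), (0.2, 1.0)]
labels = [0, 1, 1, 1, 0, 0, 0, 0]
kept = clean_group(vectors, labels)
```

Sample 0 is negative but its three most similar samples are all positive, so it is deleted; every other sample has at most one neighbor of the opposite class and is retained.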
212. Inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
213. receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into a transaction prediction model to obtain a predicted transaction result of the users to be predicted;
214. and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
Steps 212-214 in this embodiment are similar to steps 104-106 in the first embodiment and are not described again here.
On the basis of the previous embodiment, this embodiment describes in detail the process of performing data cleaning on the positive and negative samples and using the samples remaining after data cleaning as model input data. A first sample in the historical user data group is determined, and the n samples with the largest cosine similarity of the included angle to the first sample are screened from the other samples in the historical user data group, where n is a natural number not less than three. When the first sample is a positive sample, the number of negative samples among the n samples is determined; if that number is larger than n/2, the negative samples among the n samples are deleted from the historical user data group. When the first sample is a negative sample, the number of positive samples among the n samples is determined; if that number is larger than n/2, the first sample is deleted from the historical user data group. The data cleaning process is repeated for the other samples according to the cosine similarity of their included angles until all samples in the historical user data set have been cleaned, and the samples remaining after data cleaning are used as model input data. Cleaning the sample data in this way helps avoid data skew and makes the subsequently generated model more accurate.
Referring to fig. 3, a third embodiment of the transaction conversion analysis method according to the embodiment of the present invention includes:
301. acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
302. identifying the transaction behavior of each sample in the historical user data set within preset time, and dividing the historical user data set into positive and negative samples based on the transaction behavior to obtain a positive sample set and a negative sample set;
303. respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
Steps 301-303 in this embodiment are similar to steps 101-103 in the first embodiment and are not described again here.
304. Converting each character in the model training data into a one-hot code vector;
305. converting the one-hot code vector of the model training data into a low-dimensional dense feature vector through a vector matrix which is pre-trained;
306. extracting a characteristic sequence of the characteristic vector through a convolutional neural network in the neural network;
307. inputting the feature sequence into a long short-term memory (LSTM) network in the neural network, acquiring a historical time sequence of the feature sequence, and inputting the historical time sequence into a fully connected layer of the neural network to obtain a two-dimensional prediction result;
308. calculating a loss function of the neural network according to the two-dimensional prediction result, adopting a gradient descent method to perform loop iteration to enable the loss function to be converged, and performing back propagation to update network parameters of the neural network;
309. adjusting a neural network based on the network parameters to obtain a transaction prediction model;
In this embodiment, the model input stage uses a CNN (convolutional neural network). The cleaned data are processed into numerical vectors and one-hot encoded, and the vectors are compressed into dense (non-sparse) feature vectors by embedding. The CNN consists of an input layer that receives the user's historical behavior features, an output layer that connects to the input layer of the LSTM (long short-term memory network), and a plurality of hidden layers. The convolutional layers can comprehensively and accurately capture the core factors that influence user transaction conversion, and the hidden layers use max pooling for dimension reduction. The convolution operation comprehensively and accurately extracts useful local features from massive user behavior; after convolution, the user behavior features are passed into the pooling layer for further dimension reduction, which also reduces overfitting. The LSTM receives the feature sequence extracted by the CNN; its forget gate, input gate and output gate can change the historical memory state and update the neural units along the historical time sequence, so that user behavior information persists over a long time and behavior after a long period can be predicted. In practical application, historical observation data of the sample population for one month are used. These are standard time-series data, and LSTM is more suitable for processing time-series data than tree models such as XGBoost and GBDT. The last layer is a fully connected layer with two neurons, which output the probability values of the two-dimensional prediction results 0 and 1, respectively.
The model parameters include the bias term coefficients of the CNN convolutional layers and the number and weights of the convolution kernels; ReLU is selected as the activation function to reduce vanishing gradients; and the most appropriate stride and pooling size need to be tuned for the pooling layer. The LSTM part changes the cell state through the forget gate, the input gate and the output gate, so the parameters to be adjusted include the input weights and the recurrent weights of each gate. These parameters are all learned by the LSTM itself from the output vectors of the CNN layer and are initialized from a random distribution. After model training starts, the loss function is driven to convergence by gradient descent according to the optimization function, and the optimal parameters are obtained.
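Steps 304 and 305 (one-hot encoding followed by projection through an embedding matrix) can be illustrated with the short sketch below. The vocabulary, the matrix values and the function names are assumptions chosen for the example; in the described method the matrix is pre-trained rather than written by hand.

```python
def one_hot(index, size):
    """Return a one-hot code vector with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed(one_hot_vec, matrix):
    """Multiply a one-hot row vector by an embedding matrix
    (vocabulary size x embedding dimension) to get a dense vector."""
    dim = len(matrix[0])
    return [sum(one_hot_vec[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(dim)]

# Hypothetical discrete feature (gender) with a 2-dimensional embedding.
vocab = {"female": 0, "male": 1, "unknown": 2}
embedding_matrix = [
    [0.1, -0.3],  # female
    [0.7, 0.2],   # male
    [0.0, 0.5],   # unknown
]

code = one_hot(vocab["male"], len(vocab))
dense = embed(code, embedding_matrix)
```

Because the code vector has a single non-zero entry, the product simply selects one row of the matrix. This is why an embedding layer can turn a sparse one-hot vector into a low-dimensional dense vector.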
310. Receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into a transaction prediction model to obtain a predicted transaction result of the users to be predicted;
311. and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
Steps 310-311 in the present embodiment are similar to steps 105-106 in the first embodiment, and are not described herein again.
On the basis of the previous embodiment, this embodiment describes in detail the process of training the transaction prediction model. The model training data are input into the embedding layer of the neural network to generate feature vectors; a feature sequence is extracted from the feature vectors by the convolutional neural network in the neural network; the feature sequence is input into the long short-term memory (LSTM) network in the neural network to acquire a historical time sequence of the feature sequence, and the historical time sequence is input into the fully connected layer of the neural network to obtain a two-dimensional prediction result; a loss function of the neural network is calculated from the two-dimensional prediction result, gradient descent is iterated until the loss function converges, and the network parameters of the neural network are updated by back propagation; and the neural network is adjusted based on the network parameters to obtain the transaction prediction model. The transaction prediction model trained in this way is effective at processing time-series samples observed over a long period, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering and improves the efficiency and accuracy of transaction conversion prediction for newly registered users.
Referring to fig. 4, a fourth embodiment of the transaction conversion analysis method according to the embodiment of the present invention includes:
401. defining the content of the embedded point, and embedding the point on the transaction client according to the content of the embedded point;
402. when a user generates buried point data at an operation transaction client, establishing connection with a server, uploading the buried point data to the server, analyzing the buried point data through the server to obtain a target field, and sending the target field to a Kafka message queue;
403. performing topology processing on a target field in a Kafka message queue by adopting a streaming computing framework storm, and storing the target field after the topology processing to a distributed file system HDFS (Hadoop distributed file system) according to a preset time interval;
404. storing a target field in a distributed file system (HDFS) as historical user data into a hive data warehouse tool;
In this embodiment, the buried point content mainly consists of various events related to the transaction client, such as a page browsing event (page_evt), a user event (user_evt), a start event (start_evt), a quit event (quit_evt) and a click event (click_evt). When the predefined buried point content cannot meet the business statistics requirements, the current operation may be defined as a custom event (custom_evt). Through these events, information such as longitude and latitude, country, time zone, network IP, mobile phone brand, APP version, mobile phone model, mobile phone operating system, user login time, the track time of the user at all specific set positions on the APP page, and carried parameters is obtained and parsed into target fields. The target fields are sent to the Kafka message queue, processed by the big data framework Storm, and then written into HDFS with one partition per hour.
In this embodiment, the actual data of the target fields are stored in HDFS; Hive itself neither stores the data nor computes on it directly, and a database in Hive is only a logical database. Hive does not support transactions, so it does not support online transaction processing (OLTP) and is better suited to online analytical processing (OLAP). Because a later stage requires standardized OLAP extraction to build the two-dimensional table, the target fields in HDFS need to be stored in Hive.
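The hourly partitioning of buried point events can be sketched as below. The sketch only groups events in memory and does not call Kafka, Storm, HDFS or Hive. The base path `/warehouse/buried_point`, the `dt=`/`hour=` directory layout and the function names are hypothetical choices for illustration, not details given in the text.

```python
from collections import defaultdict
from datetime import datetime, timezone

def hourly_partition(event_time, base="/warehouse/buried_point"):
    """Build an hourly partition path (hypothetical layout) for an event."""
    return f"{base}/dt={event_time:%Y-%m-%d}/hour={event_time:%H}"

def group_by_partition(events):
    """Group event names by the hourly partition they would be written to."""
    groups = defaultdict(list)
    for event in events:
        groups[hourly_partition(event["time"])].append(event["name"])
    return dict(groups)

# Hypothetical buried point events, using event names from the text above.
events = [
    {"name": "page_evt", "time": datetime(2021, 4, 21, 10, 13, tzinfo=timezone.utc)},
    {"name": "click_evt", "time": datetime(2021, 4, 21, 10, 59, tzinfo=timezone.utc)},
    {"name": "quit_evt", "time": datetime(2021, 4, 21, 11, 2, tzinfo=timezone.utc)},
]
groups = group_by_partition(events)
```

The two events from the 10:00 hour fall into one partition and the 11:02 event into the next, which mirrors the one-partition-per-hour layout described above.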
405. Acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
406. identifying the transaction behavior of each sample in the historical user data set within preset time, and dividing the historical user data set into positive and negative samples based on the transaction behavior to obtain a positive sample set and a negative sample set;
407. respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
408. inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
409. receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into a transaction prediction model to obtain a predicted transaction result of the users to be predicted;
410. and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
Steps 405-410 in this embodiment are similar to steps 101-106 in the first embodiment and are not described again here.
On the basis of the previous embodiment, this embodiment describes the data storage process in detail. The buried point content is defined, and points are buried in the transaction client according to that content. When a user operates the transaction client and generates buried point data, a connection is established with the server and the buried point data are uploaded; the server parses the buried point data to obtain target fields and sends them to the Kafka message queue. The streaming computing framework Storm performs topology processing on the target fields in the Kafka message queue, and the processed target fields are stored to the distributed file system HDFS (Hadoop Distributed File System) at a preset time interval. The target fields in HDFS are then stored as historical user data in the Hive data warehouse tool. In this way, the behavior of users in the client is stored as historical user data, which facilitates subsequent model training.
Referring to fig. 5, a fifth embodiment of the transaction conversion analysis method according to the embodiment of the present invention includes:
501. acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
502. identifying the transaction behavior of each sample in the historical user data set within preset time, and dividing the historical user data set into positive and negative samples based on the transaction behavior to obtain a positive sample set and a negative sample set;
503. respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
504. inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
505. receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into a transaction prediction model to obtain a predicted transaction result of the users to be predicted;
Steps 501-505 in this embodiment are similar to steps 101-105 in the first embodiment and are not described again here.
506. Acquiring a first user number of users to be predicted in a user set to be predicted;
507. acquiring a second user number, namely the number of users in the user set to be predicted whose predicted transaction result is that a transaction occurs within the preset time;
508. and dividing the second user number by the first user number to obtain the transaction conversion rate of the user set to be predicted.
On the basis of the previous embodiment, this embodiment describes in detail the process of calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result. A first user number, the number of users to be predicted in the user set to be predicted, is acquired; a second user number, the number of users in the user set to be predicted whose predicted transaction result is that a transaction occurs within the preset time, is acquired; and the second user number is divided by the first user number to obtain the transaction conversion rate of the user set to be predicted. The method is effective at processing time-series samples observed over a long period, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering and improves the efficiency and accuracy of transaction conversion prediction for newly registered users.
The transaction conversion analysis method in the embodiment of the present invention is described above; a transaction conversion analysis apparatus in the embodiment of the present invention is described below. Referring to fig. 6, an embodiment of the transaction conversion analysis apparatus in the embodiment of the present invention includes:
the data acquisition module 601 is configured to acquire historical user data of a historical user from a transaction database based on a preset buried point in a transaction client, and form a historical user data set;
a sample division module 602, configured to identify transaction behaviors of each sample in the historical user data set within a preset time, and perform positive and negative sample division on the historical user data set based on the transaction behaviors to obtain a positive sample set and a negative sample set, where the transaction behaviors include occurrence of a transaction and non-occurrence of a transaction, the positive sample set is a set of all samples in which a transaction occurs within the preset time, and the negative sample set is a set of all samples in which a transaction does not occur within the preset time;
a data cleaning module 603, configured to perform data cleaning processing on the positive sample set and the negative sample set respectively, and use the positive sample set and the negative sample set after data cleaning as model input data;
the model training module 604 is configured to input the model input data into a preset neural network to perform model training, so as to obtain a transaction prediction model;
the data input module 605 is configured to receive user behavior data of users to be predicted in a set of users to be predicted, and input the user behavior data into the transaction prediction model to obtain a predicted transaction result of the users to be predicted;
and a conversion rate calculation module 606, configured to calculate a transaction conversion rate of the user set to be predicted according to the predicted transaction result.
It is emphasized that the above-described historical user data set may be stored in a node of a blockchain in order to ensure privacy and security of the data.
In an embodiment of the present invention, the transaction conversion analysis apparatus executes the transaction conversion analysis method, which includes: acquiring a historical user data set of historical users collected through buried points in a transaction client; dividing the historical user data set into positive samples and negative samples according to whether each historical user has a transaction behavior within a preset time, wherein a positive sample is a sample with a transaction behavior within the preset time and a negative sample is a sample without a transaction behavior within the preset time; performing data cleaning processing on the positive samples and the negative samples, and taking the positive samples and negative samples remaining after data cleaning as model input data; training a transaction prediction model from the model input data and a preset neural network; inputting user behavior data of a user to be predicted into the transaction prediction model to obtain a predicted transaction result; and calculating the transaction conversion rate of the users to be predicted according to the predicted transaction result. The combined neural network model handles time-series samples observed over a long period well, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering, improving the efficiency and accuracy of transaction conversion prediction for newly registered users.
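The positive and negative sample division described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the record fields `registered_at` and `first_transaction_at` and the 30-day preset time are assumptions introduced for the example.

```python
from datetime import datetime, timedelta

def split_samples(samples, preset_time):
    """Split samples into positives (transaction within preset_time) and negatives."""
    positives, negatives = [], []
    for sample in samples:
        # Hypothetical fields: first_transaction_at is None when no transaction occurred.
        first_tx = sample.get("first_transaction_at")
        deadline = sample["registered_at"] + preset_time
        if first_tx is not None and first_tx <= deadline:
            positives.append(sample)
        else:
            negatives.append(sample)
    return positives, negatives

samples = [
    {"user": "a", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": datetime(2021, 1, 10)},
    {"user": "b", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": None},
    {"user": "c", "registered_at": datetime(2021, 1, 1),
     "first_transaction_at": datetime(2021, 3, 1)},
]
positives, negatives = split_samples(samples, timedelta(days=30))
```

Here user "c" does transact, but only after the preset time has elapsed, so the sample is treated as negative.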
Referring to fig. 7, a second embodiment of the transaction conversion analysis apparatus according to the embodiment of the present invention includes:
the data acquisition module 601 is configured to acquire historical user data of a historical user from a transaction database based on a preset buried point in a transaction client, and form a historical user data set;
a sample division module 602, configured to identify transaction behaviors of each sample in the historical user data set within a preset time, and perform positive and negative sample division on the historical user data set based on the transaction behaviors to obtain a positive sample set and a negative sample set, where the transaction behaviors include occurrence of a transaction and non-occurrence of a transaction, the positive sample set is a set of all samples in which a transaction occurs within the preset time, and the negative sample set is a set of all samples in which a transaction does not occur within the preset time;
a data cleaning module 603, configured to perform data cleaning processing on the positive sample set and the negative sample set respectively, and use the positive sample set and the negative sample set after data cleaning as model input data;
the model training module 604 is configured to input the model input data into a preset neural network to perform model training, so as to obtain a transaction prediction model;
the data input module 605 is configured to receive user behavior data of users to be predicted in a set of users to be predicted, and input the user behavior data into the transaction prediction model to obtain a predicted transaction result of the users to be predicted;
and a conversion rate calculation module 606, configured to calculate a transaction conversion rate of the user set to be predicted according to the predicted transaction result.
The transaction conversion analysis device further includes a similarity calculation module 607, and the similarity calculation module 607 is specifically configured to:
grouping the historical user data set according to a preset time interval to obtain at least one historical user data group;
and calculating the cosine similarity (the cosine of the included angle) between the samples in each historical user data group.
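The cosine similarity between two samples is the dot product of their feature vectors divided by the product of their norms. A minimal sketch, assuming each sample has already been turned into a numeric feature vector (the vectorization itself is not specified here) and that a zero vector is treated as having similarity 0:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the included angle between feature vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def pairwise_similarity(group):
    """Similarity matrix for all samples in one historical user data group."""
    size = len(group)
    return [[cosine_similarity(group[i], group[j]) for j in range(size)]
            for i in range(size)]

group = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = pairwise_similarity(group)
```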
Optionally, the data cleaning module 603 is specifically configured to:
determining a first sample in the historical user data group, and selecting, from the other samples in the historical user data group, the n samples having the largest cosine similarity to the first sample, wherein n is a natural number not less than three;
when the first sample is a positive sample, determining the number of negative samples in the n samples;
if the number of negative samples in the n samples is greater than n/2, deleting the negative samples in the n samples from the historical user data group;
when the first sample is a negative sample, determining the number of positive samples in the n samples;
if the number of positive samples in the n samples is greater than n/2, deleting the first sample from the historical user data group;
repeating the data cleaning process for the other samples according to their cosine similarities until all samples in the historical user data group have been subjected to data cleaning;
and taking the samples remaining after data cleaning as model input data.
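The cleaning rule above can be sketched as a single pass over one data group. This is an illustrative reading under stated assumptions, not a definitive implementation: labels are encoded as 1 (positive) and 0 (negative), samples are visited in index order, and neighbors are chosen only among samples not yet deleted; the text does not fix any of these details.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def clean_group(features, labels, n=3):
    """Return the indices of the samples kept after neighbor-based cleaning."""
    removed = set()
    for i in range(len(features)):
        if i in removed:
            continue
        others = [j for j in range(len(features)) if j != i and j not in removed]
        # The n samples with the largest cosine similarity to sample i.
        neighbors = sorted(
            others,
            key=lambda j: cosine_similarity(features[i], features[j]),
            reverse=True,
        )[:n]
        if labels[i] == 1:
            negatives = [j for j in neighbors if labels[j] == 0]
            if len(negatives) > n / 2:
                removed.update(negatives)  # delete the negative neighbors
        else:
            positives = [j for j in neighbors if labels[j] == 1]
            if len(positives) > n / 2:
                removed.add(i)  # delete the negative first sample itself
    return [i for i in range(len(features)) if i not in removed]

features = [[1, 0], [1, 0.1], [1, 0.2], [1, 0.05],
            [0, 1], [0.1, 1], [0.2, 1], [0.05, 1]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
kept = clean_group(features, labels, n=3)
```

In this made-up group, sample 3 is a negative sample whose three nearest neighbors are all positive, so it is the only sample deleted.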
Optionally, the neural network is composed of a convolutional neural network and a long short-term memory (LSTM) network, and the model training module 604 is specifically configured to:
inputting the model training data into an embedding layer of the neural network to generate a feature vector of the model training data;
extracting a feature sequence of the feature vector through the convolutional neural network in the neural network;
inputting the feature sequence into the long short-term memory network in the neural network to acquire a historical time sequence of the feature sequence, and inputting the feature sequence into a fully connected layer of the neural network to obtain a two-dimensional prediction result;
calculating a loss function of the neural network according to the two-dimensional prediction result, iterating with a gradient descent method until the loss function converges, and updating the network parameters of the neural network by backpropagation;
and adjusting the neural network based on the network parameters to obtain a transaction prediction model.
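The last two training steps (computing a loss from the two-dimensional prediction result and updating parameters by gradient descent) can be illustrated on the fully connected layer alone. The sketch below is a simplification under stated assumptions: the loss is taken to be softmax cross-entropy, which the text does not name; the convolutional and LSTM layers are omitted, so each input vector stands in for the features they would produce; and label 1 is assumed to mean that a transaction occurs.

```python
import math

def softmax(logits):
    """Convert a two-dimensional prediction into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, bias, x):
    """Fully connected layer with two outputs, followed by softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def train(data, dim, lr=0.5, epochs=200):
    """Per-sample gradient descent on softmax cross-entropy (assumed loss)."""
    weights = [[0.0] * dim for _ in range(2)]
    bias = [0.0, 0.0]
    for _ in range(epochs):
        for x, label in data:
            probs = forward(weights, bias, x)
            for k in range(2):
                # Gradient of cross-entropy w.r.t. logit k is p_k - y_k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for d in range(dim):
                    weights[k][d] -= lr * grad * x[d]
                bias[k] -= lr * grad
    return weights, bias

# Made-up stand-ins for the features produced by the earlier layers.
data = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
weights, bias = train(data, dim=2)
```

In a full model the same gradient would be propagated back through the LSTM and convolutional layers, which a deep learning framework would normally handle automatically.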
Optionally, the model training module 604 is further specifically configured to:
converting each character in the model training data into a one-hot encoded vector;
and converting the one-hot encoded vectors of the model training data into low-dimensional dense feature vectors through a pre-trained vector matrix.
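Multiplying a one-hot vector by a vector matrix simply selects the matrix row for that character, which is why the result is a low-dimensional dense vector. A minimal sketch, in which the four-character vocabulary and the matrix values are made up for illustration and stand in for a real pre-trained matrix:

```python
def one_hot(index, vocab_size):
    """One-hot encoded vector with a 1 at the given index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def embed(one_hot_vec, matrix):
    """Multiply a one-hot row vector by a (vocab_size x dim) matrix."""
    dim = len(matrix[0])
    return [sum(one_hot_vec[i] * matrix[i][d] for i in range(len(matrix)))
            for d in range(dim)]

vocab = {"c": 0, "l": 1, "i": 2, "k": 3}  # hypothetical character vocabulary
matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # made-up values
vectors = [embed(one_hot(vocab[ch], len(vocab)), matrix) for ch in "click"]
```

Each 4-dimensional one-hot vector becomes a 2-dimensional dense vector; in practice the product is implemented as a direct row lookup.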
The transaction conversion analysis apparatus further includes a data storage module 608, and the data storage module 608 is specifically configured to:
defining the content of a buried point, and burying the point on the transaction client according to the content of the buried point;
when a user operates the transaction client and buried point data is generated, establishing a connection with a server and uploading the buried point data to the server, wherein the server parses the buried point data to obtain a target field and sends the target field to a Kafka message queue;
performing topology processing on the target field in the Kafka message queue using the stream computing framework Storm, and storing the processed target field in the Hadoop Distributed File System (HDFS) at a preset time interval;
and storing the target field in HDFS into the Hive data warehouse tool as historical user data.
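The server-side parsing step can be sketched as follows. The payload format is not specified in the text, so the JSON layout and the field names `user_id`, `event`, and `timestamp` are assumptions made for the example; the calls to Kafka, Storm, HDFS, and Hive are deliberately left out because topic names and schemas are not given.

```python
import json

TARGET_FIELDS = ("user_id", "event", "timestamp")  # hypothetical target fields

def parse_buried_point(raw):
    """Parse one uploaded buried-point payload and keep only the target fields."""
    record = json.loads(raw)
    missing = [field for field in TARGET_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing target fields: {missing}")
    return {field: record[field] for field in TARGET_FIELDS}

raw = ('{"user_id": "u1", "event": "view_product", '
       '"timestamp": 1619049600, "device": "ios"}')
fields = parse_buried_point(raw)
message = json.dumps(fields)  # the string a producer would send to the queue
```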
Optionally, the predicted transaction result is either that a transaction is performed within the preset time or that no transaction is performed within the preset time; the conversion rate calculation module 606 is specifically configured to:
acquiring a first user number, which is the number of users to be predicted in the user set to be predicted;
acquiring a second user number, which is the number of users in the user set to be predicted whose predicted transaction result is that a transaction is performed within the preset time;
and dividing the second user number by the first user number to obtain the transaction conversion rate of the user set to be predicted.
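The calculation above reduces to a single division. A minimal sketch, assuming each predicted transaction result is represented as a boolean (True when a transaction is predicted within the preset time); rejecting an empty user set is a choice made for the example:

```python
def transaction_conversion_rate(predictions):
    """Second user number divided by first user number."""
    first_user_number = len(predictions)
    if first_user_number == 0:
        raise ValueError("the user set to be predicted is empty")
    second_user_number = sum(1 for predicted in predictions if predicted)
    return second_user_number / first_user_number

rate = transaction_conversion_rate([True, False, False, True])  # 2 of 4 users
```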
This embodiment builds on the previous one by describing in detail the specific functions of each module and the unit composition of some of the modules. The apparatus handles time-series samples observed over a long period well, and its automatic feature screening mechanism effectively reduces the dependence on expert experience in feature engineering, improving the efficiency and accuracy of transaction conversion prediction for newly registered users.
Fig. 6 and 7 describe the transaction conversion analysis apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the transaction conversion analysis device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a transaction conversion analysis device according to an embodiment of the present invention. The transaction conversion analysis device 800 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the transaction conversion analysis device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the transaction conversion analysis device 800, the series of instruction operations in the storage medium 830 to implement the steps of the transaction conversion analysis method described above.
The transaction conversion analysis device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the configuration illustrated in FIG. 8 does not limit the transaction conversion analysis device provided herein; the device may include more or fewer components than those illustrated, may combine some components, or may use a different arrangement of components.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The present invention also provides a computer-readable storage medium, which may be non-volatile or volatile, having instructions stored therein which, when executed on a computer, cause the computer to perform the steps of the transaction conversion analysis method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A transaction conversion analysis method, comprising:
acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
identifying the transaction behavior of each sample in the historical user data set within a preset time, and performing positive and negative sample division on the historical user data set based on the transaction behavior to obtain a positive sample set and a negative sample set, wherein the transaction behavior comprises occurrence of transactions and non-occurrence of transactions, the positive sample set is a set of all samples in which transactions occur within the preset time, and the negative sample set is a set of all samples in which transactions do not occur within the preset time;
respectively carrying out data cleaning processing on the positive sample set and the negative sample set, and taking the positive sample set and the negative sample set after data cleaning as model input data;
inputting the model input data into a preset neural network for model training to obtain a transaction prediction model;
receiving user behavior data of users to be predicted in a user set to be predicted, and inputting the user behavior data into the transaction prediction model to obtain a predicted transaction result of the users to be predicted;
and calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
2. The transaction conversion analysis method according to claim 1, further comprising, before performing data cleaning processing on the positive sample set and the negative sample set respectively and using the positive sample set and the negative sample set after data cleaning as model input data:
grouping the historical user data set according to a preset time interval to obtain at least one historical user data group;
and calculating the cosine similarity (the cosine of the included angle) between the samples in each historical user data group.
3. The transaction conversion analysis method according to claim 2, wherein the performing data cleaning processing on the positive sample set and the negative sample set respectively, and using the positive sample set and the negative sample set after data cleaning as model input data includes:
determining a first sample in the historical user data group, and selecting, from the other samples in the historical user data group, the n samples having the largest cosine similarity to the first sample, wherein n is a natural number not less than three;
when the first sample is a positive sample, determining the number of negative samples in the n samples;
if the number of negative samples in the n samples is greater than n/2, deleting the negative samples in the n samples from the historical user data group;
when the first sample is a negative sample, determining the number of positive samples in the n samples;
if the number of positive samples in the n samples is greater than n/2, deleting the first sample from the historical user data group;
repeating the data cleaning process for the other samples according to their cosine similarities until all samples in the historical user data group have been subjected to data cleaning;
and taking the samples remaining after data cleaning as model input data.
4. The transaction conversion analysis method according to claim 3, wherein the neural network is composed of a convolutional neural network and a long short-term memory (LSTM) network, and the inputting of the model input data into a preset neural network for model training to obtain the transaction prediction model comprises:
inputting the model training data into an embedding layer of the neural network to generate a feature vector of the model training data;
extracting a feature sequence of the feature vector through the convolutional neural network in the neural network;
inputting the feature sequence into the long short-term memory network in the neural network to acquire a historical time sequence of the feature sequence, and inputting the feature sequence into a fully connected layer of the neural network to obtain a two-dimensional prediction result;
calculating a loss function of the neural network according to the two-dimensional prediction result, iterating with a gradient descent method until the loss function converges, and updating the network parameters of the neural network by backpropagation;
and adjusting the neural network based on the network parameters to obtain a transaction prediction model.
5. The transaction conversion analysis method of claim 4, wherein the inputting the model training data into an embedding layer of the neural network to generate a feature vector of the model training data comprises:
converting each character in the model training data into a one-hot encoded vector;
and converting the one-hot encoded vectors of the model training data into low-dimensional dense feature vectors through a pre-trained vector matrix.
6. The transaction conversion analysis method according to any one of claims 1 to 5, wherein before collecting historical user data of the historical user from the transaction database based on a pre-set buried point in the transaction client to form a historical user data set, the method further comprises:
defining the content of a buried point, and burying the point on the transaction client according to the content of the buried point;
when a user operates the transaction client and buried point data is generated, establishing a connection with a server and uploading the buried point data to the server, wherein the server parses the buried point data to obtain a target field and sends the target field to a Kafka message queue;
performing topology processing on the target field in the Kafka message queue using the stream computing framework Storm, and storing the processed target field in the Hadoop Distributed File System (HDFS) at a preset time interval;
and storing the target field in HDFS into the Hive data warehouse tool as historical user data.
7. The transaction conversion analysis method according to claim 6, wherein the predicted transaction result is either that a transaction is performed within the preset time or that no transaction is performed within the preset time;
the calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result comprises the following steps:
acquiring a first user number, which is the number of users to be predicted in the user set to be predicted;
acquiring a second user number, which is the number of users in the user set to be predicted whose predicted transaction result is that a transaction is performed within the preset time;
and dividing the second user number by the first user number to obtain the transaction conversion rate of the user set to be predicted.
8. A transaction conversion analysis device, characterized by comprising:
the data acquisition module is used for acquiring historical user data of a historical user from a transaction database based on a preset buried point in a transaction client to form a historical user data set;
the sample dividing module is used for identifying the transaction behavior of each sample in the historical user data set within a preset time and performing positive and negative sample division on the historical user data set based on the transaction behavior to obtain a positive sample set and a negative sample set, wherein the transaction behavior comprises occurrence of a transaction and non-occurrence of a transaction, the positive sample set is a set of all samples in which a transaction occurs within the preset time, and the negative sample set is a set of all samples in which a transaction does not occur within the preset time;
the data cleaning module is used for respectively cleaning the positive sample set and the negative sample set and taking the positive sample set and the negative sample set after data cleaning as model input data;
the model training module is used for inputting the model input data into a preset neural network to train a model so as to obtain a transaction prediction model;
the data input module is used for receiving user behavior data of users to be predicted in the user set to be predicted, inputting the user behavior data into the transaction prediction model and obtaining a predicted transaction result of the users to be predicted;
and the conversion rate calculation module is used for calculating the transaction conversion rate of the user set to be predicted according to the predicted transaction result.
9. A transaction conversion analysis apparatus, characterized in that the transaction conversion analysis apparatus comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invoking the instructions in the memory to cause the transaction conversion analysis device to perform the steps of the transaction conversion analysis method of any of claims 1-7.
10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the transaction conversion analysis method according to any of claims 1-7.
CN202110433361.5A 2021-04-22 2021-04-22 Transaction conversion analysis method, device, equipment and storage medium Pending CN113138977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110433361.5A CN113138977A (en) 2021-04-22 2021-04-22 Transaction conversion analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110433361.5A CN113138977A (en) 2021-04-22 2021-04-22 Transaction conversion analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113138977A true CN113138977A (en) 2021-07-20

Family

ID=76813458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110433361.5A Pending CN113138977A (en) 2021-04-22 2021-04-22 Transaction conversion analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113138977A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160484A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Data processing method and device, computer readable storage medium and electronic equipment
CN112085541A (en) * 2020-09-27 2020-12-15 中国建设银行股份有限公司 User demand analysis method and device based on browsing consumption time series data
CN112231584A (en) * 2020-12-08 2021-01-15 平安科技(深圳)有限公司 Data pushing method and device based on small sample transfer learning and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113965562A (en) * 2021-12-21 2022-01-21 深圳市思迅网络科技有限公司 Realization method for reducing POS foreground download data flow consumption
CN113965562B (en) * 2021-12-21 2022-03-01 深圳市思迅网络科技有限公司 Realization method for reducing POS foreground download data flow consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination