CN109272165B - Registration probability estimation method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN109272165B
Authority
CN
China
Prior art keywords
user behavior
behavior data
data
user
prediction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811156192.XA
Other languages
Chinese (zh)
Other versions
CN109272165A (en
Inventor
沙韬伟
邓金秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Manbang Information Technology Co ltd
Original Assignee
Man Bang Information Consulting Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Man Bang Information Consulting Co ltd filed Critical Man Bang Information Consulting Co ltd
Priority to CN201811156192.XA priority Critical patent/CN109272165B/en
Publication of CN109272165A publication Critical patent/CN109272165A/en
Application granted granted Critical
Publication of CN109272165B publication Critical patent/CN109272165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Abstract

The invention provides a registration probability estimation method and device, a storage medium, and electronic equipment. The registration probability estimation method comprises the following steps: acquiring first user behavior data according to a user operation log stream; inputting the first user behavior data into a trained first prediction model, and acquiring data of a plurality of hidden layers of the first prediction model as second user behavior data; performing cross construction on part of the first user behavior data according to a calculated importance value to obtain third user behavior data; and splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data. The invention combines a recurrent neural network with traditional feature extraction, acquires user behavior data in real time from the user operation log stream, ensures fast result feedback, models user behavior within an algorithm framework that extends well, and can effectively predict the probability of user behaviors such as registration, purchase, and click.

Description

Registration probability estimation method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of computers, in particular to a registration probability estimation method and device based on behavior information, a storage medium and electronic equipment.
Background
In information applications, for example content-aggregation APPs such as vehicle-cargo matching platforms and shopping platforms, a user's preference for a certain type of goods can be estimated from a large amount of historical user behavior data by means of specific analysis algorithms. For APP registration, however, the interval between the first and the last time a user logs in to the APP is relatively short. How to compress the model computation time at each step of the user's operation, and thereby raise the feedback frequency, is therefore an important problem. Traditional models perform only moderately in this respect, so it is difficult to accurately predict the user's preference for a given APP, and the user's probability of registering with the APP cannot be known.
Disclosure of Invention
In view of the problems in the prior art, an object of the present invention is to provide a method, an apparatus, a storage medium, and an electronic device for estimating a registration probability, so as to effectively predict the probability of a user's actions such as registration, purchase, and click.
According to an aspect of the present invention, there is provided a registration probability estimation method, including:
acquiring first user behavior data according to the user operation log stream;
inputting the first user behavior data into a trained first prediction model, and acquiring data of a plurality of hidden layers of the first prediction model as second user behavior data;
carrying out cross construction on part of the first user behavior data according to the calculated importance value to obtain third user behavior data;
splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
inputting the fourth user behavior data into a second prediction model, the output of the second prediction model being an estimated value of the user's registration probability;
wherein the step of performing cross construction on part of the first user behavior data according to the calculated importance value to obtain the third user behavior data further comprises:
distinguishing the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value;
performing cross construction on the second characteristic data to form third characteristic data;
the first characteristic data and the third characteristic data constitute the third user behavior data;
the first prediction model is an RNN model, the RNN model comprises an input layer, a plurality of hidden layers and an output layer, and each hidden layer is a GRU unit; the second prediction model is a logistic regression model.
In an embodiment of the present invention, the user operation log stream includes user basic information, user behavior information, and device information of a user.
In an embodiment of the present invention, the first prediction model and the second prediction model are trained according to sample data, where the sample data includes user behavior data and a user registration status.
In an embodiment of the invention, the importance value of the first user behavior data is calculated by variance estimation to distinguish the first user behavior data into first feature data and second feature data.
In an embodiment of the invention, the importance value of the first user behavior data is calculated by an xgboost algorithm to distinguish the first user behavior data into first characteristic data and second characteristic data.
In an embodiment of the invention, the importance value of the first user behavior data is calculated by cross entropy to distinguish the first user behavior data into first feature data and second feature data.
According to another aspect of the present invention, there is provided a registration probability estimating apparatus, including:
the acquisition module is used for acquiring first user behavior data according to the user operation log stream;
a first prediction model module, configured to input the first user behavior data into a trained first prediction model, and obtain data of multiple hidden layers of the first prediction model as second user behavior data, where the first prediction model is an RNN model, the RNN model includes an input layer, multiple hidden layers, and an output layer, and each hidden layer is a GRU unit;
the data construction module is used for carrying out cross construction on part of the first user behavior data according to the calculated importance value to obtain third user behavior data;
the data processing module is used for splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
the second prediction model module is used for inputting the fourth user behavior data into a second prediction model, and taking the output of the second prediction model as a predicted value of the registration probability of the user; the second prediction model is a logistic regression model;
the registration probability estimation device is further configured such that the step of performing cross construction on part of the first user behavior data according to the calculated importance value to obtain the third user behavior data further comprises:
distinguishing the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value;
performing cross construction on the second characteristic data to form third characteristic data;
the first characteristic data and the third characteristic data constitute the third user behavior data.
According to a further aspect of the invention, a storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, performs the method as described above.
According to yet another aspect of the present invention, an electronic device is provided. The electronic device includes: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the method as described above.
The registration probability estimation method provided by the invention uses a technology of combining a recurrent neural network with traditional feature extraction, acquires the behavior data of the user in real time according to the user operation log stream, ensures high-efficiency result feedback speed, models the user behavior on the premise of having good extension performance of an algorithm framework, and can effectively predict the probability of the user's behaviors such as registration, purchase, click and the like.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
Fig. 1 is a flowchart of a registration probability estimation method according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for estimating registration probability according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a registration probability estimating apparatus according to an embodiment of the invention.
Fig. 4 is a schematic structural diagram of a registration probability estimating apparatus according to another embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In order to overcome the defects of the prior art, the invention provides a registration probability estimation method, a device, a storage medium and electronic equipment, so as to effectively predict the probability of user behaviors such as registration, purchase, and click, wherein the registration probability reflects the user's degree of preference for a certain APP.
According to an aspect of the present invention, there is provided a registration probability estimation method, as shown in fig. 1, the registration probability estimation method includes:
S110, acquiring first user behavior data according to the user operation log stream.
Specifically, the user operation log stream records a plurality of items of original feature data, which are generally summarized from historical user behavior information, user basic information, user device information, and the like. The first user behavior data is generally obtained by preprocessing the original feature data, and may specifically include the user's device type, the number of items the user browsed in the last seven days, the location from which the user frequently logs in, and the like.
S120, inputting the first user behavior data into a trained first prediction model, and acquiring data of a plurality of hidden layers of the first prediction model as second user behavior data.
Specifically, the first user behavior data may be input directly into the first prediction model after preprocessing. In an embodiment of the present invention, the first prediction model is an RNN model that includes an input layer, a plurality of hidden layers, and an output layer, and each hidden layer is a GRU unit. The RNN model is a recurrent neural network model; its principle is to add time-sequence characteristics to a neural network model. A feedback edge is added to the hidden layer, so that the input of each hidden layer includes both the features of the current sample and the information carried over from the previous time step. Each GRU unit contains two gates, a reset gate and an update gate. The outputs of both gates pass through a sigmoid function whose range is [0, 1]. The candidate hidden state uses the reset gate to control how much of the previous hidden state, which contains past time information, flows in. If the reset gate is approximately 0, the previous hidden state is discarded. The reset gate thus provides a mechanism for discarding past hidden states that are irrelevant to the future; that is, the reset gate decides how much past information is forgotten. The hidden state H_t uses the update gate Z_t to combine the previous hidden state H_{t-1} with the candidate hidden state. The update gate controls how important the past hidden state is at the current time. If the update gate stays approximately 1, the past hidden state is preserved over time and passed on to the current time. This design mitigates the gradient attenuation (vanishing gradient) problem in recurrent neural networks and better captures dependencies that span large time intervals in time-series data. The reset gate helps capture short-term dependencies in the time-series data, and the update gate helps capture long-term dependencies.
The parameters of the recurrent network's GRU units and of the LR (logistic regression) model are updated offline according to user operation data, user click-stream data, and the result of whether the user actually registered, all of which are stored offline in HDFS. HDFS is the Hadoop Distributed File System, a distributed file system designed to run on commodity hardware.
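The gate behaviour described above can be sketched in NumPy as follows. This is a minimal illustration, not the trained model of the invention: the weight names, the layer sizes, the random inputs, and the choice to concatenate the hidden state of every time step into the second user behavior data are all assumptions made for the example. The update rule follows the convention stated in the text, where an update gate near 1 keeps the previous hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step; an update gate z near 1 keeps the previous hidden state."""
    W_r, U_r, b_r, W_z, U_z, b_z, W_h, U_h, b_h = params
    r = sigmoid(x_t @ W_r + h_prev @ U_r + b_r)              # reset gate
    z = sigmoid(x_t @ W_z + h_prev @ U_z + b_z)              # update gate
    # The reset gate scales how much of the previous state enters the candidate.
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h + b_h)
    h_t = z * h_prev + (1.0 - z) * h_cand                    # new hidden state
    return h_t, r, z

def init_params(n_in, n_hidden, rng):
    """Hypothetical random initialisation: (W, U, b) for reset, update, candidate."""
    shapes = [(n_in, n_hidden), (n_hidden, n_hidden), (n_hidden,)] * 3
    return tuple(rng.normal(scale=0.1, size=s) for s in shapes)

rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=3, rng=rng)
sequence = rng.normal(size=(5, 4))   # 5 time steps of made-up behavior features

h = np.zeros(3)
hidden_states = []
for x_t in sequence:
    h, r, z = gru_step(x_t, h, params)
    hidden_states.append(h)

# Assumption: the collected hidden states serve as the second user behavior data.
second_user_behavior_data = np.concatenate(hidden_states)
print(second_user_behavior_data.shape)
```

Because each new hidden state is a gate-weighted mix of the previous state and a tanh output, every component stays strictly between -1 and 1.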
S130, carrying out cross construction on part of the first user behavior data according to the calculated importance value to obtain third user behavior data.
Since the first user behavior data includes a plurality of kinds of information, it is necessary to distinguish the importance of the plurality of kinds of information. Specifically, the importance values of the types of data in the first user behavior data may be calculated and distinguished through variance estimation, an xgboost algorithm, cross entropy, and the like.
S140, splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data.
Specifically, if the second user behavior data is [1,0,1,0,0] and the third user behavior data is [0,0,0,1,1], then splicing [1,0,1,0,0] and [0,0,0,1,1] yields the fourth user behavior data [1,0,1,0,0,0,0,0,1,1]. Of course, the fourth user behavior data may also be calculated from the second user behavior data and the third user behavior data in other manners, which is not limited by the invention.
S150, inputting the fourth user behavior data into a second prediction model, and taking the output of the second prediction model as a predicted value of the registration probability of the user.
In an embodiment of the invention, the second prediction model is a logistic regression model. The first prediction model and the second prediction model are trained according to sample data, and the sample data comprises user behavior data and a user registration state. The logistic regression model is a common classification model in machine learning and is mainly used for binary classification problems. It maps the feature space to a probability; the dependent variable in the logistic regression model is a qualitative variable taking values in {0,1}, and the model is mainly used to study the probability that a certain event occurs.
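The splicing in this example is plain vector concatenation. A minimal NumPy sketch using the same two vectors (the variable names are illustrative only):

```python
import numpy as np

second_user_behavior_data = np.array([1, 0, 1, 0, 0])
third_user_behavior_data = np.array([0, 0, 0, 1, 1])

# Splice the two vectors end to end to form the fourth user behavior data.
fourth_user_behavior_data = np.concatenate(
    [second_user_behavior_data, third_user_behavior_data]
)
print(fourth_user_behavior_data.tolist())  # [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]
```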
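As a rough illustration of how a logistic regression model maps spliced feature vectors to a probability, the sketch below trains one with plain gradient descent. The four sample vectors, their registration labels, the learning rate, and the function names are all made up for the example and are not taken from the invention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights by gradient descent on the mean cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted registration probability
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict_registration_probability(X, w, b):
    return sigmoid(X @ w + b)

# Hypothetical "fourth user behavior data" rows with registration labels.
X = np.array([
    [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
], dtype=float)
y = np.array([1, 0, 1, 0], dtype=float)   # 1 = registered, 0 = not registered

w, b = train_logistic_regression(X, y)
probs = predict_registration_probability(X, w, b)
print(np.round(probs, 3))
```

The output of the model is always a value strictly between 0 and 1, which is what allows it to be read as an estimated registration probability.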
The registration probability estimation method provided by the invention uses a technology of combining a recurrent neural network with traditional feature extraction, acquires the behavior data of the user in real time according to the user operation log stream, ensures high-efficiency result feedback speed, models the user behavior on the premise of having good extension performance of an algorithm framework, and can effectively predict the probability of the user's behaviors such as registration, purchase, click and the like.
Since the first user behavior data includes a plurality of kinds of information, it is necessary to distinguish the importance of the plurality of kinds of information. FIG. 2 is a flowchart of a method for estimating registration probability according to another embodiment of the present invention. As shown in fig. 2, in another embodiment of the present invention, step S130 further includes:
S1310, dividing the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value.
S1320, performing cross construction on the second feature data, whose importance values meet a preset requirement, to form third feature data, while keeping the first feature data, whose importance values do not meet the preset requirement, unchanged. For example, suppose two types of second feature data have importance values that meet the preset requirement: age (divided into two groups, over 20 years old and under 20 years old) and gender (divided into two groups, male and female). Cross construction of these two types of second feature data yields four groups of third feature data: over 20 and male, over 20 and female, under 20 and male, and under 20 and female.
S1330, forming the third user behavior data from the first feature data and the third feature data. In this way, the situation in which a large amount of user information cannot be completely captured is avoided.
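Steps S1320 and S1330 can be sketched as follows for the age and gender example. The one-hot encoding of the crossed groups, the helper name `build_cross_encoder`, and the sample values of the unchanged first feature data are assumptions chosen for illustration.

```python
from itertools import product

def build_cross_encoder(categories):
    """Return all crossed groups and a function that one-hot encodes a sample."""
    names = list(categories)
    combos = list(product(*(categories[n] for n in names)))

    def encode(sample):
        key = tuple(sample[n] for n in names)
        return [1 if key == combo else 0 for combo in combos]

    return combos, encode

# Second feature data: the features whose importance meets the preset requirement.
second_feature_categories = {
    "age": ["over_20", "under_20"],
    "gender": ["male", "female"],
}
combos, encode = build_cross_encoder(second_feature_categories)

# S1320: cross construction gives one indicator per (age, gender) group.
third_feature_data = encode({"age": "over_20", "gender": "female"})

# S1330: the unchanged first feature data (hypothetical values) plus the
# crossed features form the third user behavior data.
first_feature_data = [3, 1]
third_user_behavior_data = first_feature_data + third_feature_data

print(combos)
print(third_user_behavior_data)  # [3, 1, 0, 1, 0, 0]
```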
Further, an importance value of the first user behavior data may be calculated by variance estimation to distinguish the first user behavior data into first feature data and second feature data.
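A minimal sketch of a variance-based split, assuming that the importance value of a feature is simply its sample variance and that "meeting the preset requirement" means reaching a threshold. Both assumptions, along with the feature names and the threshold, are illustrative and not specified by the text.

```python
import numpy as np

def split_by_variance(X, names, threshold):
    """Split feature names by variance; high-variance ones are 'second' features."""
    variances = X.var(axis=0)
    importance = dict(zip(names, variances.tolist()))
    second = [n for n in names if importance[n] >= threshold]
    first = [n for n in names if importance[n] < threshold]
    return first, second, importance

# Hypothetical preprocessed first user behavior data, one row per user.
names = ["device_flag", "gender", "views_7d"]
X = np.array([
    [1, 0, 3],
    [1, 1, 9],
    [1, 0, 3],
    [1, 1, 9],
], dtype=float)

first, second, importance = split_by_variance(X, names, threshold=0.2)
print(importance)
print(first, second)
```

Here `device_flag` never varies, so it stays in the first feature data, while `gender` and `views_7d` would go on to cross construction.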
Optionally, an importance value of the first user behavior data is calculated by an xgboost algorithm to distinguish the first user behavior data into first characteristic data and second characteristic data. xgboost performs a second-order Taylor expansion of the loss function and adds a regularization term to the objective function to obtain an overall optimal solution, so as to balance the reduction of the objective function against the complexity of the model and avoid overfitting. The invention calculates the importance value of the first user behavior data through the feature-importance calculation provided in xgboost.
Optionally, an importance value of the first user behavior data is calculated by cross entropy to distinguish the first user behavior data into first feature data and second feature data. Cross entropy can be used as a loss function in neural networks (machine learning). Assume that there are two probability distributions p and q over a sample set, where p is the distribution of the true labels and q is the distribution of the labels predicted by the trained model; the cross-entropy loss function measures the similarity of p and q. Thus, the first user behavior data is binary-classified by calculating this similarity, and the importance value of each item of first user behavior data is determined to be the maximum or the minimum according to the classification result. The advantage of cross entropy as a loss function is that, when gradient descent is used with a sigmoid function, the slowdown in learning that affects the mean squared error loss function is avoided, because the learning speed is controlled by the output error. The sigmoid function is an S-shaped function common in biology, also called the S-shaped growth curve. In information science, because both the function and its inverse are monotonically increasing, the sigmoid function is often used as a threshold function of a neural network, mapping variables to values between 0 and 1.
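The claim about learning speed can be checked numerically. The sketch below computes the binary cross entropy between true labels p and predictions q, and compares the gradient with respect to the pre-sigmoid input z for the cross-entropy loss and for a squared-error loss of the form 0.5(q - y)^2. The function names and the sample values are illustrative assumptions, and the sketch does not attempt to reproduce how the invention turns cross entropy into an importance value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, q, eps=1e-12):
    """Mean cross entropy between true labels p and predicted probabilities q."""
    q = np.clip(q, eps, 1.0 - eps)
    return float(-np.mean(p * np.log(q) + (1.0 - p) * np.log(1.0 - q)))

def grad_cross_entropy(y, z):
    """d(loss)/dz for cross entropy with q = sigmoid(z): just the output error."""
    return sigmoid(z) - y

def grad_squared_error(y, z):
    """d(loss)/dz for 0.5 * (q - y)**2: the error is scaled by q * (1 - q)."""
    q = sigmoid(z)
    return (q - y) * q * (1.0 - q)

p = np.array([1.0, 0.0])
good = binary_cross_entropy(p, np.array([0.9, 0.1]))
bad = binary_cross_entropy(p, np.array([0.1, 0.9]))
print(good, bad)   # the closer q is to p, the smaller the loss

# A confidently wrong prediction: true label 1, but z = -6 gives q near 0.
ce_grad = grad_cross_entropy(1.0, -6.0)
mse_grad = grad_squared_error(1.0, -6.0)
print(ce_grad, mse_grad)
```

For the confidently wrong prediction, the cross-entropy gradient stays close to -1, while the squared-error gradient is several hundred times smaller because the sigmoid is saturated. This is the slowdown that the text says cross entropy avoids.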
According to another aspect of the present invention, a registration probability estimation apparatus is provided, and fig. 3 is a schematic structural diagram of the registration probability estimation apparatus according to an embodiment of the present invention. As shown in fig. 3, the registration probability estimating apparatus 200 includes: an obtaining module 201, a first prediction model module 202, a data constructing module 203, a data processing module 204 and a second prediction model module 205. The obtaining module 201 is configured to obtain first user behavior data according to a user operation log stream. The first prediction model module 202 is configured to input the first user behavior data into a trained first prediction model, and obtain data of a plurality of hidden layers of the first prediction model as second user behavior data. The data constructing module 203 is configured to cross-construct part of the first user behavior data according to the calculated importance value to obtain third user behavior data. The data processing module 204 is configured to splice the second user behavior data and the third user behavior data to obtain fourth user behavior data. The second prediction model module 205 is configured to input the fourth user behavior data into a second prediction model, and take the output of the second prediction model as a predicted value of the registration probability of the user. The functions of each module in the registration probability estimation apparatus, and the specific steps and principles from the obtaining module 201 obtaining the first user behavior data to the second prediction model module 205 obtaining the predicted value of the registration probability of the user, have been described in the above embodiments and are not described again.
Fig. 4 is a schematic structural diagram of a registration probability estimating apparatus according to another embodiment of the present invention. As shown in fig. 4, the registration probability estimation apparatus 200 also includes an obtaining module 201, a first prediction model module 202, a data constructing module 203, a data processing module 204, and a second prediction model module 205. In addition, the data constructing module 203 further includes: a distinguishing module 2031, a cross construction module 2032, and a data integration module 2033. The obtaining module 201 is configured to obtain first user behavior data according to a user operation log stream. The first prediction model module 202 is configured to input the first user behavior data into a trained first prediction model, and obtain data of a plurality of hidden layers of the first prediction model as second user behavior data. The data constructing module 203 is configured to cross-construct part of the first user behavior data according to the calculated importance value to obtain third user behavior data. The data processing module 204 is configured to splice the second user behavior data and the third user behavior data to obtain fourth user behavior data. The second prediction model module 205 is configured to input the fourth user behavior data into a second prediction model, and take the output of the second prediction model as a predicted value of the registration probability of the user. The distinguishing module 2031 is configured to distinguish the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value. The cross construction module 2032 is configured to cross-construct the second feature data, whose importance values meet preset requirements, to form third feature data. The data integration module 2033 is configured to form the third user behavior data from the first characteristic data and the third characteristic data.
In an exemplary embodiment of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored, which when executed by, for example, a processor, can implement the registration probability estimation method in any of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code for causing a terminal device to perform the methods according to various exemplary embodiments of the present invention described in the above registration probability estimation methods of this specification when the program product is run on the terminal device.
Fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention. Fig. 5 depicts a program product 300 for implementing the above-described method according to an embodiment of the invention, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product 300 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The invention combines a recurrent neural network with conventional feature extraction. It acquires user behavior data in real time from the user operation log stream, which keeps result feedback fast, models user behavior within an algorithm framework that extends well, and can effectively predict the probability of user behaviors such as registration, purchase, and click.
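As an illustration of the recurrent part of this combination, the following is a minimal sketch of stacked GRU hidden layers whose final hidden states are collected as features (the "second user behavior data" of the claims). It is not the patented implementation: the class `GRUCell`, the helper `hidden_layer_features`, the layer sizes, the random weights standing in for trained ones, the omission of bias terms, and the sample input are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU hidden layer (hypothetical helper, biases omitted).

    Weights are random stand-ins; a real model would load trained values.
    """

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One (W, U) pair per gate: update "z", reset "r", candidate "h".
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Blend the previous state with the candidate state via the update gate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def hidden_layer_features(sequence, cells):
    """Run a behavior sequence through stacked GRU layers and return the
    concatenated final hidden state of every layer."""
    states = [[0.0] * cell.hidden_size for cell in cells]
    for x in sequence:
        layer_input = x
        for i, cell in enumerate(cells):
            states[i] = cell.step(layer_input, states[i])
            layer_input = states[i]
    return [value for state in states for value in state]


rng = random.Random(0)
cells = [GRUCell(3, 4, rng), GRUCell(4, 4, rng)]  # two hidden layers (assumed sizes)
# Three time steps of a made-up, already-encoded behavior sequence.
first_user_behavior = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [1.0, 1.0, 0.0]]
second_user_behavior = hidden_layer_features(first_user_behavior, cells)
```

The resulting vector has one entry per hidden unit across all layers, so it can be concatenated with hand-built features before a downstream classifier.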
In an exemplary embodiment of the invention, there is also provided an electronic device that may include a processor and a memory for storing executable instructions of the processor. The processor is configured to perform the registration probability estimation method in any of the above embodiments by executing the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 400 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 400 shown in fig. 6 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 400 is embodied in the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, a bus 430 that connects the various system components (including the storage unit 420 and the processing unit 410), a display unit 440, and the like.
The storage unit 420 stores program code executable by the processing unit 410 to cause the processing unit 410 to perform the steps according to various exemplary embodiments of the present invention described in the registration probability estimation method section above in this specification. For example, the processing unit 410 may perform the steps as shown in fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read only memory unit (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 430 may be any bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 400 may also communicate with one or more external devices 500 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 400 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 450. Also, the electronic device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 460. The network adapter 460 may communicate with other modules of the electronic device 400 via the bus 430. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the registration probability estimation method according to the embodiment of the present invention.
The invention combines a recurrent neural network with conventional feature extraction. It acquires user behavior data in real time from the user operation log stream, which keeps result feedback fast, models user behavior within an algorithm framework that extends well, and can effectively predict the probability of user behaviors such as registration, purchase, and click.
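The conventional feature extraction side of this combination can be sketched as follows, under stated assumptions. Variance is used as the importance value; the features that fall below an assumed threshold are treated as the ones to be crossed, which is a choice made for this example. Pairwise products are used as the cross operation, also an assumption. The function names, the threshold, the sample rows, the hidden-layer vector, and the logistic regression weights are all hypothetical.

```python
import math
from itertools import combinations


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_by_importance(rows, threshold):
    """Return (kept, crossed) column indices, using variance as the importance value.

    Assumption: columns below the threshold are the ones selected for crossing.
    """
    n_cols = len(rows[0])
    scores = [variance([row[j] for row in rows]) for j in range(n_cols)]
    kept = [j for j, s in enumerate(scores) if s >= threshold]
    crossed = [j for j, s in enumerate(scores) if s < threshold]
    return kept, crossed


def build_third(row, kept, crossed):
    """Kept features unchanged, followed by pairwise products of the crossed ones."""
    cross_terms = [row[a] * row[b] for a, b in combinations(crossed, 2)]
    return [row[j] for j in kept] + cross_terms


def logistic_regression(features, weights, bias):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Made-up first user behavior data: three users, four numeric features.
rows = [
    [0.0, 1.0, 2.0, 5.0],
    [1.0, 1.0, 2.5, 1.0],
    [2.0, 1.0, 2.0, 3.0],
]
kept, crossed = split_by_importance(rows, threshold=0.5)
third = build_third(rows[0], kept, crossed)

# Hypothetical hidden-layer output standing in for the second user behavior data.
second = [0.1, -0.2]
fourth = second + third  # splice the two feature vectors together

# Hypothetical weights; a real model would learn these from labeled samples.
weights = [0.5, -0.3, 0.2, 0.1, 0.4]
probability = logistic_regression(fourth, weights, bias=-1.0)
```

Swapping `variance` for another scoring function, such as feature importances from a gradient boosting model, changes only `split_by_importance`; the crossing and splicing steps stay the same.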
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (9)

1. A registration probability estimation method is characterized by comprising the following steps:
acquiring first user behavior data according to the user operation log stream;
inputting the first user behavior data into a trained first prediction model, and acquiring data of a plurality of hidden layers of the first prediction model as second user behavior data;
carrying out cross construction on part of the first user behavior data according to the calculated importance value to obtain third user behavior data;
splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
inputting the fourth user behavior data into a second prediction model, and taking the output of the second prediction model as a pre-estimated value of the registration probability of the user;
the step of performing cross construction on part of the first user behavior data according to the calculated importance value to obtain the third user behavior data further comprises:
distinguishing the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value;
performing cross construction on the second characteristic data to form third characteristic data;
the first characteristic data and the third characteristic data constitute the third user behavior data;
the first prediction model is an RNN model, the RNN model comprises an input layer, a plurality of hidden layers and an output layer, and each hidden layer is a GRU unit; the second prediction model is a logistic regression model.
2. The method of claim 1, wherein the user operation log stream comprises user basic information, user behavior information, and device information of a user.
3. The method of claim 1, wherein the first predictive model and the second predictive model are trained according to sample data, the sample data including user behavior data and user registration status.
4. The method of claim 1, wherein the importance value of the first user behavior data is calculated by variance estimation to distinguish the first user behavior data into first feature data and second feature data.
5. The method of claim 1, wherein the importance value of the first user behavior data is calculated by an xgboost algorithm to distinguish the first user behavior data into first feature data and second feature data.
6. The registration probability estimation method according to claim 1, wherein the importance value of the first user behavior data is calculated by cross entropy to distinguish the first user behavior data into first feature data and second feature data.
7. A registration probability estimation apparatus, comprising:
the acquisition module is used for acquiring first user behavior data according to the user operation log stream;
a first prediction model module, configured to input the first user behavior data into a trained first prediction model, and obtain data of multiple hidden layers of the first prediction model as second user behavior data, where the first prediction model is an RNN model, the RNN model includes an input layer, multiple hidden layers, and an output layer, and each hidden layer is a GRU unit;
the data construction module is used for carrying out cross construction on part of the first user behavior data according to the calculated importance value to obtain third user behavior data;
the data processing module is used for splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
the second prediction model module is used for inputting the fourth user behavior data into a second prediction model, and the output of the second prediction model is used as a predicted value of the registration probability of the user; the second prediction model is a logistic regression model;
the registration probability estimation apparatus is further configured such that the step of performing cross construction on part of the first user behavior data according to the calculated importance value to obtain the third user behavior data further comprises:
distinguishing the first user behavior data into first characteristic data and second characteristic data according to the calculated importance value;
performing cross construction on the second characteristic data to form third characteristic data;
the first characteristic data and the third characteristic data constitute the third user behavior data.
8. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, performs the method according to any one of claims 1 to 6.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a storage medium having stored thereon a computer program which, when executed by the processor, performs the method of any of claims 1 to 6.
CN201811156192.XA 2018-09-30 2018-09-30 Registration probability estimation method and device, storage medium and electronic equipment Active CN109272165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811156192.XA CN109272165B (en) 2018-09-30 2018-09-30 Registration probability estimation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811156192.XA CN109272165B (en) 2018-09-30 2018-09-30 Registration probability estimation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109272165A (en) 2019-01-25
CN109272165B (en) 2021-04-20

Family

ID=65195482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811156192.XA Active CN109272165B (en) 2018-09-30 2018-09-30 Registration probability estimation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109272165B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288096B (en) * 2019-06-28 2021-06-08 满帮信息咨询有限公司 Prediction model training method, prediction model training device, prediction model prediction method, prediction model prediction device, electronic equipment and storage medium
CN110674188A (en) * 2019-09-27 2020-01-10 支付宝(杭州)信息技术有限公司 Feature extraction method, device and equipment
CN112950353A (en) * 2021-02-08 2021-06-11 北京淇瑀信息科技有限公司 User strategy generation method and device based on 7-day movement support model and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407694A (en) * 2016-09-28 2017-02-15 湖南老码信息科技有限责任公司 Neurasthenia prediction method and prediction system based on incremental neural network model
CN107168945A (en) * 2017-04-13 2017-09-15 广东工业大学 Bidirectional recurrent neural network fine-grained opinion mining method merging multiple features
CN107180284A (en) * 2017-07-07 2017-09-19 北京航空航天大学 A kind of SPOC student based on learning behavior feature shows weekly Forecasting Methodology and device
CN107330445A (en) * 2017-05-31 2017-11-07 北京京东尚科信息技术有限公司 The Forecasting Methodology and device of user property
CN108090607A (en) * 2017-12-13 2018-05-29 中山大学 A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse
CN108121795A (en) * 2017-12-20 2018-06-05 北京奇虎科技有限公司 User's behavior prediction method and device
CN108256757A (en) * 2018-01-10 2018-07-06 链家网(北京)科技有限公司 A kind of source of houses conclusion of the business predictor method based on xgboost and estimate platform

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7593906B2 (en) * 2006-07-31 2009-09-22 Microsoft Corporation Bayesian probability accuracy improvements for web traffic predictions
CN106503805B (en) * 2016-11-14 2019-01-29 合肥工业大学 A kind of bimodal based on machine learning everybody talk with sentiment analysis method
CN107153887A (en) * 2017-04-14 2017-09-12 华南理工大学 A kind of mobile subscriber's behavior prediction method based on convolutional neural networks
CN107222787A (en) * 2017-06-02 2017-09-29 中国科学技术大学 Video resource popularity prediction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
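Using data mining technology to improve power customer satisfaction; 谢敏敏 (Xie Minmin); 《电力讯息》; 2015-01-25 (No. 2); full text *
Research on gender inference of microblog users based on interest preferences; 宋巍 (Song Wei), et al.; 《电子学报》 (Acta Electronica Sinica); 2016-10-31; Vol. 44 (No. 10); full text *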

Also Published As

Publication number Publication date
CN109272165A (en) 2019-01-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210401

Address after: No.123, Kaifa Avenue, Guiyang Economic and Technological Development Zone, 550000, Guizhou Province

Applicant after: Man Bang Information Consulting Co.,Ltd.

Address before: 210012 3-5 / F, building 4, 170-1 software Avenue, Yuhuatai District, Nanjing City, Jiangsu Province

Applicant before: JIANGSU MANYUN SOFTWARE TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: No.123, Kaifa Avenue, Guiyang Economic and Technological Development Zone, 550000, Guizhou Province

Patentee after: Manbang Information Technology Co.,Ltd.

Address before: No.123, Kaifa Avenue, Guiyang Economic and Technological Development Zone, 550000, Guizhou Province

Patentee before: Man Bang Information Consulting Co.,Ltd.