Registration probability prediction method, device, storage medium and electronic equipment
Technical field
The present invention relates to the computer field, and more particularly to a registration probability prediction method based on behavior information, a device, a storage medium and electronic equipment.
Background art
In content-aggregation APPs such as information applications, for example vehicle and goods matching platforms and shopping platforms, a user's preference for a certain class of commodities or goods can be estimated from a large amount of historical user behavior data analyzed with a specific algorithm. For APP registration, however, the period between a user's first login and last login is relatively short. How to compress the model computation time for each user operation and raise the feedback frequency is therefore an important issue to consider. Conventional models perform only moderately in this respect: they have difficulty accurately predicting a user's probability of registering with an APP, and so cannot learn the user's degree of preference for that APP.
Summary of the invention
In view of the problems in the prior art, the purpose of the present invention is to provide a registration probability prediction method, device, storage medium and electronic equipment that effectively predict the probability of user behaviors such as registration, purchase and click.
According to one aspect of the present invention, a registration probability prediction method is provided. The registration probability prediction method includes:
Obtaining first user behavior data from a user operation log stream;
Inputting the first user behavior data into a trained first prediction model, and obtaining the data of multiple hidden layers of the first prediction model as second user behavior data;
Performing cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data;
Splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
Inputting the fourth user behavior data into a second prediction model, and taking the output of the second prediction model as the estimated value of the user's registration probability.
In one embodiment of the present invention, the user operation log stream includes user basic information, user behavior information and the device information of the user.
In one embodiment of the present invention, the first prediction model is an RNN model, and the RNN model includes one input layer, multiple hidden layers and one output layer, each hidden layer being a GRU unit.
In one embodiment of the present invention, the second prediction model is a logistic regression model.
In one embodiment of the present invention, the first prediction model and the second prediction model are trained on sample data, and the sample data includes user behavior data and user registration status.
In one embodiment of the present invention, the step of performing cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data further comprises:
Dividing the first user behavior data into first feature data and second feature data according to the calculated importance values;
Performing cross construction on the second feature data to form third feature data;
Forming the third user behavior data from the first feature data and the third feature data.
In one embodiment of the present invention, the importance values of the first user behavior data are calculated by variance evaluation in order to divide the first user behavior data into first feature data and second feature data.
In one embodiment of the present invention, the importance values of the first user behavior data are calculated by the xgboost algorithm in order to divide the first user behavior data into first feature data and second feature data.
In one embodiment of the present invention, the importance values of the first user behavior data are calculated by cross entropy in order to divide the first user behavior data into first feature data and second feature data.
According to another aspect of the present invention, a registration probability estimating device is provided. The registration probability estimating device includes:
An acquisition module, for obtaining first user behavior data from a user operation log stream;
A first prediction model module, for inputting the first user behavior data into a trained first prediction model, and obtaining the data of multiple hidden layers of the first prediction model as second user behavior data;
A data construction module, for performing cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data;
A data processing module, for splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data;
A second prediction model module, for inputting the fourth user behavior data into a second prediction model and taking the output of the second prediction model as the estimated value of the user's registration probability.
According to another aspect of the invention, a storage medium is provided. A computer program is stored on the storage medium, and the computer program executes the steps described above when run by a processor.
According to another aspect of the invention, electronic equipment is provided. The electronic equipment includes: a processor; and a storage medium on which a computer program is stored, the computer program executing the steps described above when run by the processor.
The registration probability prediction method proposed by the present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings.
Fig. 1 is a flow chart of the registration probability prediction method in one embodiment of the invention.
Fig. 2 is a flow chart of the registration probability prediction method in another embodiment of the invention.
Fig. 3 is a structural schematic diagram of the registration probability estimating device in one embodiment of the invention.
Fig. 4 is a structural schematic diagram of the registration probability estimating device in another embodiment of the invention.
Fig. 5 is a structural schematic diagram of a computer-readable storage medium in one embodiment of the invention.
Fig. 6 is a structural schematic diagram of electronic equipment in one embodiment of the invention.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the drawings. However, the example embodiments can be implemented in a variety of forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the figures denote the same or similar parts, so repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
To overcome the deficiencies of the prior art, the present invention provides a registration probability prediction method, device, storage medium and electronic equipment that effectively predict the probability of user behaviors such as registration, purchase and click, where the registration probability reflects the user's degree of preference for a given APP. Fig. 1 is a flow chart of the registration probability prediction method in one embodiment of the invention. Fig. 2 is a flow chart of the registration probability prediction method in another embodiment of the invention. Fig. 3 is a structural schematic diagram of the registration probability estimating device in one embodiment of the invention. Fig. 4 is a structural schematic diagram of the registration probability estimating device in another embodiment of the invention. Fig. 5 is a structural schematic diagram of a computer-readable storage medium in one embodiment of the invention. Fig. 6 is a structural schematic diagram of electronic equipment in one embodiment of the invention.
According to one aspect of the present invention, a registration probability prediction method is provided. As shown in Fig. 1, the registration probability prediction method includes:
S110, obtaining first user behavior data from a user operation log stream.
Specifically, the user operation log stream records a large amount of raw feature data, which is usually aggregated from historical user behavior information, user basic information, user equipment information and the like. The first user behavior data is usually obtained by preprocessing this raw feature data, which may specifically include the user's device type, the user's number of browses in the last seven days, the places where the user frequently logs in, and so on.
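As an illustration of the preprocessing in S110, the following minimal sketch summarizes raw log events into a flat feature record. The event field names (`days_ago`, `action`, `device`, `city`), the device vocabulary and the function name are hypothetical and are not specified by the invention.

```python
from collections import Counter

# Hypothetical raw log events; the field names are assumptions for illustration.
raw_events = [
    {"user_id": "u1", "days_ago": 1, "action": "browse", "device": "android", "city": "Shanghai"},
    {"user_id": "u1", "days_ago": 3, "action": "browse", "device": "android", "city": "Shanghai"},
    {"user_id": "u1", "days_ago": 9, "action": "browse", "device": "android", "city": "Nanjing"},
    {"user_id": "u1", "days_ago": 2, "action": "click", "device": "android", "city": "Shanghai"},
]

DEVICE_TYPES = ["android", "ios", "web"]  # assumed vocabulary


def build_first_behavior_data(events):
    """Summarize one user's raw log events into a flat feature dict."""
    recent = [e for e in events if e["days_ago"] <= 7]
    browse_7d = sum(1 for e in recent if e["action"] == "browse")
    top_city = Counter(e["city"] for e in events).most_common(1)[0][0]
    device = events[-1]["device"]
    features = {"browse_count_7d": browse_7d, "frequent_city": top_city}
    # One-hot encode the device type so a model receives numeric input.
    for d in DEVICE_TYPES:
        features[f"device_{d}"] = 1 if device == d else 0
    return features


first_behavior_data = build_first_behavior_data(raw_events)
```

In a real deployment the categorical `frequent_city` value would also need a numeric encoding before being fed to a model; it is left as a string here to keep the sketch short.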
S120, inputting the first user behavior data into a trained first prediction model, and obtaining the data of multiple hidden layers of the first prediction model as second user behavior data.
Specifically, the first user behavior data has already been preprocessed at this point and can be input directly into the first prediction model. In one embodiment of the present invention, the first prediction model is an RNN model, and the RNN model includes one input layer, multiple hidden layers and one output layer, each hidden layer being a GRU unit. An RNN model is a recurrent neural network model; its principle is to add a temporal dimension to a neural network model. By adding feedback edges to the hidden layer, the input of each hidden layer includes not only the current sample features but also the information carried over from the previous time step.
Each GRU unit contains two gates, a reset gate and an update gate. The outputs of both gates pass through a sigmoid function, so their range is [0, 1]. The candidate hidden state uses the reset gate to control the inflow of the previous hidden state, which contains the information of the previous time step. If the reset gate is approximately 0, the previous hidden state is discarded. The reset gate therefore provides a mechanism for discarding past hidden states that are irrelevant to the future; that is, the reset gate determines how much past information is forgotten. The hidden state Ht uses the update gate Zt to combine the previous hidden state Ht-1 with the candidate hidden state. The update gate controls the importance of the past hidden state at the current time step. If the update gate stays approximately 1, the past hidden state is preserved over time and passed on to the current time step. This design copes with the vanishing gradient problem in recurrent neural networks and better captures dependencies across large time intervals in time-series data. The reset gate helps capture short-term dependencies in time-series data, and the update gate helps capture long-term dependencies.
The parameters of the recurrent GRU network and the LR model are updated offline according to the user operation data and user clickstream data stored offline in HDFS, together with the result of whether the user actually registered. HDFS, the Hadoop Distributed File System, is a distributed file system designed to run on commodity hardware.
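The gate behavior described above can be sketched as a single GRU step in plain Python. This is a minimal illustration under assumptions, not the trained model of the invention: the weight names (W, U, b per gate), the random initialization, the layer sizes and the choice to flatten the hidden state of every time step into the second user behavior data are all hypothetical. The update rule follows the convention in the text, where an update gate near 1 keeps the previous hidden state.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One (W, U, b) triple per gate: reset "r", update "z", candidate "h".
        self.p = {g: (rand_matrix(n_hidden, n_in, rng),
                      rand_matrix(n_hidden, n_hidden, rng),
                      [0.0] * n_hidden) for g in "rzh"}

    def _affine(self, gate, x, h):
        w, u, b = self.p[gate]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h_prev):
        r = [sigmoid(v) for v in self._affine("r", x, h_prev)]  # reset gate
        z = [sigmoid(v) for v in self._affine("z", x, h_prev)]  # update gate
        # The reset gate scales how much of the previous state enters the candidate.
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(v) for v in self._affine("h", x, gated)]
        # z close to 1 keeps the old state; z close to 0 adopts the candidate.
        return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]


def hidden_states(cell, sequence, n_hidden):
    """Run the cell over a sequence and return the hidden state of every step."""
    h = [0.0] * n_hidden
    states = []
    for x in sequence:
        h = cell.step(x, h)
        states.append(h)
    return states


cell = GRUCell(n_in=3, n_hidden=4)
sequence = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]]  # two time steps of toy features
second_behavior_data = [v for h in hidden_states(cell, sequence, 4) for v in h]
```

With two time steps and four hidden units, `second_behavior_data` holds eight values, each strictly between -1 and 1 because every state is a convex combination of the previous state and a tanh output.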
S130, performing cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data.
Because the first user behavior data contains many kinds of information, the importance of these kinds of information needs to be distinguished. Specifically, the importance values of the various types of data in the first user behavior data can be calculated, and thereby distinguished, by means such as variance evaluation, the xgboost algorithm and cross entropy.
S140, splicing the second user behavior data and the third user behavior data to obtain fourth user behavior data.
Specifically, if the second user behavior data is [1,0,1,0,0] and the third user behavior data is [0,0,0,1,1], splicing [1,0,1,0,0] and [0,0,0,1,1] yields the fourth user behavior data [1,0,1,0,0,0,0,0,1,1]. Of course, the fourth user behavior data may also be calculated from the second user behavior data and the third user behavior data in other ways; the present invention places no limitation on this.
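The splicing in S140 amounts to vector concatenation. A minimal sketch using the example values from the text (the variable names are illustrative only):

```python
second_behavior_data = [1, 0, 1, 0, 0]
third_behavior_data = [0, 0, 0, 1, 1]

# Splicing is plain concatenation: the second vector is appended to the first.
fourth_behavior_data = second_behavior_data + third_behavior_data
```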
S150, inputting the fourth user behavior data into a second prediction model, and taking the output of the second prediction model as the estimated value of the user's registration probability.
In one embodiment of the present invention, the second prediction model is a logistic regression model. The first prediction model and the second prediction model are trained on sample data, and the sample data includes user behavior data and user registration status. The logistic regression model is a classification model commonly used in machine learning, mainly for binary classification problems. It maps the feature space to a probability. In a logistic regression model, y is a qualitative variable taking values in {0, 1}, and the model is mainly used to study the probability that a certain event occurs.
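How a logistic regression model turns the fourth user behavior data into a probability can be sketched as follows. The weights and bias below are made-up numbers for illustration; in the method they would come from training on sample data, and the function name is hypothetical.

```python
import math


def predict_registration_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


fourth_behavior_data = [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]
weights = [0.8, -0.2, 0.5, 0.1, -0.3, 0.0, 0.4, -0.6, 0.7, 0.2]  # assumed, not trained
bias = -1.0  # assumed

probability = predict_registration_probability(fourth_behavior_data, weights, bias)
```

The output always lies strictly between 0 and 1, so it can be read directly as the estimated registration probability and, if a yes/no decision is needed, compared against a threshold such as 0.5.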
The registration probability prediction method proposed by the present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
Because the first user behavior data contains many kinds of information, the importance of these kinds of information needs to be distinguished. Fig. 2 is a flow chart of the registration probability prediction method in another embodiment of the invention. As shown in Fig. 2, in another embodiment of the invention, step S130 further comprises:
S1310, dividing the first user behavior data into first feature data and second feature data according to the calculated importance values.
S1320, performing cross construction on the second feature data, whose importance values meet a preset requirement, to form third feature data, while keeping unchanged the first feature data, whose importance values do not reach the preset requirement. For example, suppose there are two classes of second feature data whose importance values meet the preset requirement: age (divided into two groups, over 20 and under 20) and gender (divided into two groups, male and female). Cross construction of these two classes yields four groups of third feature data: over 20 and male, over 20 and female, under 20 and male, and under 20 and female.
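The age and gender example can be sketched as a Cartesian product followed by one-hot encoding. The group labels and the function name are illustrative assumptions.

```python
from itertools import product

AGE_GROUPS = ["age>20", "age<20"]
GENDER_GROUPS = ["male", "female"]

# Every (age group, gender) pair becomes one crossed feature.
CROSSED = [f"{a}&{g}" for a, g in product(AGE_GROUPS, GENDER_GROUPS)]


def cross_features(age_group, gender):
    """Return the one-hot vector over the crossed feature groups."""
    key = f"{age_group}&{gender}"
    return [1 if key == name else 0 for name in CROSSED]


third_feature_data = cross_features("age>20", "female")
```

Crossing lets a linear second-stage model such as logistic regression assign a separate weight to each combination, which it could not express from the two original features alone.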
S1330, forming the third user behavior data from the first feature data and the third feature data. This avoids the situation in which a large amount of user information cannot be fully captured.
Further, the importance values of the first user behavior data can be calculated by variance evaluation in order to divide the first user behavior data into first feature data and second feature data.
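One possible reading of variance evaluation is to use the variance of each feature across users as its importance value and compare it with a threshold. The sketch below follows that reading; the sample values, the feature names and the threshold are assumptions, and the text does not specify the exact criterion.

```python
from statistics import pvariance

# Toy per-feature samples across five users (assumed values).
samples = {
    "browse_count_7d": [0, 5, 12, 3, 9],
    "is_android": [1, 1, 1, 1, 1],
    "age_over_20": [1, 0, 1, 0, 1],
}
THRESHOLD = 0.1  # assumed preset requirement

# Importance value of each feature = its population variance.
importance = {name: pvariance(values) for name, values in samples.items()}

# Features meeting the requirement go on to cross construction (second feature data);
# the rest are kept unchanged (first feature data).
second_feature_names = [n for n, v in importance.items() if v >= THRESHOLD]
first_feature_names = [n for n, v in importance.items() if v < THRESHOLD]
```

A feature that takes the same value for every user, such as `is_android` here, has zero variance and therefore carries no information for distinguishing users.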
Optionally, the importance values of the first user behavior data are calculated by the xgboost algorithm in order to divide the first user behavior data into first feature data and second feature data. xgboost applies a second-order Taylor expansion to the loss function and adds a regularization term to the objective function in addition to the loss, then seeks the overall optimal solution. This balances the decrease of the objective function against the complexity of the model and avoids over-fitting. The present invention uses the importance value algorithm (importance) in xgboost to calculate the importance values of the first user behavior data.
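For reference, the second-order approximation mentioned above is usually written as follows, in the notation of the XGBoost literature rather than of this document: at boosting round t, f_t is the new tree, g_i and h_i are the first and second derivatives of the loss l with respect to the previous prediction for sample i, T is the number of leaves and w the vector of leaf weights.

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2,
\qquad
g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial\, (\hat{y}_i^{(t-1)})^2}
```

The regularization term Ω penalizes trees with many leaves or large leaf weights, which is how the decrease of the objective is traded off against model complexity.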
Optionally, the importance values of the first user behavior data are calculated by cross entropy in order to divide the first user behavior data into first feature data and second feature data. Cross entropy can serve as a loss function in neural networks (machine learning). Suppose there are two probability distributions p and q over a sample set, where p is the distribution of the true labels and q is the distribution of the labels predicted by the trained model; the cross-entropy loss function measures the similarity between p and q. Thus, by calculating the similarity between items of the first user behavior data, the first user behavior data is classified into two classes, and whether the importance value of each item of first user behavior data is the maximum or the minimum is determined from the classification result. Another advantage of cross entropy as a loss function is that, used together with the sigmoid function, it avoids the slowdown in learning that the mean-squared-error loss function suffers during gradient descent, because the learning rate is controlled by the output error. The sigmoid function is an S-shaped function common in biology, also called the S-shaped growth curve. In information science, because it is monotonically increasing and its inverse is also monotonically increasing, the sigmoid function is often used as the threshold function of a neural network, mapping variables into the interval between 0 and 1.
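The way cross entropy measures how close a predicted distribution q is to the true labels p can be sketched with the binary form of the loss. The labels and scores below are toy values, and the clipping constant `eps` is an implementation detail added here for numerical safety.

```python
import math


def sigmoid(v):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_cross_entropy(p_true, q_pred, eps=1e-12):
    """Mean of -[p*log(q) + (1-p)*log(1-q)] over all samples."""
    total = 0.0
    for p, q in zip(p_true, q_pred):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(p * math.log(q) + (1 - p) * math.log(1 - q))
    return total / len(p_true)


labels = [1, 0, 1]
good = [sigmoid(3.0), sigmoid(-3.0), sigmoid(2.0)]   # predictions agreeing with labels
bad = [sigmoid(-3.0), sigmoid(3.0), sigmoid(-2.0)]   # predictions contradicting labels

loss_good = binary_cross_entropy(labels, good)
loss_bad = binary_cross_entropy(labels, bad)
```

Predictions that agree with the labels give a small loss and predictions that contradict them give a large one, so `loss_good` is smaller than `loss_bad`.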
According to another aspect of the present invention, a registration probability estimating device is provided. Fig. 3 is a structural schematic diagram of the registration probability estimating device in one embodiment of the invention. As shown in Fig. 3, the registration probability estimating device 200 includes: an acquisition module 201, a first prediction model module 202, a data construction module 203, a data processing module 204 and a second prediction model module 205. The acquisition module 201 is used to obtain first user behavior data from a user operation log stream. The first prediction model module 202 is used to input the first user behavior data into a trained first prediction model and to obtain the data of multiple hidden layers of the first prediction model as second user behavior data. The data construction module 203 is used to perform cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data. The data processing module 204 is used to splice the second user behavior data and the third user behavior data to obtain fourth user behavior data. The second prediction model module 205 is used to input the fourth user behavior data into a second prediction model and to take the output of the second prediction model as the estimated value of the user's registration probability. The function of each module in the registration probability estimating device of this embodiment, and the specific steps and principles from the acquisition module 201 obtaining the first user behavior data to the second prediction model module 205 producing the estimated value of the user's registration probability, have been explained in the above embodiments and are not repeated here. The present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
Fig. 4 is a structural schematic diagram of the registration probability estimating device in another embodiment of the invention. As shown in Fig. 4, the registration probability estimating device 200 likewise includes an acquisition module 201, a first prediction model module 202, a data construction module 203, a data processing module 204 and a second prediction model module 205. In addition, the data construction module 203 may further include a distinguishing module 2031, a cross construction module 2032 and a data integration module 2033. The acquisition module 201 is used to obtain first user behavior data from a user operation log stream. The first prediction model module 202 is used to input the first user behavior data into a trained first prediction model and to obtain the data of multiple hidden layers of the first prediction model as second user behavior data. The data construction module 203 is used to perform cross construction on at least part of the first user behavior data according to calculated importance values to obtain third user behavior data. The data processing module 204 is used to splice the second user behavior data and the third user behavior data to obtain fourth user behavior data. The second prediction model module 205 is used to input the fourth user behavior data into a second prediction model and to take the output of the second prediction model as the estimated value of the user's registration probability. The distinguishing module 2031 is used to divide the first user behavior data into first feature data and second feature data according to the calculated importance values. The cross construction module 2032 is used to perform cross construction on the second feature data, whose importance values meet the preset requirement, to form third feature data. The data integration module 2033 is used to form the third user behavior data from the first feature data and the third feature data. The present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
In an exemplary embodiment of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored. When executed by, for example, a processor, the program can implement the steps of the registration probability prediction method described in any of the above embodiments. In some possible embodiments, various aspects of the invention may also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the various exemplary embodiments of the invention described in the registration probability prediction method section of this specification.
Fig. 5 is a structural schematic diagram of a computer-readable storage medium in one embodiment of the invention. Fig. 5 depicts a program product 300 for implementing the above method according to an embodiment of the invention. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the invention is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device.
The program product 300 may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
In an exemplary embodiment of the present invention, electronic equipment is also provided. The electronic equipment may include a processor and a memory for storing instructions executable by the processor. The processor is configured to execute, by running the executable instructions, the steps of the registration probability prediction method described in any of the above embodiments.
Those of ordinary skill in the art will understand that various aspects of the invention can be implemented as a system, a method or a program product. Accordingly, various aspects of the invention can be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may be collectively referred to herein as a "circuit", "module" or "system".
The electronic equipment 400 according to this embodiment of the invention is described below with reference to Fig. 6. The electronic equipment 400 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.
As shown in Fig. 6, the electronic equipment 400 takes the form of a general-purpose computing device. The components of the electronic equipment 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and so on.
The storage unit stores program code, and the program code can be executed by the processing unit 410 so that the processing unit 410 executes the steps of the various exemplary embodiments of the invention described in the registration probability prediction method section of this specification. For example, the processing unit 410 can execute the steps shown in Fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read-only memory unit (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205. Such program modules 4205 include, but are not limited to: an operating system, one or more application programs, other program modules and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic equipment 400 may also communicate with one or more external devices 500 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic equipment 400, and/or with any device (such as a router, a modem, etc.) that enables the electronic equipment 400 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 450. The electronic equipment 400 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 460. The network adapter 460 can communicate with the other modules of the electronic equipment 400 through the bus 430. It should be understood that, although not shown in the drawings, other hardware and/or software modules can be used in conjunction with the electronic equipment 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and so on.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein can be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions that cause a computing device (which may be a personal computer, a server, a network device, etc.) to execute the above registration probability prediction method according to the embodiments of the present invention.
The present invention combines a recurrent neural network with traditional feature extraction. It collects user behavior data in real time from the user operation log stream and ensures fast result feedback, and it models user behavior while retaining the good extensibility of the algorithm framework, so that the probability of user behaviors such as registration, purchase and click can be effectively predicted.
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions can be made without departing from the concept of the invention, and all of these shall be regarded as falling within the protection scope of the invention.