CN104537252B - User-state one-class classification model training method and device - Google Patents
- Publication number: CN104537252B
- Application number: CN201510006021.9A
- Authority: CN (China)
- Prior art keywords: positive training sample, model, training sample, user, feature vector
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention provides a user-state one-class classification model training method and device. The method comprises: obtaining at least two positive training samples known to belong to a specified user-state class, each positive training sample having at least two items of user attribute information; extracting a sample feature vector for each positive training sample from its items of user attribute information; estimating a model parameter from the sample feature vectors, and generating a probability density function model from the estimated model parameter; and generating a user-state one-class classification model, which includes the probability density function model, used to receive an input feature vector and compute a function value, and a classification decision model, used to compute from that function value a classification result indicating whether the input belongs to the specified user-state class. The training method and device provided by the invention give good classification performance, little influence from human factors, and strong generalization ability.
Description
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a user-state one-class classification model training method and device.
Background
A user state is a description of a user attribute that holds for a particular period; for example, a user state may be a student state, a parenting state, an unmarried state, and so on. By detecting user states, differentiated services can be provided according to user state, such as pushing information or providing a service only to users who have a particular user state, or pushing different information or providing different services to users with and without that state.
One relatively simple existing method of detecting user state requires users to set their own user state, which is then stored so that it can be read whenever the user state needs to be detected. However, this method depends on users setting their state by hand and therefore on their cooperation; it is cumbersome and has low feasibility.
Another existing method requires a scoring mathematical model to be established in advance. The user's behavior data over a certain time range is then recorded and analyzed to find information related to the user state to be detected; each item of related information is scored with the pre-established model, and the scores of the items are added to give a total score. Comparing the total score with a preset total-score threshold determines whether the user has a given user state.
However, the mathematical models currently used to detect user state require scoring rules to be set manually, so human factors have a large influence. Moreover, detection by scoring generalizes too weakly and cannot detect the user state of potential users. Generalization ability here refers to the ability of a machine learning algorithm to adapt to fresh samples.
Summary of the invention
In view of this, and to address the problem that the mathematical models currently used to detect user state are strongly influenced by human factors and generalize weakly, it is necessary to provide a user-state one-class classification model training method and device.
A user-state one-class classification model training method, the method comprising:
obtaining at least two positive training samples known to belong to a specified user-state class, each positive training sample having at least two items of user attribute information;
extracting the sample feature vector of each positive training sample from the items of user attribute information of that positive training sample;
estimating a model parameter from the sample feature vectors, and generating a probability density function model from the estimated model parameter;
generating a user-state one-class classification model, the user-state one-class classification model including the probability density function model, used to receive an input feature vector and compute a function value, and further including a classification decision model, used to compute from the computed function value a classification result indicating whether the input belongs to the specified user-state class.
A user-state one-class classification model training device, the device comprising:
a positive training sample obtaining module, for obtaining at least two positive training samples known to belong to a specified user-state class, each positive training sample having at least two items of user attribute information;
a sample feature vector extraction module, for extracting the sample feature vector of each positive training sample from the items of user attribute information of that positive training sample;
a model parameter estimation module, for estimating a model parameter from the sample feature vectors, and generating a probability density function model from the estimated model parameter;
a training execution module, for generating a user-state one-class classification model, the user-state one-class classification model including the probability density function model, used to receive an input feature vector and compute a function value, and further including a classification decision model, used to compute from the computed function value a classification result indicating whether the input belongs to the specified user-state class.
Unlike conventional pattern recognition methods, which train on both positive and negative training samples, the above user-state one-class classification model training method and device train only on multiple positive training samples that belong to the specified user-state class. Compared with a classification model trained on both positive and negative samples, this avoids the effect on classification performance of introducing negative training samples, so classification performance is better. Moreover, once training is complete, the user-state one-class classification model reflects the inherent regularities among the items of user attribute information; human factors have very little influence, and the model predicts well for examples outside the training samples, so its generalization ability is strong.
Brief description of the drawings
Fig. 1 is an internal structure diagram of an electronic device for implementing the user-state one-class classification model training method in one embodiment;
Fig. 2 is a flow diagram of the user-state one-class classification model training method in one embodiment;
Fig. 3 is a schematic diagram of the uniform kernel function in one embodiment;
Fig. 4 is a schematic diagram of the normal kernel function in one embodiment;
Fig. 5 is a schematic diagram of the distribution of the sample feature vectors of all positive training samples in the training sample set in one embodiment;
Fig. 6 is a schematic diagram of finding a hypersphere that encloses the sample feature vectors shown in Fig. 5 in one embodiment;
Fig. 7 is a schematic diagram of classification using the hypersphere shown in Fig. 6 in one embodiment;
Fig. 8 is a flow diagram of the step of detecting the user state corresponding to a user identifier to be detected in one embodiment;
Fig. 9 is a flow diagram of the step of estimating the model parameter from the sample feature vectors in one embodiment;
Fig. 10 is a flow diagram of the step of obtaining the value range of the model parameter in one embodiment;
Fig. 11 is a flow diagram of the step of estimating the model parameter from the sample feature vectors in another embodiment;
Fig. 12 is a flow diagram of the step of calculating the auxiliary intermediate value in one embodiment;
Fig. 13 is a structural block diagram of the user-state one-class classification model training device in one embodiment;
Fig. 14 is a structural block diagram of the user-state one-class classification model training device in another embodiment;
Fig. 15 is a structural block diagram of the model parameter estimation module of Fig. 13 in one embodiment;
Fig. 16 is a structural block diagram of the user-state one-class classification model training device in yet another embodiment;
Fig. 17 is a structural block diagram of the model parameter estimation module of Fig. 13 in another embodiment;
Fig. 18 is a structural block diagram of the auxiliary intermediate value calculation module of Fig. 17 in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
As shown in Fig. 1, in one embodiment an electronic device is provided, which includes a processor, memory, a storage medium and a network interface connected by a system bus. The storage medium of the electronic device stores an operating system and a database, and also stores a user-state one-class classification model training device, which is used to implement a user-state one-class classification model training method. The processor of the electronic device is configured to execute the user-state one-class classification model training method. The electronic device may be a standalone device, or a group of electronic devices that can communicate with one another, in which case the functional modules of the training device may be deployed separately on the individual devices in the group. The electronic device may be a desktop computer.
As shown in Fig. 2, in one embodiment a user-state one-class classification model training method is provided, for training and generating a user-state one-class classification model used to detect user state. The one-class classification (One-Class Classification) problem, also called the single-class classification problem, is the problem of judging whether data of unknown class belongs to a class when only the label of that one class of samples is known. A model here means a mathematical model: a mathematical structure, built for some purpose from letters, numbers and other mathematical symbols in the form of equations or inequalities, that describes the features of objective things and their internal relations. A user-state one-class classification model is a mathematical model, obtained by training in advance, that judges whether an input feature vector belongs to a specified user state. This embodiment is illustrated by applying the method to the electronic device of Fig. 1 above. The method specifically includes the following steps:
Step 202: obtain at least two positive training samples known to belong to a specified user-state class; each positive training sample has at least two items of user attribute information.
Specifically, multiple positive training samples are obtained to form a training sample set, and each positive training sample has at least two items of user attribute information. To ensure the performance of the trained user-state one-class classification model, 10 or more items of user attribute information are preferably used. Only positive training samples are used here, a positive training sample being a training sample known to belong to the specified user-state class.
The specified user state is a predefined user state. This embodiment is mainly illustrated with the parenting state as the specified user state, in which case a positive training sample is the set of items of user attribute information of a user known to be in the parenting state. It will be understood that a different specified user state can be set as actually needed, for example a student state or an unmarried state. Every item of user attribute information of each positive training sample is related to the specified user state.
The items of user attribute information of each positive training sample can be drawn from the user's age attribute, gender attribute, education attribute and income attribute, and from behavior data related to the specified user state. Behavior data related to the specified user state includes, but is not limited to: the number of joined groups related to the specified user state; the amount of information related to the specified user state in social networks; the number of searches for information related to the specified user state; the number of clicks on web pages related to the specified user state; and the numbers of searches, views, favorites, orders and completed purchases of products related to the specified user state.
For example, when the specified user state is the parenting state, the behavior data related to the parenting state includes, but is not limited to: the number of joined groups related to parenting, the amount of parenting-related information in social networks, the number of clicks on parenting-related web pages, the number of parenting-related questions asked, the number of searches for parenting-related information, and the numbers of views, searches, orders, completed purchases and favorites of parenting-related products.
Similarly, when the specified user state is the student state, the behavior data related to the student state includes, but is not limited to: the number of joined groups related to study discussion, the amount of study-related information in social networks, the number of clicks on study-related web pages, the number of study-related questions asked, the number of searches for study-related information, and the numbers of searches, views, favorites, orders and completed purchases of study supplies.
Step 204: extract the sample feature vector of each positive training sample from the items of user attribute information of that positive training sample.
Among the items of user attribute information of each positive training sample, some have numeric values. In that case the numeric value can be used directly as the corresponding element of the sample feature vector, for example the number of views or the number of searches of parenting-related products.
Other items of user attribute information of each positive training sample do not have numeric values but instead take one of a limited number of possible cases; these items need to be quantized. Specifically, the possible cases of an item of user attribute information can each be represented by a different number, and the number obtained by quantizing the item is then used as a whole as the corresponding element of the sample feature vector.
For example, the user gender attribute has two possible cases, male and female, which can be represented by 1 and 2 respectively. A sample feature vector could then be [1, 10, 25, ...], where 1, 10, 25 and so on are the elements of the sample feature vector. In order, element 1 indicates that the gender is male, element 10 indicates the number of joined groups related to parenting, element 25 indicates the amount of parenting-related information in social networks, and so on.
In one embodiment, an item of user attribute information that is not numeric can be quantized as a string of 0s and 1s of preset length, with each digit of the string serving as an independent element of the sample feature vector. Preferably, each string contains exactly one 1. This embodiment takes into account that the possible cases of a non-numeric item of user attribute information are on an equal footing: if they were merely quantized as different numbers, each used as a whole as one element of the sample feature vector, the differences in magnitude between those numbers would skew the relative importance of the possible cases and reduce the classification accuracy of the trained user-state one-class classification model.
The quantization of categorical user attribute information is illustrated as follows. The user gender attribute has two possible cases, male and female, which can be represented by 10 and 01 respectively. A sample feature vector could then be [1, 0, 10, 25, ...], where, in order, the first two elements 1 and 0 together indicate that the gender is male, element 10 indicates the number of joined groups related to parenting, element 25 indicates the amount of parenting-related information in social networks, and so on.
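The quantization described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the field names (`gender`, `parenting_groups_joined`, `parenting_posts`) and the ordering of fields are assumptions made for the example.

```python
def one_hot(value, categories):
    """Encode `value` as a list of 0/1 digits containing exactly one 1."""
    return [1 if value == category else 0 for category in categories]


def extract_feature_vector(sample, categorical_fields, numeric_fields):
    """Build a feature vector: one-hot digits first, then raw numeric values.

    `categorical_fields` maps a (hypothetical) field name to its possible cases;
    `numeric_fields` lists the (hypothetical) numeric field names in order.
    """
    vector = []
    for name, categories in categorical_fields.items():
        vector.extend(one_hot(sample[name], categories))
    for name in numeric_fields:
        vector.append(sample[name])
    return vector


# Hypothetical positive training sample for the parenting state.
sample = {"gender": "male", "parenting_groups_joined": 10, "parenting_posts": 25}
vec = extract_feature_vector(
    sample,
    categorical_fields={"gender": ["male", "female"]},
    numeric_fields=["parenting_groups_joined", "parenting_posts"],
)
print(vec)  # [1, 0, 10, 25]
```

Because every category occupies its own 0/1 element, no category is numerically "larger" than another, which is the motivation given above for preferring this encoding over a single integer code.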
In one embodiment, after step 204 the method further includes normalizing each extracted sample feature vector. Different items of user attribute information have different dimensions and units, and training the user-state one-class classification model on them directly would affect its classification performance, so normalization is necessary.
In one embodiment, a sample feature vector is normalized by taking, for each element, the difference between that element and the smallest element, divided by the difference between the largest element and the smallest element, as the corresponding element of the new sample feature vector. For example, if a sample feature vector is [1, 0, 10, 25, ...] with largest element 25 and smallest element 0, the normalized feature vector is [0.04, 0, 0.4, 1, ...]. Every element of such a feature vector lies between 0 and 1, which simplifies calculation.
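The worked example above can be reproduced with a short sketch. The function name is hypothetical, and returning all zeros when every element is equal is an assumption, since the text does not cover that case.

```python
def min_max_normalize(vector):
    """Map each element to (element - min) / (max - min), as described above."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        # Assumption: the text does not say what to do when all elements are equal.
        return [0.0 for _ in vector]
    return [(value - lo) / (hi - lo) for value in vector]


normalized = min_max_normalize([1, 0, 10, 25])
print(normalized)  # [0.04, 0.0, 0.4, 1.0]
```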
In one embodiment, a sample feature vector is normalized by calculating the mean and standard deviation of its elements, and then taking, for each element, the difference between that element and the mean, divided by the standard deviation, as the corresponding element of the new sample feature vector. Normalization can of course also be performed in other existing ways, which are not listed here.
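A minimal sketch of this mean and standard deviation normalization follows. The text does not say whether the population or the sample standard deviation is meant; the population form (dividing by the number of elements) is assumed here, and the function name is hypothetical.

```python
import math


def z_score_normalize(vector):
    """Map each element to (element - mean) / standard deviation."""
    n = len(vector)
    mean = sum(vector) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((value - mean) ** 2 for value in vector) / n)
    if std == 0:
        # Assumption: the text does not cover a vector whose elements are all equal.
        return [0.0 for _ in vector]
    return [(value - mean) / std for value in vector]


z = z_score_normalize([1, 0, 10, 25])
print([round(value, 3) for value in z])
```

After this transformation the elements have mean 0 and (population) standard deviation 1, so attributes measured on very different scales contribute comparably.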
Step 206: estimate a model parameter from the sample feature vectors, and generate a probability density function model from the estimated model parameter.
Specifically, the probability density function model is one part of the user-state one-class classification model; it receives an input feature vector and calculates the probability that the input feature vector belongs to the specified user-state class. The model parameter is a parameter of the probability density function model, and the main purpose of training is to obtain this model parameter.
In one embodiment, the probability density function model can be generated using a kernel-based Parzen window probability density function. Specifically, a kernel function is used to open a window at each sample feature vector, and the probability density at the window is estimated. Each sample feature vector contributes most to the distribution at its own position, and contributes less the farther a position is from it.
Further, the kernel function can be a uniform kernel function or a normal kernel function. The uniform kernel function, shown in Fig. 3, is also called the rectangular kernel function; the normal kernel function, shown in Fig. 4, is also called the Gaussian kernel function. The abscissa of a kernel function corresponds to position in feature space, and the ordinate indicates the probability distribution of the feature vector at the corresponding position in feature space; in this embodiment it represents the probability that the feature vector at that position belongs to the specified user-state class. In the Gaussian kernel function, e is the natural constant, μ is the mathematical expectation, and σ is the standard deviation.
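The two kernels can be illustrated in one dimension as below. The exact expressions shown in Figs. 3 and 4 are not reproduced in the text, so the standard textbook forms are assumed: a rectangular kernel of hypothetical half-width `half_width` that integrates to 1, and the normal density with expectation `mu` and standard deviation `sigma`.

```python
import math


def uniform_kernel(u, half_width=1.0):
    """Rectangular kernel: constant inside [-half_width, half_width], zero outside."""
    return 1.0 / (2.0 * half_width) if abs(u) <= half_width else 0.0


def normal_kernel(u, mu=0.0, sigma=1.0):
    """Gaussian kernel: the normal density with expectation mu and std sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((u - mu) ** 2) / (2.0 * sigma ** 2))


for u in (0.0, 0.5, 2.0):
    print(u, uniform_kernel(u), round(normal_kernel(u), 4))
```

The uniform kernel weights every point inside the window equally, while the normal kernel decays smoothly with distance, which matches the statement that a sample contributes less the farther a position is from it.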
Referring to Fig. 5, suppose the sample feature vectors of all positive training samples in the training sample set are distributed as shown in Fig. 5. Then, as shown in Fig. 6, training the probability density function model amounts to finding a hypersphere that encloses the sample feature vectors shown in Fig. 5. Referring to Fig. 7, if a feature vector to be detected lies within the range enclosed by the hypersphere, it belongs to the specified user-state class, like feature vector 701; if it does not lie within that range, it does not belong to the specified user-state class, like feature vector 702.
In one embodiment, a normal kernel function is used to open a window at the sample feature vector of each positive training sample in the training sample set, establishing a Gaussian model. The probability density function model can then be expressed as formula (1):
Formula (1):
$$f(y) = \sum_{i=1}^{n} \exp\left(-(y - x_i)^{T}\, h^{-2}\, (y - x_i)\right)$$
The probability density function model expressed by formula (1) is a sum of exponential functions whose base is the natural constant and whose exponents are functions of each sample feature vector and the input feature vector. Each such function is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the model parameter raised to the power −2, multiplied by the difference between the input feature vector and the corresponding sample feature vector.
Specifically, in formula (1), $x_i$ is the sample feature vector of each positive training sample in the training sample set, and $i = 1, 2, \ldots, n$ is the sample index. $y$ denotes the input feature vector, and the function $f(y)$ denotes the probability density function model of the pre-trained user-state one-class classification model.
In formula (1), $\exp(\cdot)$ denotes the exponential function whose base is the natural constant $e$, and $-(y - x_i)^{T} h^{-2} (y - x_i)$ is its exponent, a function of each sample feature vector and the input feature vector. Specifically, the transpose $(y - x_i)^{T}$ of the difference $(y - x_i)$ between the input feature vector $y$ and the corresponding sample feature vector $x_i$ is negated to give $-(y - x_i)^{T}$; this is multiplied by the model parameter $h$ raised to the power −2 to give $-(y - x_i)^{T} h^{-2}$, and then multiplied by the difference $(y - x_i)$. Summing the exponentials of $-(y - x_i)^{T} h^{-2} (y - x_i)$ over all sample feature vectors $x_i$ gives $f(y) = \sum_{i=1}^{n} \exp\left(-(y - x_i)^{T} h^{-2} (y - x_i)\right)$.
As formula (1) shows, once the model parameter $h$ in formula (1) has been estimated through training, the probability density function model $f(y)$ of the pre-trained user-state one-class classification model is obtained.
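Formula (1) can be evaluated with a few lines of code. The sketch below assumes that the model parameter h is a single positive scalar, in which case the exponent reduces to the squared Euclidean distance between y and x_i divided by h squared; the function name and the sample values are hypothetical.

```python
import math


def parzen_density(y, samples, h):
    """Evaluate formula (1): the sum over i of exp(-(y - x_i)^T h^-2 (y - x_i)).

    Assumes h is a positive scalar, so the exponent is -||y - x_i||^2 / h^2.
    """
    total = 0.0
    for x in samples:
        squared_distance = sum((yj - xj) ** 2 for yj, xj in zip(y, x))
        total += math.exp(-squared_distance / (h ** 2))
    return total


# Hypothetical sample feature vectors of two positive training samples.
samples = [[0.0, 0.0], [1.0, 0.0]]
print(parzen_density([0.0, 0.0], samples, h=1.0))  # 1 + exp(-1)
print(parzen_density([5.0, 5.0], samples, h=1.0))  # close to 0
```

An input near the training samples yields a large function value and a distant input yields a value near zero, which is what lets a threshold on f(y) act as the boundary illustrated in Figs. 6 and 7.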
Step 208: generate a user-state one-class classification model, which includes the probability density function model, used to receive an input feature vector and compute a function value, and further includes a classification decision model, used to compute from the computed function value a classification result indicating whether the input belongs to the specified user-state class.
Specifically, the generated user-state one-class classification model includes a probability density function model and a classification decision model. The probability density function model receives an input feature vector and computes a function value; the classification decision model computes a classification result from that function value, and the classification result indicates whether the input feature vector belongs to the specified user-state class. If the input feature vector belongs to the specified user-state class, the input feature vector has the specified user state; otherwise its user state cannot be determined.
The classification decision model can be expressed as formula (2):
Formula (2):
$$\gamma = \begin{cases} \text{target}, & f(y) \ge \theta \\ \text{outlier}, & f(y) < \theta \end{cases}$$
In formula (2), $y$ denotes the input feature vector, which is the feature vector to be detected when user state is being detected, and the function $f(y)$ denotes the probability density function model of the user-state one-class classification model. $\gamma$ denotes the classification result output by the classification decision model: target means that the classification result $\gamma$ is that the input belongs to the specified user-state class, and outlier means that it does not. $\theta$ denotes a preset function-value threshold, which is given in advance and can be determined from the positive training samples; for example, all positive training samples are input to the probability density function model of the pre-trained user-state one-class classification model to obtain the corresponding function values, and the preset function-value threshold is determined from the largest of these function values.
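The two parts of the model, formulas (1) and (2), can be combined in a short sketch. It is illustrative only: h is assumed to be a scalar, the comparison is assumed to be "greater than or equal to", and the sample values, the bandwidth `h` and the threshold `theta` are hypothetical numbers chosen for the example rather than values derived by the procedure in the text.

```python
import math


def parzen_density(y, samples, h):
    """Formula (1) with scalar h: the sum of exp(-||y - x_i||^2 / h^2)."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(y, x)) / h ** 2)
        for x in samples
    )


def classify(function_value, theta):
    """Formula (2): 'target' if f(y) >= theta, otherwise 'outlier'."""
    return "target" if function_value >= theta else "outlier"


# Hypothetical normalized positive samples, bandwidth and threshold.
samples = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15]]
h, theta = 0.5, 1.0

near = classify(parzen_density([0.15, 0.15], samples, h), theta)
far = classify(parzen_density([3.0, 3.0], samples, h), theta)
print(near, far)  # target outlier
```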
Unlike conventional pattern recognition methods, which train on both positive and negative training samples, the above user-state one-class classification model training method trains only on multiple positive training samples that belong to the specified user-state class. Compared with a classification model trained on both positive and negative samples, this avoids the effect on classification performance of introducing negative training samples, so classification performance is better. Moreover, once training is complete, the user-state one-class classification model reflects the inherent regularities among the items of user attribute information; human factors have very little influence, and the model predicts well for examples outside the training samples, so its generalization ability is strong.
As shown in Fig. 8, in one embodiment the user-state one-class classification model training method further includes a step of detecting the user state corresponding to a user identifier to be detected, which specifically includes the following steps:
Step 802: obtain at least two items of user attribute information corresponding to the user identifier to be detected.
Specifically, a user identifier is a character string that uniquely identifies a user, and may include at least one of digits, symbols, letters and other characters. The user identifier to be detected is the user identifier whose corresponding user state needs to be determined. In this embodiment, detecting user state means detecting whether the user identifier to be detected corresponds to the specified user state.
The types of user attribute information corresponding to the user identifier to be detected are a subset of the types of user attribute information of the positive training samples in the training sample set. In this way, even if the user identifier to be detected has only a few items of user attribute information, classification is still possible as long as some of its user attribute features are sufficiently pronounced.
The at least two items of user attribute information corresponding to the user identifier to be detected can be drawn from the user's age attribute, gender attribute, education attribute and income attribute, and from behavior data related to the specified user state. Behavior data related to the specified user state includes, but is not limited to: the number of joined groups related to the specified user state; the amount of information related to the specified user state in social networks; the number of searches for information related to the specified user state; the number of clicks on web pages related to the specified user state; and the numbers of searches, views, favorites, orders and completed purchases of products related to the specified user state.
Similarly, when the specified user state is the student state, the behavior data related to the student state includes, but is not limited to: the number of joined groups related to study discussion, the amount of study-related information in social networks, the number of clicks on study-related web pages, the number of study-related questions asked, the number of searches for study-related information, and the numbers of searches, views, favorites, orders and completed purchases of study supplies.
Step 804: extract a feature vector to be detected from the obtained user attribute information.
Specifically, the feature vector to be detected is extracted from the obtained user attribute information in the same way that feature vectors were extracted when training the user-state one-class classification model. For obtained items of user attribute information whose values are numeric, the numeric value can be used directly as the corresponding element of the feature vector to be detected. For obtained items whose values are not numeric, the possible cases of the item can each be represented by a different number, and the number obtained by quantizing the item is then used as a whole as the corresponding element of the feature vector to be detected.
In one embodiment, for not being that the customer attribute information of numeric data quantifies, preset length can be used
The customer attribute information is indicated by 0 and 1 numerical string formed, and each of numerical string is respectively as feature to be detected
Independent element in vector.1 quantity is 1 in preferably each numerical string.
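The 0/1 numeric-string quantization described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the attribute name, its three possible cases, and the helper names are hypothetical.

```python
def one_hot(value, categories):
    """Return a 0/1 list with exactly one 1, at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

# Hypothetical non-numeric attribute with three possible cases.
EDUCATION_LEVELS = ["high_school", "bachelor", "master"]

def build_feature_vector(numeric_values, education):
    # Numeric attributes are used directly; each digit of the 0/1 string
    # becomes a separate element of the feature vector.
    return list(numeric_values) + one_hot(education, EDUCATION_LEVELS)

vector = build_feature_vector([28, 5], "bachelor")
print(vector)
```

Because every case maps to a string with a single 1, no case is numerically larger than another, which matches the stated preference for exactly one 1 per string.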
In one embodiment, after step 804 the method further includes normalizing the extracted feature vector to be detected. Specifically, in one embodiment, the feature vector to be detected is normalized by taking, for each element, the difference between that element and the minimum element, dividing it by the difference between the maximum element and the minimum element, and using the resulting quotient as the corresponding element of the new feature vector to be detected. In another embodiment, the mean and standard deviation of the elements of the feature vector to be detected are calculated, and the difference between each element and the mean, divided by the standard deviation, is used as the corresponding element of the new feature vector to be detected.
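Both normalization variants can be sketched in a few lines. The function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which form is meant), and the handling of a constant vector is likewise an assumption.

```python
import math

def min_max_normalize(vector):
    """(element - min) / (max - min) for every element."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0.0] * len(vector)  # assumption: constant vector maps to zeros
    return [(v - lo) / (hi - lo) for v in vector]

def z_score_normalize(vector):
    """(element - mean) / standard deviation for every element."""
    mean = sum(vector) / len(vector)
    # Assumption: population standard deviation (divide by the element count).
    std = math.sqrt(sum((v - mean) ** 2 for v in vector) / len(vector))
    if std == 0:
        return [0.0] * len(vector)
    return [(v - mean) / std for v in vector]

features = [1, 0, 10, 25]  # illustrative values only
print(min_max_normalize(features))
print(z_score_normalize(features))
```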
Step 806: input the feature vector to be detected into the user-state one-class classification model and output a classification result indicating whether it belongs to the specified user state class, so as to determine the user state corresponding to the user identifier to be detected.
Specifically, the feature vector to be detected is input into the probability density function model of the user-state one-class classification model, which outputs a function value; the function value is then input into the classification decision model of the user-state one-class classification model, which outputs the classification result. For example, under the classification decision model of formula (2) above, if the output is target, indicating membership of the specified user state class, it can be determined that the user corresponding to the user identifier has the specified user state; if the output is outlier, indicating non-membership of the specified user state class, the result may be that the user corresponding to the user identifier does not have the specified user state, or that the user's state cannot be determined.
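The two-stage flow of step 806 (density value first, decision second) might look like the sketch below. Formulas (1) and (2) are not reproduced in this passage, so the Gaussian Parzen-window density, the "greater than or equal to the threshold" rule, and all names and numbers here are assumptions made for illustration.

```python
import math

def parzen_density(x, samples, h):
    """Assumed form of the density model: Gaussian Parzen window, bandwidth h."""
    m = len(x)
    norm = (2.0 * math.pi * h * h) ** (m / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)

def classify(x, samples, h, threshold):
    """Assumed decision rule: 'target' when the density reaches the threshold."""
    value = parzen_density(x, samples, h)
    return "target" if value >= threshold else "outlier"

# Hypothetical positive training vectors, bandwidth, and threshold.
positives = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
print(classify([0.05, 0.05], positives, h=0.2, threshold=0.5))
print(classify([5.0, 5.0], positives, h=0.2, threshold=0.5))
```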
In one embodiment, after step 806 the method further includes pushing information according to the determined user state corresponding to the user identifier to be detected. The pushed information may be advertising information, broadcast notification messages, and the like. For example, advertising information related to child-rearing can be pushed to users detected to be in the child-rearing state; information pushed in this way is better targeted, which helps ensure that it is delivered effectively.
In this embodiment, given several items of user attribute information corresponding to a user identifier, the user state corresponding to that user identifier can be determined with essentially no manual intervention. The influence of human factors is therefore small, and the classification accuracy is higher than that of a mathematical model that classifies using manually set scoring rules.
As shown in Fig. 9, in one embodiment, the step in step 206 of estimating the model parameter from the sample feature vectors specifically includes the following steps:
Step 902: divide all positive training samples into first-type positive training samples and second-type positive training samples.
Here, all positive training samples means all positive training samples in the training sample set. Specifically, the first-type positive training samples are used for training, while the second-type positive training samples are used for validation (also called testing), that is, for checking the probability density function model obtained from the first-type positive training samples. Most of the positive training samples in the training sample set can be taken as first-type positive training samples, with the remainder taken as second-type positive training samples. For example, two thirds of all positive training samples may serve as first-type positive training samples and the remaining one third as second-type positive training samples.
Step 904: take a preset number of candidate parameter values within the value range of the model parameter.
The model parameter has a value range, which can be set empirically or determined by calculation; the steps for calculating the value range of the model parameter are given below. A candidate parameter value can be selected at every fixed step within the value range of the model parameter, forming a set of candidate parameter values. A candidate parameter value is a hypothesized value of the model parameter.
Step 906: for each candidate parameter value, generate a candidate user-state one-class classification model from the first-type positive training samples and that candidate parameter value, classify the second-type positive training samples with it, and compute the classification accuracy.
Specifically, each candidate parameter value and the sample feature vectors of the first-type positive training samples are substituted into formula (1) above to obtain the corresponding candidate probability density function model, that is, the probability density function model under the hypothesis of that candidate parameter value.
The second-type positive training samples are used to check the classification accuracy of the candidate probability density function model. Specifically, the sample feature vector of each second-type positive training sample is input into the candidate probability density function model to obtain the corresponding function value; the function value is then input into the classification decision model represented by formula (2) above, which compares the function value with a preset function value threshold and outputs the classification result.
A classification accuracy is calculated for each candidate parameter value; the classification accuracy is the ratio of the number of second-type positive training samples classified into the specified user state class to the total number of second-type positive training samples.
Step 908: take the candidate parameter value with the highest classification accuracy as the estimated model parameter.
Specifically, the classification accuracies are compared. A higher classification accuracy means better classification performance and indicates that the corresponding candidate probability density function model is closer to the optimal probability density function model, so the candidate parameter value with the highest accuracy is taken as the estimated model parameter. Substituting the estimated model parameter into the probability density function yields the probability density function model.
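Steps 904 to 908 amount to a grid search scored on held-out positive samples. The sketch below assumes a Gaussian Parzen-window density in place of formula (1) and a "density at or above the threshold" rule in place of formula (2), neither of which is reproduced in this passage; all names and numbers are hypothetical.

```python
import math

def parzen_density(x, samples, h):
    # Assumed stand-in for formula (1): Gaussian Parzen window with bandwidth h.
    m = len(x)
    norm = (2.0 * math.pi * h * h) ** (m / 2.0)
    total = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2.0 * h * h))
                for s in samples)
    return total / (len(samples) * norm)

def candidate_values(low, high, count):
    """Step 904: evenly spaced candidates over [low, high] (count >= 2)."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

def select_parameter(first_type, second_type, candidates, threshold):
    """Steps 906-908: keep the candidate with the highest accuracy."""
    best_h, best_acc = None, -1.0
    for h in candidates:
        accepted = sum(1 for x in second_type
                       if parzen_density(x, first_type, h) >= threshold)
        accuracy = accepted / len(second_type)
        if accuracy > best_acc:  # ties keep the earlier candidate
            best_h, best_acc = h, accuracy
    return best_h, best_acc

train = [[0.0], [1.0], [2.0], [3.0]]   # first-type samples (illustrative)
held_out = [[0.5], [1.5], [2.5]]       # second-type samples (illustrative)
best_h, best_acc = select_parameter(train, held_out,
                                    candidate_values(0.05, 0.5, 2), threshold=0.1)
print(best_h, best_acc)
```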
In this embodiment, the positive training samples are divided into first-type positive training samples, used for training, and second-type positive training samples, used for validation, so that a near-optimal model parameter is obtained; the algorithm is easy to implement.
In one embodiment, step 902 specifically includes: dividing all positive training samples into a preset number of parts, taking each part in turn as the second-type positive training samples, and taking the remaining positive training samples as the first-type positive training samples. For example, all positive training samples in the training sample set can be divided into 10 parts; each time, one part serves as the second-type positive training samples and the remaining 9 parts serve as the first-type positive training samples, and the model parameter is then estimated by continuing with steps 904 to 908. Through this cross-validation, the estimated model parameter comes closer to the optimal model parameter, giving a probability density function model close to the optimal one.
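The rotation of parts described above can be sketched as a generator of (first-type, second-type) pairs. How the samples are assigned to parts is not specified in the text; the strided assignment and the function name below are assumptions.

```python
def k_fold_splits(samples, k):
    """Yield (first_type, second_type) pairs, taking each part in turn as second-type."""
    # Assumption: samples are dealt into parts by stride; any partition would do.
    parts = [samples[i::k] for i in range(k)]
    for i in range(k):
        second_type = parts[i]
        first_type = [s for j, part in enumerate(parts) if j != i for s in part]
        yield first_type, second_type

samples = list(range(20))  # stand-ins for 20 positive training samples
splits = list(k_fold_splits(samples, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each pair can then be passed through steps 904 to 908, for example by averaging the accuracy of every candidate parameter value over the 10 rounds; the averaging is one possible choice rather than something the text prescribes.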
As shown in Fig. 10, in one embodiment, the user-state one-class classification model training method further includes a step of obtaining the value range of the model parameter, which specifically includes the following steps:
Step 1002: calculate the mean matrix of the sample feature vectors of all positive training samples.
Specifically, the mean matrix E is calculated according to the following formula (3):
Formula (3): $$E = \frac{1}{n}\sum_{i=1}^{n} x_i$$
In formula (3), $n$ is the total number of positive training samples and $x_i$ is the sample feature vector of the $i$-th positive training sample in the training sample set. Formula (3) sums all the sample feature vectors and divides the sum by the total number of positive training samples.
Step 1004: calculate the variance matrix from the mean matrix.
Specifically, the variance matrix C is calculated according to the following formula (4):
Formula (4): $$C = \frac{1}{n-1}\sum_{i=1}^{n}\left(E - x_i\right)\left(E - x_i\right)^{T}$$
In formula (4), $n$ is the total number of positive training samples, $x_i$ is the sample feature vector of the $i$-th positive training sample in the training sample set, and $E$ is the mean matrix calculated in step 1002. Formula (4) means that, for each sample feature vector, the difference obtained by subtracting that sample feature vector from the mean matrix $E$ is multiplied by the transpose of that difference; the products calculated for all sample feature vectors are summed, and the sum is divided by the total number of positive training samples minus 1.
Step 1006: substitute the variance matrix into the model parameter function to obtain the value range of the model parameter. The model parameter function is the square root of the product of the reciprocal of the dimension of the sample feature vectors and the trace of the variance matrix, multiplied by an exponential function whose base is the total number of positive training samples and whose exponent is the negative of the quotient of a parameter variable and the dimension. The parameter variable has a preset value range.
Specifically, the variance matrix calculated in step 1004 is substituted into the model parameter function, which is expressed as formula (5):
Formula (5): $$h = \sqrt{\frac{1}{m}\,\mathrm{tr}(C)}\cdot n^{-\alpha/m}$$
In formula (5), $m$ denotes the dimension of the sample feature vectors; $\mathrm{tr}(C)$ denotes the trace of the variance matrix $C$, the trace of a matrix being the sum of the elements on its main diagonal; $n$ is the total number of positive training samples; and $\alpha$ is the parameter variable. That is, the reciprocal of the dimension $m$ multiplied by the trace $\mathrm{tr}(C)$, with the square root then taken, gives $\sqrt{\mathrm{tr}(C)/m}$; the exponential function with base $n$ and exponent equal to the negative of the quotient of $\alpha$ and $m$ is $n^{-\alpha/m}$; multiplying the two gives formula (5). The preset value range of the parameter variable is $0 < \alpha < 0.5$.
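Steps 1002 to 1006 can be sketched as below. Only the trace of the variance matrix is needed, so the sketch accumulates the diagonal terms directly instead of building the full matrix; the function name and sample values are hypothetical, and the bounds are those of the open interval 0 < α < 0.5.

```python
import math

def parameter_range(samples, alpha_low=0.0, alpha_high=0.5):
    """Return (lower, upper) bounds of h for alpha in (alpha_low, alpha_high)."""
    n = len(samples)
    m = len(samples[0])
    # Formula (3): element-wise mean of all sample feature vectors.
    mean = [sum(s[d] for s in samples) / n for d in range(m)]
    # Trace of formula (4): sum of the diagonal of (E - x_i)(E - x_i)^T over i, / (n - 1).
    trace = sum((mean[d] - s[d]) ** 2 for s in samples for d in range(m)) / (n - 1)

    def h(alpha):
        # Formula (5): sqrt(tr(C) / m) * n ** (-alpha / m)
        return math.sqrt(trace / m) * n ** (-alpha / m)

    # For n > 1, h decreases as alpha grows, so the larger alpha gives the lower bound.
    return h(alpha_high), h(alpha_low)

low, high = parameter_range([[0, 0], [2, 0], [0, 2], [2, 2]])
print(low, high)
```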
In this embodiment, the value range of the model parameter is determined first, and steps 902 to 908 are then used to estimate the model parameter of the probability density function model from the sample feature vectors. A good model parameter can thus be determined quickly, further optimizing the classification performance of the user-state one-class classification model.
As shown in Fig. 11, in one embodiment, the step in step 206 of estimating the model parameter from the sample feature vectors specifically includes the following steps:
Step 1102: calculate the Euclidean distance between every pair of sample feature vectors of all positive training samples.
Let the set of sample feature vectors be $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ denotes one sample feature vector. Within the set $X$, the Euclidean distance between each sample feature vector $x_i$ and each other sample feature vector $x_j$ is calculated as $D_{ij}$ ($i \neq j$).
Step 1104: select the minimum Euclidean distance value corresponding to each sample feature vector.
Specifically, for each sample feature vector $x_i$, the minimum Euclidean distance value $E_i = \min_j(D_{ij})$ is selected from the Euclidean distance values $D_{ij}$.
Step 1106: calculate the square root of the maximum of all the minimum Euclidean distance values and use it as the first candidate model parameter.
Specifically, the square root $\mathrm{sqrt}(\max(E_i))$ of the maximum value $\max(E_i)$ among the minimum Euclidean distance values $E_i$ selected in step 1104 for the sample feature vectors $x_i$ is taken as the first candidate model parameter $h_1$, where sqrt() denotes the square-root function. The ordinals "first", "second", and so on, used here and below, indicate sequence order. The first candidate model parameter is the initial value of the iterative calculation.
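Steps 1102 to 1106 can be illustrated with a short sketch that follows the text literally (Euclidean distances, per-sample minimum, square root of the largest minimum). The function name and sample values are hypothetical.

```python
import math

def initial_candidate(samples):
    """Return (h1, nearest) where nearest[i] is E_i, the minimum distance for sample i."""
    n = len(samples)
    nearest = []
    for i in range(n):
        # Steps 1102 and 1104: distance to every other sample, then the minimum.
        nearest.append(min(math.dist(samples[i], samples[j])
                           for j in range(n) if j != i))
    # Step 1106: square root of the largest of the minimum distances.
    return math.sqrt(max(nearest)), nearest

h1, nearest = initial_candidate([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(h1, nearest)
```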
Step 1108: calculate the first auxiliary intermediate value using the step of calculating an auxiliary intermediate value.
The auxiliary intermediate value is used to assist in calculating the candidate model parameters, so that the model parameter can finally be calculated. The first auxiliary intermediate value is denoted $F_1$.
Step 1110: multiply the square of the first candidate model parameter by the total number of positive training samples and by the dimension of the sample feature vectors, add the first auxiliary intermediate value, divide the result by the product of the total number of positive training samples and the dimension, and then take the square root to obtain the second candidate model parameter.
Specifically, the second candidate model parameter is $h_2 = \sqrt{\dfrac{h_1^2\, n\, m + F_1}{n\, m}}$: the square of the first candidate model parameter $h_1$ multiplied by the total number of positive training samples $n$ and by the dimension $m$ of the sample feature vectors, plus the first auxiliary intermediate value $F_1$, gives $h_1^2\, n\, m + F_1$; dividing this by the product $n\, m$ and taking the square root gives the second candidate model parameter $h_2$.
Step 1112: calculate the second auxiliary intermediate value, denoted $F_2$, using the step of calculating an auxiliary intermediate value.
Step 1114: calculate the current candidate model parameter intermediate value using the candidate model parameter intermediate value calculation step.
After the first candidate model parameter $h_1$, the second candidate model parameter $h_2$, the first auxiliary intermediate value $F_1$, and the second auxiliary intermediate value $F_2$ have been calculated, these iteration initial values can be used for iterative calculation to estimate the model parameter. The subscripts of $h_1$, $h_2$, $F_1$, and $F_2$ are sequence numbers.
Starting from the third sequence number, the current candidate model parameter intermediate value $h_{middle}$ is calculated using the following formula (6):
Formula (6): $$h_{middle} = \frac{\left(F_{q-1} - F_{q-2}\right) h_{q-1}^{2}\, h_{q-2}^{2}}{F_{q-1}\, h_{q-1}^{2} - F_{q-2}\, h_{q-2}^{2}}$$
Formula (6) expresses the candidate model parameter intermediate value calculation step. Specifically, relative to the current candidate model parameter $h_q$ and the current auxiliary intermediate value $F_q$: calculate the first difference between the previous auxiliary intermediate value $F_{q-1}$ and the one before it, $F_{q-2}$; multiply the first difference by the square of the previous candidate model parameter $h_{q-1}$ and by the square of the one before it, $h_{q-2}$; then divide by the second difference, which is the product of $F_{q-1}$ and the square of $h_{q-1}$ minus the product of $F_{q-2}$ and the square of $h_{q-2}$. Here "previous" and "the one before it" refer to the sequence numbers of the candidate model parameters and auxiliary intermediate values: for the current sequence number $q$, the previous one is $q-1$ and the one before it is $q-2$.
Step 1116: compare the current candidate model parameter intermediate value with 0; if it is less than 0, perform step 1118; if it is greater than 0, perform step 1120.
Specifically, the current candidate model parameter intermediate value $h_{middle}$ is compared with 0.
Step 1118: determine the current candidate model parameter from the previous candidate model parameter and the previous auxiliary intermediate value.
If the current candidate model parameter intermediate value $h_{middle} < 0$, the current candidate model parameter is $h_q = \sqrt{\dfrac{n\, m\, h_{q-1}^{2} + F_{q-1}}{n\, m}}$; that is, the total number of positive training samples $n$ multiplied by the dimension $m$ of the sample feature vectors and by the square of the previous candidate model parameter $h_{q-1}$, plus the previous auxiliary intermediate value $F_{q-1}$, is divided by the product of $n$ and $m$, and the square root is then taken.
Step 1120: determine the current candidate model parameter from the current candidate model parameter intermediate value.
Specifically, if the current candidate model parameter intermediate value $h_{middle} > 0$, the current candidate model parameter is $h_q = \mathrm{sqrt}(h_{middle})$, that is, the square root of the value $h_{middle}$ calculated according to formula (6).
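One update of the candidate model parameter (steps 1114 to 1120, with the step 1118 rule also covering step 1110) can be sketched as follows. The function and argument names are hypothetical. The text does not say what happens when the intermediate value is exactly 0 or when the second difference is 0; this sketch treats an intermediate value of 0 like the positive case and leaves a zero second difference unhandled.

```python
import math

def fallback_candidate(h_prev, f_prev, n, m):
    """Steps 1110 and 1118: sqrt((n*m*h_prev^2 + F_prev) / (n*m))."""
    return math.sqrt((n * m * h_prev ** 2 + f_prev) / (n * m))

def next_candidate(h_prev, h_prev2, f_prev, f_prev2, n, m):
    """Steps 1114-1120: h_prev is h_{q-1}, h_prev2 is h_{q-2}, likewise for F."""
    first_diff = f_prev - f_prev2
    second_diff = f_prev * h_prev ** 2 - f_prev2 * h_prev2 ** 2
    # Formula (6), as described in the text.
    h_middle = first_diff * h_prev ** 2 * h_prev2 ** 2 / second_diff
    if h_middle < 0:
        return fallback_candidate(h_prev, f_prev, n, m)   # step 1118
    return math.sqrt(h_middle)                            # step 1120

print(next_candidate(2.0, 1.0, 3.0, 1.0, n=5, m=2))
print(next_candidate(2.0, 1.0, 1.0, 3.0, n=5, m=2))
```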
Step 1122: judge whether the current auxiliary intermediate value and the current candidate model parameter simultaneously satisfy their respective iteration termination conditions. If so, perform step 1124; if not, perform step 1126.
Specifically, the iteration termination condition for the current candidate model parameter $h_q$ is $|1 - h_q/h_{q-1}| > threshold1$, where threshold1 is the first threshold. The first threshold threshold1 may be $10^{-3}$, or a value near $10^{-3}$, for example a value within the range $[0.8 \times 10^{-3}, 1.2 \times 10^{-3}]$. $|1 - h_q/h_{q-1}| > threshold1$ means that the absolute value of the difference between 1 and the quotient of the current candidate model parameter $h_q$ divided by the previous candidate model parameter $h_{q-1}$ is greater than the first threshold threshold1.
The iteration termination condition for the current auxiliary intermediate value $F_q$ is $|1 - F_q/F_{q-1}| > threshold2$ and $|F_q| > threshold3$, meaning that the absolute value of the difference between 1 and the quotient of the current auxiliary intermediate value $F_q$ divided by the previous auxiliary intermediate value $F_{q-1}$ is greater than the second threshold threshold2, and the absolute value of the current auxiliary intermediate value $F_q$ is greater than the third threshold threshold3. The second threshold threshold2 may be $10^{-4}$, or a value near $10^{-4}$, for example a value within the range $[0.8 \times 10^{-4}, 1.2 \times 10^{-4}]$. The third threshold threshold3 may be $10^{-70}$, or a value near $10^{-70}$, for example a value within the range $[0.8 \times 10^{-70}, 1.2 \times 10^{-70}]$.
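The check in step 1122 can be written as a single predicate over the three inequalities. The sketch evaluates them exactly as stated in the text, with the stated default thresholds; the function and argument names are hypothetical, and how the result steers the loop is left to steps 1124 and 1126.

```python
def conditions_hold(h_cur, h_prev, f_cur, f_prev,
                    threshold1=1e-3, threshold2=1e-4, threshold3=1e-70):
    """True when all three inequalities of step 1122 hold, as written in the text."""
    return (abs(1 - h_cur / h_prev) > threshold1
            and abs(1 - f_cur / f_prev) > threshold2
            and abs(f_cur) > threshold3)

print(conditions_hold(1.2, 1.0, 0.5, 1.0))
print(conditions_hold(1.0000001, 1.0, 0.5, 1.0))
```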
Step 1124: take the current candidate model parameter as the model parameter of the probability density function model.
Specifically, when the three iteration termination conditions $|1 - h_q/h_{q-1}| > threshold1$, $|1 - F_q/F_{q-1}| > threshold2$, and $|F_q| > threshold3$ are all satisfied, iteration stops and the current candidate model parameter $h_q$ is taken as the model parameter $h$ of the probability density function model, giving the trained probability density function model.
Step 1126: return to step 1114 to continue calculating the next candidate model parameter intermediate value using the candidate model parameter intermediate value calculation step and determining the next candidate model parameter. As long as any one of the three iteration termination conditions $|1 - h_q/h_{q-1}| > threshold1$, $|1 - F_q/F_{q-1}| > threshold2$, and $|F_q| > threshold3$ is not satisfied, the method returns to step 1114 and continues iterating until the next auxiliary intermediate value and the next candidate model parameter simultaneously satisfy their respective iteration termination conditions.
In this embodiment, the model parameter $h$ of the probability density function model is calculated by an iterative convergence method that approaches the optimal model parameter $h$ step by step, so a good model parameter $h$ can also be calculated in this way.
As shown in Fig. 12, in one embodiment, the step of calculating an auxiliary intermediate value specifically includes the following steps:
Step 1202: calculate each element of a first intermediate matrix, in which each element whose row and column indices differ is set to the Euclidean distance value between the corresponding sample feature vectors, and each element whose row and column indices are equal is set to a first preset positive value.
Specifically, the first intermediate matrix DD is a matrix whose number of rows and number of columns both equal the total number of positive training samples $n$. When the elements of DD are calculated, the row index $i$ and the column index $j$ both correspond to the sample indices of the sample feature vectors. If $i \neq j$, let $DD_{ij} = D_{ij}$, where $D_{ij}$ is the Euclidean distance value between sample feature vector $x_i$ and sample feature vector $x_j$; if $i = j$, let $DD_{ij} = value1$. value1 is the first preset positive value, a relatively large number, which can be $10^{70}$ or a value within the range $[0.8 \times 10^{70}, 1.2 \times 10^{70}]$.
Step 1204: for each element of the first intermediate matrix, calculate the difference between that element and the minimum Euclidean distance value corresponding to the respective sample feature vector, and divide it by twice the square of the current candidate model parameter, to generate the element value intermediate value corresponding to each element of the first intermediate matrix.
Specifically, the element value intermediate value is defined as $Y_{ij} = \dfrac{DD_{ij} - E_i}{2 h_q^{2}}$, that is, the difference between each element $DD_{ij}$ of the first intermediate matrix DD and the minimum Euclidean distance value $E_i$ corresponding to the respective sample feature vector, divided by twice the square of the current candidate model parameter $h_q$.
Step 1206: construct an all-zero second intermediate matrix with the same numbers of rows and columns as the first intermediate matrix; if the element value intermediate value corresponding to an element of the first intermediate matrix is less than a second preset positive value, set the corresponding element of the second intermediate matrix to the natural constant raised to the power of the negative of that element value intermediate value.
Specifically, an all-zero second intermediate matrix P = zeros(n, n) is constructed, whose numbers of rows and columns both equal the total number of positive training samples $n$. If $Y_{ij} < value2$, then $P_{ij} = \exp(-Y_{ij})$; that is, if the element value intermediate value $Y_{ij}$ corresponding to an element of the first intermediate matrix is less than the second preset positive value value2, the corresponding element $P_{ij}$ of the second intermediate matrix P is set to $\exp(-Y_{ij})$, the natural constant raised to the power of the negative of $Y_{ij}$. Here value2 can be a value from 16 to 24, preferably 20.
Step 1208: add the elements of each row of the second intermediate matrix to obtain the sum for that row; if the sum equals 0, the first auxiliary parameter corresponding to that row is a third preset positive value; if the sum is not equal to 0, the first auxiliary parameter corresponding to that row is the reciprocal of the sum.
Specifically, calculate $PP_i = \sum_{j=1}^{n} P_{ij}$. If $PP_i = 0$, then $FU_i = value3$; if $PP_i \neq 0$, then $FU_i = 1/PP_i$. Here $PP_i$ is the sum obtained by adding the elements of the $i$-th row of the second intermediate matrix, and $FU_i$ is the first auxiliary parameter corresponding to the $i$-th row. value3 is the third preset positive value, which can be $1.7977 \times 10^{308}$ or a value within the range $[0.8 \times 1.7977 \times 10^{308}, 1.2 \times 1.7977 \times 10^{308}]$.
Step 1210: multiply each element of the first intermediate matrix by the element at the corresponding position of the second intermediate matrix and sum the products over each row, to obtain the second auxiliary parameter corresponding to each row.
Specifically, calculate $FF_i = \sum_{j=1}^{n} DD_{ij}\, P_{ij}$, where $FF_i$ is the second auxiliary parameter corresponding to the $i$-th row, $DD_{ij}$ is the element in row $i$, column $j$ of the first intermediate matrix DD, and $P_{ij}$ is the element in row $i$, column $j$ of the second intermediate matrix P.
Step 1212: multiply the first auxiliary parameter of each row by the second auxiliary parameter of that row and sum the products; divide the sum by the square of the current candidate model parameter; then subtract the product of the total number of positive training samples and the dimension of the sample feature vectors, to obtain the current auxiliary intermediate value.
Specifically, calculate $F_q = \dfrac{\sum_{i=1}^{n} FU_i\, FF_i}{h_q^{2}} - n\, m$, where $F_q$ is the current auxiliary intermediate value. To calculate $F_q$, the first auxiliary parameter $FU_i$ and the second auxiliary parameter $FF_i$ of each row are multiplied and the products summed; the sum is divided by the square of the current candidate model parameter $h_q$; and the product of the total number of positive training samples $n$ and the dimension $m$ of the sample feature vectors is then subtracted.
This embodiment provides the step of calculating an auxiliary intermediate value, which supplies the auxiliary intermediate values needed by the iterative calculation when the model parameter $h$ of the probability density function model is calculated by the iterative convergence method.
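Steps 1202 to 1212 can be sketched as one function that returns the auxiliary intermediate value for a given candidate model parameter. The names are hypothetical. The constants follow the values stated in the text, except that the largest finite double is used for the third preset positive value, because the literal 1.7977e308 is not representable as a finite Python float.

```python
import math
import sys

VALUE1 = 1e70                  # first preset positive value (diagonal of DD)
VALUE2 = 20.0                  # second preset positive value (cut-off for Y)
VALUE3 = sys.float_info.max    # stand-in for the third preset positive value

def auxiliary_value(samples, h):
    """Compute F for candidate parameter h; needs at least two samples."""
    n, m = len(samples), len(samples[0])
    # Step 1202: first intermediate matrix DD.
    dd = [[VALUE1 if i == j else math.dist(samples[i], samples[j])
           for j in range(n)] for i in range(n)]
    # E_i: the diagonal holds VALUE1, so the row minimum is the nearest neighbour.
    e = [min(row) for row in dd]
    total = 0.0
    for i in range(n):
        pp = 0.0
        ff = 0.0
        for j in range(n):
            y = (dd[i][j] - e[i]) / (2.0 * h * h)        # step 1204
            p = math.exp(-y) if y < VALUE2 else 0.0      # step 1206
            pp += p                                      # step 1208 (row sum)
            ff += dd[i][j] * p                           # step 1210
        fu = VALUE3 if pp == 0 else 1.0 / pp             # step 1208
        total += fu * ff
    return total / (h * h) - n * m                       # step 1212

print(auxiliary_value([[0.0], [1.0]], 1.0))
print(auxiliary_value([[0.0], [1.0]], 2.0))
```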
As shown in Fig. 13, in one embodiment, a user-state one-class classification model training device 1300 is provided, which has the functions for implementing the user-state one-class classification model training method of each of the embodiments above. The user-state one-class classification model training device 1300 includes: a positive training sample acquisition module 1310, a sample feature vector extraction module 1320, a model parameter estimation module 1330, and a training execution module 1340.
The positive training sample acquisition module 1310 is configured to obtain at least two positive training samples known to belong to the specified user state class; each positive training sample has at least two items of user attribute information.
Specifically, the positive training sample acquisition module 1310 is configured to obtain multiple positive training samples to form a training sample set, each positive training sample having at least two items of user attribute information. To ensure the performance of the trained user-state one-class classification model, preferably 10 or more items of user attribute information are used. Only positive training samples are used here; a positive training sample is a training sample known to belong to the specified user state class.
The specified user state is a predefined user state. This embodiment is described mainly with the child-rearing state as the specified user state; the corresponding positive training samples are then sets of the various items of user attribute information of users known to be in the child-rearing state. It can be understood that different specified user states can be set according to actual needs, for example the student state or the unmarried state. Each item of user attribute information of each positive training sample is related to the specified user state.
Each item of user attribute information of each positive training sample may be drawn from the user age attribute, user gender attribute, user education attribute, and user income attribute, as well as from behavioral data related to the specified user state. The behavioral data related to the specified user state includes, but is not limited to: the number of joined groups related to the specified user state; the amount of information in social networks related to the specified user state; the number of searches for information related to the specified user state; the number of clicks on web pages related to the specified user state; and the numbers of searches, views, favorites, orders, and completed transactions for products related to the specified user state.
Similarly, when the specified user state is the student state, the corresponding behavioral data related to the student state includes, but is not limited to: the number of joined study-discussion groups; the amount of study-related information in social networks; the number of clicks on study-related web pages; the number of study-related questions asked; the number of searches for study-related information; and the numbers of searches, views, favorites, orders, and completed transactions for study supplies.
The sample feature vector extraction module 1320 is configured to extract the sample feature vector of each positive training sample from the items of user attribute information of that positive training sample.
Among the items of user attribute information of each positive training sample, some have numeric values. In this case the sample feature vector extraction module 1320 can use the numeric value directly as the corresponding element of the corresponding sample feature vector, for example the number of times child-rearing products were viewed or the number of searches for child-rearing products.
Among the items of user attribute information of each positive training sample, others have values that are not numeric data but take one of a limited number of possible cases. In this case the sample feature vector extraction module 1320 can quantize this user attribute information. Specifically, the several possible cases of the user attribute information can each be represented by a different numeric value, and the numeric value resulting from this quantization is then used as a whole as the corresponding element of the corresponding sample feature vector.
In one embodiment, user attribute information that is not numeric data is quantized by representing it as a numeric string of preset length composed of 0s and 1s, with each digit of the string serving as a separate element of the sample feature vector. Preferably, each numeric string contains exactly one 1. This embodiment takes into account that the several possible cases of non-numeric user attribute information are of equal standing: if they were merely quantized into different numeric values, each used as a whole as the corresponding element of the sample feature vector, the differences in magnitude between those values would skew the relative importance of the cases and affect the classification accuracy of the trained user-state one-class classification model.
In one embodiment, sampling feature vectors extraction module 1320 can be also used for special to each sample extracted
Sign vector is normalized.In view of different customer attribute information dimensions, dimensional unit are different, directly training is used
Family state list disaggregated model, influences whether the classification performance of User Status list disaggregated model, it is necessary to be normalized.
In one embodiment, sampling feature vectors are normalized, can be used every in sampling feature vectors
The difference of a element and least member is divided by the difference quotient obtained of greatest member and least member as new sampling feature vectors
In each element.For example a sampling feature vectors, if [1,0,10,25 ... ...], wherein greatest member is 25, smallest element
Element is 0, then the feature vector after normalizing is [0.04,0,0.4,1 ... ...], each element value in such feature vector
Between 0~1, facilitate calculating.
In one embodiment, a sample feature vector may be normalized by computing the mean and standard deviation of its elements and taking, for each element, the difference between that element and the mean, divided by the standard deviation, as the corresponding element of the new sample feature vector. Other existing normalization methods may of course also be used and are not listed here one by one.
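The mean and standard deviation normalization can be sketched in the same way. The name `z_score_normalize`, the use of the population standard deviation (dividing by the element count), and the zero-deviation fallback are assumptions; the source does not specify them.

```python
import math


def z_score_normalize(vector):
    """Subtract the mean of the elements and divide by their standard deviation."""
    n = len(vector)
    mean = sum(vector) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in vector) / n)
    if std == 0:
        # Assumption: a constant vector maps to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in vector]


z = z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
```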
Model parameter estimation module 1330, for estimating the model parameter from the sample feature vectors and generating the probability density function model from the estimated model parameter.
Specifically, the probability density function model is a part of the user-state one-class classification model. It receives an input feature vector and calculates the probability that the input feature vector belongs to the specified user-state class. The model parameter is a parameter of the probability density function model, and the main purpose of training is to obtain this model parameter.
In one embodiment, the probability density function model may be generated using a kernel-based Parzen window probability density function. Specifically, a kernel function opens a window at each sample feature vector, and the probability density at that window is estimated. Each sample feature vector contributes most to the distribution at its own position and contributes less to the distribution the farther a position is from it.
Further, the kernel function may be a uniform kernel function or a normal kernel function. The uniform kernel function, shown in Fig. 3, is also called the rectangular kernel function; the normal kernel function, shown in Fig. 4, is also called the Gaussian kernel function. The abscissa of a kernel function corresponds to a position in the feature space, and the ordinate represents the probability distribution of feature vectors at that position; in this embodiment, it represents the probability that a feature vector at that position belongs to the specified user-state class. In the Gaussian kernel function, e is the natural constant, μ is the mathematical expectation, and σ is the standard deviation.
Referring to Fig. 5, suppose the sample feature vectors of all positive training samples in the training sample set are distributed as shown in Fig. 5. Then, as shown in Fig. 6, training the probability density function model amounts to finding a hypersphere that encloses the sample feature vectors shown in Fig. 5. Referring to Fig. 7, a feature vector to be detected that lies within the range enclosed by the hypersphere, such as feature vector 701, belongs to the specified user-state class; a feature vector to be detected that lies outside that range, such as feature vector 702, does not belong to the specified user-state class.
In one embodiment, a normal kernel function opens a window at the sample feature vector of each positive training sample in the training sample set to establish a Gaussian model. The probability density function model can then be expressed as formula (1):
Formula (1): $$f(y) = \sum_{i=1}^{n} \exp\left(-(y - x_i)^{T} h^{-2} (y - x_i)\right)$$
The probability density function model expressed by formula (1) is a sum of exponential functions with the natural constant as base, each having as its exponent a function of one sample feature vector and the input feature vector. That function is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the model parameter raised to the power -2, multiplied by the difference between the input feature vector and the corresponding sample feature vector.
Specifically, in formula (1), $x_i$ is the sample feature vector of each positive training sample in the training sample set, and i = 1, 2, ..., n is the sample serial number. y is the input feature vector, and the function f(y) is the probability density function model of the user-state one-class classification model to be trained.
In formula (1), exp() denotes the exponential function with the natural constant e as base, and $-(y - x_i)^{T} h^{-2} (y - x_i)$ is the exponent of that exponential function, a function of each sample feature vector and the input feature vector. Specifically, the difference $(y - x_i)$ between the input feature vector y and the corresponding sample feature vector $x_i$ is transposed to give $(y - x_i)^{T}$ and negated to give $-(y - x_i)^{T}$; this is multiplied by the model parameter h raised to the power -2 to give $-(y - x_i)^{T} h^{-2}$, and then multiplied by the difference $(y - x_i)$. Summing the exponential terms corresponding to each sample feature vector $x_i$ gives $\sum_{i=1}^{n} \exp\left(-(y - x_i)^{T} h^{-2} (y - x_i)\right)$.
As formula (1) shows, once the model parameter h in formula (1) has been estimated through training, the probability density function model f(y) of the user-state one-class classification model to be trained is obtained.
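The probability density function model described for formula (1) can be sketched as below, treating the model parameter h as a scalar so that the exponent reduces to the squared Euclidean distance divided by h squared. The function name `parzen_density` and the list-of-lists data layout are assumptions made for illustration.

```python
import math


def parzen_density(y, samples, h):
    """Sum over samples x_i of exp(-(y - x_i)^T h^-2 (y - x_i)), with scalar h."""
    total = 0.0
    for x in samples:
        # (y - x)^T (y - x) is the squared Euclidean distance between y and x.
        sq_dist = sum((yj - xj) ** 2 for yj, xj in zip(y, x))
        total += math.exp(-sq_dist / (h * h))
    return total


# Two sample feature vectors one unit apart, evaluated at the first sample:
# the first term is exp(0) and the second is exp(-1).
value = parzen_density([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], h=1.0)
```

Each sample contributes 1 at its own position and less at positions farther away, which matches the windowing behaviour described for the kernel function.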
Training execution module 1340, for generating the user-state one-class classification model. The user-state one-class classification model includes the probability density function model, which receives an input feature vector and calculates a function value, and further includes a classification decision model, which computes from the calculated function value a classification result indicating whether the input belongs to the specified user-state class.
Specifically, the user-state one-class classification model generated by the training execution module 1340 includes the probability density function model and the classification decision model. The probability density function model receives the input feature vector and calculates a function value; the classification decision model computes a classification result from that function value, and the classification result indicates whether the input feature vector belongs to the specified user-state class. If the input feature vector belongs to the specified user-state class, the input feature vector has the specified user state; otherwise its user state cannot be determined.
The classification decision model can be expressed as formula (2):
Formula (2): $$\gamma = \begin{cases} \text{target}, & f(y) \ge \theta \\ \text{outlier}, & f(y) < \theta \end{cases}$$
In formula (2), y is the input feature vector, which is the feature vector to be detected when a user state is being detected, and the function f(y) is the probability density function model of the user-state one-class classification model. γ is the classification result output by the classification decision model: target indicates that the classification result γ is membership in the specified user-state class, and outlier indicates non-membership in the specified user-state class. θ is the preset function value threshold, which is given in advance and may be determined from the positive training samples, for example by inputting all positive training samples into the probability density function model of the user-state one-class classification model to be trained, obtaining the corresponding function values, and determining the preset function value threshold from the largest of those function values.
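A minimal sketch of the classification decision model follows. It assumes that a function value at or above the threshold maps to target, and it derives the threshold as half of the largest function value over the positive training samples; that factor of 0.5 is purely hypothetical, since the source only says the threshold is determined from the largest function value without giving the rule. The names `density` and `classify` are also illustrative.

```python
import math


def density(y, samples, h):
    """Probability density function model with scalar model parameter h."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(y, x)) / h ** 2)
        for x in samples
    )


def classify(y, samples, h, theta):
    """Return 'target' if the function value reaches the threshold, else 'outlier'."""
    return "target" if density(y, samples, h) >= theta else "outlier"


samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
h = 0.5
# Largest function value over the positive training samples themselves.
max_value = max(density(x, samples, h) for x in samples)
theta = 0.5 * max_value  # hypothetical rule, not taken from the source

near = classify([0.05, 0.05], samples, h, theta)  # close to the training samples
far = classify([3.0, 3.0], samples, h, theta)     # far from every training sample
```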
Unlike conventional pattern recognition methods, which train on both positive and negative training samples, the user-state one-class classification model training device 1300 described above trains only on multiple positive training samples belonging to the specified user-state class. Compared with a classification model trained on positive and negative training samples, this avoids the effect that introducing negative training samples has on classification performance, so classification performance is better. Moreover, once trained, the user-state one-class classification model reflects the inherent regularities among the user attribute information with very little influence from human factors, predicts well for examples outside the training samples, and has strong generalization ability.
As shown in Fig. 14, in one embodiment, the user-state one-class classification model training device 1300 further includes: a user attribute information acquisition module 1350, a to-be-detected feature vector extraction module 1360, and a classification module 1370.
User attribute information acquisition module 1350, for acquiring at least two items of user attribute information corresponding to a user identifier to be detected.
Specifically, a user identifier is a character string that uniquely identifies a user and may include at least one of digits, symbols, letters, and other characters. The user identifier to be detected is a user identifier whose corresponding user state needs to be determined.
The types of user attribute information corresponding to the user identifier to be detected are a subset of the types of user attribute information of the positive training samples in the training sample set. Thus, even if few items of user attribute information are available for the user identifier to be detected, classification is still possible as long as some of the user attribute features are sufficiently distinctive.
The at least two items of user attribute information corresponding to the user identifier to be detected may be drawn from a user age attribute, a user gender attribute, a user education attribute, a user income attribute, and behavioral data related to the specified user state. Behavioral data related to the specified user state includes, but is not limited to: the number of joined groups related to the specified user state; the amount of information in social networks related to the specified user state; the number of searches for information related to the specified user state; the number of clicks on web pages related to the specified user state; and the numbers of searches, views, favorites, orders, and completed transactions for products related to the specified user state.
Similarly, when the specified user state is the student state, the corresponding behavioral data related to the student state includes, but is not limited to: the number of joined groups related to study discussion, the amount of study-related information in social networks, the number of clicks on study-related web pages, the number of study-related questions asked, the number of searches for study-related information, and the numbers of searches, views, favorites, orders, and completed transactions for study supplies.
To-be-detected feature vector extraction module 1360, for extracting the feature vector to be detected from the acquired user attribute information.
Specifically, the feature vector to be detected is extracted from the acquired user attribute information in the same way that feature vectors were extracted when training the user-state one-class classification model. Acquired user attribute information whose value is numeric data may be used directly as the corresponding element of the feature vector to be detected. For acquired user attribute information whose value is not numeric data, the several possible values of that user attribute information may each be represented by a different number, and the number obtained by quantizing the user attribute information is then used whole as the corresponding element of the feature vector to be detected.
In one embodiment, user attribute information that is not numeric data may be quantized as a numerical string of preset length composed of 0s and 1s, with each digit of the numerical string serving as an independent element of the feature vector to be detected. Preferably, each numerical string contains exactly one 1.
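The 0/1 numerical string with exactly one 1 corresponds to what is commonly called one-hot encoding. The sketch below illustrates it; the attribute values in `education_levels` and the function name `one_hot` are hypothetical examples, not taken from the source.

```python
def one_hot(value, categories):
    """Encode a non-numeric attribute as a 0/1 string containing exactly one 1."""
    if value not in categories:
        raise ValueError(f"unknown attribute value: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical possible values of a user education attribute.
education_levels = ["primary", "secondary", "bachelor", "postgraduate"]

# A numeric attribute (age 28) is used directly; each digit of the encoded
# education attribute becomes an independent element of the feature vector.
feature_vector = [28] + one_hot("bachelor", education_levels)
```

Because every possible value occupies its own element, no value is treated as numerically larger than another, which is the equal-footing property the text asks for.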
In one embodiment, the to-be-detected feature vector extraction module 1360 is also used to normalize the extracted feature vector to be detected. Specifically, the feature vector to be detected may be normalized by taking, for each element, the difference between that element and the smallest element, divided by the difference between the largest and smallest elements, as the corresponding element of the new feature vector to be detected. Alternatively, the feature vector to be detected may be normalized by computing the mean and standard deviation of its elements and taking, for each element, the difference between that element and the mean, divided by the standard deviation, as the corresponding element of the new feature vector to be detected.
Classification module 1370, for inputting the feature vector to be detected into the user-state one-class classification model and outputting a classification result indicating whether it belongs to the specified user-state class, so as to determine the user state corresponding to the user identifier to be detected.
Specifically, the feature vector to be detected is input into the probability density function model of the user-state one-class classification model, which outputs a function value; that function value is then input into the classification decision model of the user-state one-class classification model, which outputs the classification result. For example, with the classification decision model of formula (2), an output of target indicates membership in the specified user-state class, from which it can be determined that the user corresponding to the user identifier has the specified user state. An output of outlier indicates non-membership in the specified user-state class, in which case the user corresponding to the user identifier may be reported as not having the specified user state, or the user state may be reported as undeterminable.
In one embodiment, the user-state one-class classification model training device 1300 further includes a push module (not shown in the figures), for pushing information according to the determined user state corresponding to the user identifier to be detected. The pushed information may be advertising information, broadcast notification messages, and the like. For example, advertising information related to child-rearing may be pushed to users detected to be in the child-rearing state; information pushed in this way is better targeted, which helps ensure that it is conveyed effectively.
In this embodiment, given several items of user attribute information corresponding to a user identifier, the user state corresponding to that user identifier can be determined with essentially no human intervention and little influence from human factors, and the classification accuracy is higher than that of a mathematical model that classifies using manually set scoring rules.
As shown in Fig. 15, in one embodiment, the model parameter estimation module 1330 includes: a sample division module 1331, a candidate parameter value selection module 1332, a classification statistics module 1333, and a model parameter determination module 1334.
Sample division module 1331, for dividing all positive training samples into first-class positive training samples and second-class positive training samples.
Here, all positive training samples means all positive training samples in the training sample set. Specifically, the first-class positive training samples are used for training, and the second-class positive training samples are used for verification, also called testing; that is, they are used to test the probability density function model obtained from the first-class positive training samples. Most of the positive training samples in the training sample set may be used as first-class positive training samples, with the remainder used as second-class positive training samples. For example, two thirds of all positive training samples may be used as first-class positive training samples and the remaining one third as second-class positive training samples.
Candidate parameter value selection module 1332, for taking a preset number of candidate parameter values within the value range of the model parameter.
The model parameter has a value range, which may be set empirically or determined by calculation; steps for calculating the value range of the model parameter are given below. Candidate parameter values may be chosen at a fixed step within the value range of the model parameter to form a set of candidate parameter values. A candidate parameter value is a hypothesized model parameter.
Classification statistics module 1333, for generating a candidate user-state one-class classification model from the first-class positive training samples and each candidate parameter value, classifying the second-class positive training samples, and computing the classification accuracy.
Specifically, each candidate parameter value and the sample feature vector of each first-class positive training sample are substituted into formula (1) to obtain the corresponding candidate probability density function model, which is the probability density function model under the hypothesis of that candidate parameter value.
The second-class positive training samples are used to test the classification accuracy of the candidate probability density function model. Specifically, the sample feature vector of each second-class positive training sample is input into the candidate probability density function model to obtain the corresponding function value, and that function value is input into the classification decision model expressed by formula (2), which compares the candidate function value with the preset function value threshold and outputs the classification result.
A classification accuracy is calculated for each candidate parameter value. The classification accuracy is the ratio of the number of second-class positive training samples classified into the specified user-state class to the total number of second-class positive training samples.
Model parameter determination module 1334, for taking the candidate parameter value with the highest classification accuracy as the estimated model parameter.
Specifically, the classification accuracies are compared. The higher the classification accuracy, the better the classification performance and the closer the corresponding candidate probability density function model is to the optimal probability density function model, so the candidate parameter value with the highest accuracy is taken as the estimated model parameter. Substituting the estimated model parameter into the probability density function yields the probability density function model.
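The candidate selection performed by modules 1332 to 1334 can be sketched as a simple search over evenly spaced candidate values. This sketch assumes a scalar model parameter, a threshold `theta` supplied by the caller, and at least two candidates; breaking ties in favour of the smallest candidate is also an assumption, as are all function names.

```python
import math


def density(y, samples, h):
    """Candidate probability density function model with scalar parameter h."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(y, x)) / h ** 2)
        for x in samples
    )


def accuracy(train, held_out, h, theta):
    """Share of second-class positive samples classified into the specified class."""
    hits = sum(1 for y in held_out if density(y, train, h) >= theta)
    return hits / len(held_out)


def select_parameter(train, held_out, h_min, h_max, count, theta):
    """Take `count` evenly spaced candidates and keep the most accurate one."""
    step = (h_max - h_min) / (count - 1)
    candidates = [h_min + k * step for k in range(count)]
    # max() keeps the first maximum, so ties go to the smallest candidate.
    return max(candidates, key=lambda h: accuracy(train, held_out, h, theta))


train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # first-class positive samples
held_out = [[0.5, 0.5], [2.0, 2.0]]            # second-class positive samples
best_h = select_parameter(train, held_out, 0.5, 2.5, 5, theta=1.0)
```

With a fixed threshold, larger parameter values only ever raise the function value, so in practice the threshold rule and the value range together determine how selective the chosen parameter is.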
In this embodiment, the positive training samples are divided into first-class positive training samples and second-class positive training samples, the former used for training and the latter for verification, so as to obtain a model parameter close to the optimum; the algorithm is easy to implement.
In one embodiment, the sample division module 1331 is specifically used to divide all positive training samples into a preset number of parts, take each part in turn as the second-class positive training samples, and use the remaining positive training samples as the first-class positive training samples.
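The rotation described above resembles k-fold cross-validation. A minimal sketch follows; the name `rotate_folds` and the round-robin assignment of samples to parts are assumptions, since the source does not say how the parts are formed.

```python
def rotate_folds(samples, parts):
    """Yield (first_class, second_class) splits, holding out each part in turn."""
    # Assumption: samples are dealt into parts round-robin.
    folds = [samples[i::parts] for i in range(parts)]
    for k in range(parts):
        second_class = folds[k]
        first_class = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield first_class, second_class


# Six placeholder samples divided into three parts.
splits = list(rotate_folds(list(range(6)), 3))
```

Each split can then be passed to the candidate parameter evaluation, and the classification accuracies averaged over the splits, although the source does not state how the per-split results are combined.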
As shown in Fig. 16, in one embodiment, the user-state one-class classification model training device 1300 further includes: a mean matrix calculation module 1380, a variance matrix calculation module 1390, and a model parameter value range calculation module 1399.
Mean matrix calculation module 1380, for calculating the mean matrix of the sample feature vectors of all positive training samples.
Variance matrix calculation module 1390, for calculating the variance matrix from the mean matrix.
Model parameter value range calculation module 1399, for substituting the variance matrix into the model parameter function to obtain the value range of the model parameter. The model parameter function is the square root of the product of the reciprocal of the dimension of the sample feature vectors and the trace of the variance matrix, multiplied by an exponential function whose base is the total number of positive training samples and whose exponent is the negative of the quotient of a parameter variable and the dimension. The parameter variable has a preset value range.
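One reading of the model parameter function is h(α) = sqrt(tr(Σ) / d) · n^(-α / d), where Σ is the variance matrix, d the dimension, n the total number of positive training samples, and α the parameter variable. The sketch below follows that reading; the symbols, the function name `parameter_range`, and the use of the population variance are assumptions introduced for illustration.

```python
import math


def parameter_range(samples, alpha_min, alpha_max):
    """Evaluate sqrt(trace / d) * n ** (-alpha / d) at both ends of alpha's range."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    # The trace of the variance matrix is the sum of the per-dimension variances.
    # Assumption: population variance (divide by n).
    trace = sum(sum((x[j] - mean[j]) ** 2 for x in samples) / n for j in range(d))
    scale = math.sqrt(trace / d)
    bounds = [scale * n ** (-alpha / d) for alpha in (alpha_min, alpha_max)]
    return min(bounds), max(bounds)


# Four samples at the corners of a square: each dimension has variance 1,
# so the trace is 2 and the scale factor is 1.
low, high = parameter_range([[0, 0], [2, 0], [0, 2], [2, 2]], 0.0, 1.0)
```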
As shown in Fig. 17, in another embodiment, the model parameter estimation module 1330 includes: a Euclidean distance value calculation module 1330a, a screening module 1330b, a first candidate model parameter calculation module 1330c, a first auxiliary intermediate value calculation module 1330d, a second candidate model parameter calculation module 1330e, a second auxiliary intermediate value calculation module 1330f, a current candidate model parameter intermediate value calculation module 1330g, a current candidate model parameter determination module 1330h, a judgment module 1330i, a model parameter determination module 1330j, and an iterative calculation module 1330k.
Euclidean distance value calculation module 1330a, for calculating the Euclidean distance value between every pair of sample feature vectors of all positive training samples.
Screening module 1330b, for selecting the minimum Euclidean distance value corresponding to each sample feature vector.
First candidate model parameter calculation module 1330c, for calculating the square root of the maximum of all the minimum Euclidean distance values as the first candidate model parameter.
First auxiliary intermediate value calculation module 1330d, for calculating the first auxiliary intermediate value through the auxiliary intermediate value calculation module 1335. The auxiliary intermediate value calculation module 1335 may belong to the user-state one-class classification model training device 1300 or may be an independent module.
Second candidate model parameter calculation module 1330e, for multiplying the square of the first candidate model parameter by the total number of positive training samples and by the dimension of the sample feature vectors, adding the first candidate model parameter, dividing by the product of the total number of positive training samples and the dimension, and then taking the square root, to obtain the second candidate model parameter.
Second auxiliary intermediate value calculation module 1330f, for calculating the second auxiliary intermediate value through the auxiliary intermediate value calculation module 1335.
Current candidate model parameter intermediate value calculation module 1330g, for calculating the current candidate model parameter intermediate value through the candidate model parameter intermediate value operation module 1336. The candidate model parameter intermediate value operation module 1336 may belong to the user-state one-class classification model training device 1300 or may be an independent module.
Current candidate model parameter determination module 1330h, for determining the current candidate model parameter from the previous candidate model parameter and the previous auxiliary intermediate value if the current candidate model parameter intermediate value is less than 0, and for determining the current candidate model parameter from the current candidate model parameter intermediate value if the current candidate model parameter intermediate value is greater than 0.
Judgment module 1330i, for judging whether the current auxiliary intermediate value and the current candidate model parameter simultaneously satisfy their respective iteration stopping conditions.
Model parameter determination module 1330j, for taking the current candidate model parameter as the model parameter of the probability density function model if the current auxiliary intermediate value and the current candidate model parameter simultaneously satisfy their respective iteration stopping conditions.
Iterative calculation module 1330k, for, if the current auxiliary intermediate value and the current candidate model parameter do not simultaneously satisfy their respective iteration stopping conditions, notifying the current candidate model parameter intermediate value calculation module 1330g to continue calculating the next candidate model parameter intermediate value through the candidate model parameter intermediate value operation module 1336, and notifying the current candidate model parameter determination module 1330h to determine the next candidate model parameter, until the next auxiliary intermediate value and the next candidate model parameter simultaneously satisfy their respective iteration stopping conditions.
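The starting point of this iteration, computed by modules 1330a to 1330c, can be sketched as follows: find each sample feature vector's nearest-neighbour distance, take the largest of those distances, and take its square root. The function name `first_candidate_parameter` is an assumption; `math.dist` requires Python 3.8 or later.

```python
import math


def first_candidate_parameter(samples):
    """Square root of the largest nearest-neighbour Euclidean distance."""
    nearest = []
    for i, x in enumerate(samples):
        # Minimum Euclidean distance from sample i to any other sample.
        nearest.append(
            min(math.dist(x, other) for j, other in enumerate(samples) if j != i)
        )
    return math.sqrt(max(nearest))


# Nearest-neighbour distances are 1, sqrt(18), and 1; the largest is sqrt(18).
h_first = first_candidate_parameter([[0, 0], [3, 4], [0, 1]])
```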
In one embodiment, the candidate model parameter intermediate value operation module 1336 is used to proceed as follows relative to the current candidate model parameter and the current auxiliary intermediate value. It calculates a first difference, which is the previous auxiliary intermediate value minus the auxiliary intermediate value before the previous one. It multiplies the first difference by the square of the previous candidate model parameter and by the square of the candidate model parameter before the previous one, and then divides by a second difference. The second difference is the product of the previous auxiliary intermediate value and the square of the previous candidate model parameter, minus the product of the auxiliary intermediate value before the previous one and the square of the candidate model parameter before the previous one.
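Written with $g_k$ for the auxiliary intermediate value (auxiliary median) and $h_k$ for the candidate model parameter at step k, the computation described above can be read as the expression below. The symbols $g_k$, $h_k$, and $m_k$ are introduced here for illustration and do not appear in the source, and the expression is one interpretation of the translated wording rather than a confirmed formula.

```latex
m_k = \frac{\left(g_{k-1} - g_{k-2}\right) h_{k-1}^{2}\, h_{k-2}^{2}}
           {g_{k-1}\, h_{k-1}^{2} - g_{k-2}\, h_{k-2}^{2}}
```

Here $m_k$ is the current candidate model parameter intermediate value, the numerator contains the first difference, and the denominator is the second difference.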
As shown in Fig. 18, in one embodiment, the auxiliary intermediate value calculation module 1335 includes: a first intermediate matrix processing module 1335a, an element value intermediate value calculation module 1335b, a second intermediate matrix construction module 1335c, a first auxiliary parameter calculation module 1335d, a second auxiliary parameter calculation module 1335e, and an auxiliary intermediate value generation module 1335f.
First intermediate matrix processing module 1335a, for calculating each element of the first intermediate matrix, wherein an element of the first intermediate matrix whose row and column indices differ is set to the Euclidean distance value between the corresponding sample feature vectors, and an element whose row and column indices are equal is set to a first preset positive value.
Element value intermediate value calculation module 1335b, for calculating the difference between each element of the first intermediate matrix and the minimum Euclidean distance value of the corresponding sample feature vector, and dividing by two times the square of the current candidate model parameter, to generate the element value intermediate value corresponding to each element of the first intermediate matrix.
Second intermediate matrix construction module 1335c, for constructing an all-zero second intermediate matrix with the same numbers of rows and columns as the first intermediate matrix; if the element value intermediate value corresponding to an element of the first intermediate matrix is less than a second preset positive value, the corresponding element of the second intermediate matrix is set to the natural constant raised to the negative of that element value intermediate value.
First auxiliary parameter calculation module 1335d, for adding the elements of each row of the second intermediate matrix to obtain the sum for that row; if the sum is equal to 0, the first auxiliary parameter for the row is a third preset positive value; if the sum is not equal to 0, the first auxiliary parameter for the row is the reciprocal of the sum.
Second auxiliary parameter calculation module 1335e, for multiplying each element of the first intermediate matrix by the element at the corresponding position in each row of the second intermediate matrix and summing, to obtain the second auxiliary parameter for each row.
Auxiliary intermediate value generation module 1335f, for multiplying the first auxiliary parameter of each row by the second auxiliary parameter of that row and summing the products, dividing by the square of the current candidate model parameter, and then subtracting the product of the total number of positive training samples and the dimension of the sample feature vectors, to give the current auxiliary intermediate value.
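The six steps of the auxiliary intermediate value (auxiliary median) calculation can be traced in a short sketch that follows the wording of modules 1335a to 1335f literally. The three preset positive values are not given in the source, so the defaults `diag`, `cutoff`, and `fallback` below are hypothetical, as is the function name.

```python
import math


def auxiliary_value(samples, h, diag=1e-12, cutoff=50.0, fallback=1e-12):
    """Auxiliary intermediate value for candidate model parameter h."""
    n, d = len(samples), len(samples[0])
    # First intermediate matrix: pairwise distances, preset value on the diagonal.
    first = [
        [diag if i == j else math.dist(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]
    # Minimum Euclidean distance value of each sample feature vector.
    nearest = [min(first[i][j] for j in range(n) if j != i) for i in range(n)]
    total = 0.0
    for i in range(n):
        # Row i of the second intermediate matrix (zero unless below the cutoff).
        row = []
        for j in range(n):
            e = (first[i][j] - nearest[i]) / (2 * h * h)
            row.append(math.exp(-e) if e < cutoff else 0.0)
        row_sum = sum(row)
        first_aux = fallback if row_sum == 0 else 1.0 / row_sum
        second_aux = sum(first[i][j] * row[j] for j in range(n))
        total += first_aux * second_aux
    return total / (h * h) - n * d


# Two one-dimensional samples one unit apart, candidate parameter 1.
g = auxiliary_value([[0.0], [1.0]], h=1.0)
```

The cutoff only avoids computing exponentials that would be negligibly small; which values the patent intends for the three presets is not stated.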
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.
Claims (12)
1. A user-state one-class classification model training method, the method comprising:
acquiring at least two positive training samples known to belong to a specified user-state class, each positive training sample having at least two items of user attribute information;
extracting a sample feature vector of each positive training sample from the items of user attribute information of that positive training sample;
dividing all positive training samples into first-class positive training samples and second-class positive training samples;
taking candidate parameter values within the value range of a model parameter;
generating a candidate user-state one-class classification model from the first-class positive training samples and each candidate parameter value, classifying the second-class positive training samples, and computing the classification accuracy;
taking the candidate parameter value with the highest classification accuracy as the estimated model parameter;
generating a probability density function model from the estimated model parameter;
generating the user-state one-class classification model, the user-state one-class classification model including the probability density function model, which receives an input feature vector and calculates a function value, and further including a classification decision model, which computes from the calculated function value a classification result indicating whether the input belongs to the specified user-state class.
2. The method according to claim 1, characterized in that the method further comprises:
acquiring at least two items of user attribute information corresponding to a user identifier to be detected;
extracting a feature vector to be detected from the acquired user attribute information;
inputting the feature vector to be detected into the user-state one-class classification model and outputting a classification result indicating whether it belongs to the specified user-state class, so as to determine the user state corresponding to the user identifier to be detected.
3. The method according to claim 1, characterized in that taking candidate parameter values within the value range of the model parameter comprises:
taking a preset number of candidate parameter values within the value range of the model parameter.
4. The method according to claim 1, wherein the probability density function model is a sum of exponential functions with the natural constant as base, each taking as its exponent a function between one sample feature vector and the input feature vector; the function between each sample feature vector and the input feature vector is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the negative second power of the model parameter, multiplied by the difference between the input feature vector and the corresponding sample feature vector.
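Written out in symbols, the function described in claim 4 can be read as follows. The symbol names are assumptions introduced here for illustration and are not taken from the patent: $x$ is the input feature vector, $x_i$ the $i$-th sample feature vector, $N$ the number of sample feature vectors, and $\sigma$ the model parameter.

```latex
p(x) = \sum_{i=1}^{N} \exp\!\left( -(x - x_i)^{\mathsf{T}} \, \sigma^{-2} \, (x - x_i) \right)
```

If $\sigma$ is a positive scalar, each exponent reduces to $-\lVert x - x_i \rVert^2 / \sigma^2$, so every sample contributes a value between 0 and 1 that decays with its squared distance from the input.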
5. The method according to claim 3, wherein the probability density function model is a sum of exponential functions with the natural constant as base, each taking as its exponent a function between one sample feature vector and the input feature vector; the function between each sample feature vector and the input feature vector is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the negative second power of the model parameter, multiplied by the difference between the input feature vector and the corresponding sample feature vector; before the step of dividing all positive training samples into first-class positive training samples and second-class positive training samples, the method further comprises:
Calculate the mean matrix of the sample feature vectors of all positive training samples;
Calculate a variance matrix according to the mean matrix;
Substitute the variance matrix into a model parameter function to obtain the value range of the model parameter; the model parameter function is the square root of the product of the reciprocal of the dimension of the sample feature vectors and the trace of the variance matrix, multiplied by an exponential function whose base is the total number of positive training samples and whose exponent is the negated quotient of a parameter variable and the dimension; the parameter variable has a preset value range.
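One way to read the model parameter function in claim 5 is sigma(alpha) = sqrt(tr(S) / d) * N^(-alpha / d), where S is the variance matrix, d the feature dimension, N the number of positive training samples and alpha the parameter variable. The sketch below follows that reading. The symbols, the function names and the choice to normalise variances by N are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def variance_trace(samples: Sequence[Vector]) -> float:
    # Trace of the variance matrix = sum of the per-dimension variances.
    # Assumption: variances are normalised by the sample count N.
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    return sum(
        sum((s[j] - means[j]) ** 2 for s in samples) / n
        for j in range(dim)
    )


def model_parameter(samples: Sequence[Vector], alpha: float) -> float:
    # sigma(alpha) = sqrt(trace / d) * N ** (-alpha / d)
    n = len(samples)
    dim = len(samples[0])
    return math.sqrt(variance_trace(samples) / dim) * n ** (-alpha / dim)


def parameter_range(samples: Sequence[Vector],
                    alpha_min: float, alpha_max: float) -> Tuple[float, float]:
    # sigma is monotonic in alpha, so the end points of the preset alpha
    # range give the end points of the model parameter's value range.
    a = model_parameter(samples, alpha_min)
    b = model_parameter(samples, alpha_max)
    return (min(a, b), max(a, b))
```

For example, four samples at the corners of a 2 by 2 square have unit variance in each dimension, so with `alpha` ranging over [0, 2] the model parameter ranges over [0.25, 1.0].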
6. A user status one-class classification model training device, wherein the device comprises:
A positive training sample obtaining module, configured to obtain at least two positive training samples known to belong to a designated user status class; each positive training sample has at least two items of user attribute information;
A sample feature vector extraction module, configured to extract the sample feature vector of each positive training sample according to each item of user attribute information of that positive training sample;
A sample division module, configured to divide all positive training samples into first-class positive training samples and second-class positive training samples;
A candidate parameter value selection module, configured to take candidate parameter values within the value range of the model parameter;
A classification statistics module, configured to generate a candidate user status one-class classification model from the first-class positive training samples and each candidate parameter value respectively, classify the second-class positive training samples, and compute the classification accuracy;
A model parameter determination module, configured to take the candidate parameter value with the highest classification accuracy as the estimated model parameter;
A training execution module, configured to generate the user status one-class classification model, which comprises a probability density function model that receives an input feature vector and computes a function value, and further comprises a classification decision model that computes, from the computed function value, a classification result indicating whether the input belongs to the designated user status class.
7. The device according to claim 6, wherein the device further comprises:
A user attribute information obtaining module, configured to obtain at least two items of user attribute information corresponding to a user identifier to be detected;
A to-be-detected feature vector extraction module, configured to extract a feature vector to be detected from the obtained user attribute information;
A classification module, configured to input the feature vector to be detected into the user status one-class classification model and output a classification result indicating whether it belongs to the designated user status class, thereby determining the user status corresponding to the user identifier to be detected.
8. The device according to claim 6, wherein the candidate parameter value selection module is further configured to take a preset number of candidate parameter values within the value range of the model parameter.
9. The device according to any one of claims 6 to 8, wherein the probability density function model is a sum of exponential functions with the natural constant as base, each taking as its exponent a function between one sample feature vector and the input feature vector; the function between each sample feature vector and the input feature vector is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the negative second power of the model parameter, multiplied by the difference between the input feature vector and the corresponding sample feature vector.
10. The device according to claim 8, wherein the probability density function model is a sum of exponential functions with the natural constant as base, each taking as its exponent a function between one sample feature vector and the input feature vector; the function between each sample feature vector and the input feature vector is the negated transpose of the difference between the input feature vector and the corresponding sample feature vector, multiplied by the negative second power of the model parameter, multiplied by the difference between the input feature vector and the corresponding sample feature vector; the device further comprises:
A mean matrix calculation module, configured to calculate the mean matrix of the sample feature vectors of all positive training samples before all positive training samples are divided into first-class positive training samples and second-class positive training samples;
A variance matrix calculation module, configured to calculate a variance matrix according to the mean matrix;
A model parameter value range calculation module, configured to substitute the variance matrix into a model parameter function to obtain the value range of the model parameter; the model parameter function is the square root of the product of the reciprocal of the dimension of the sample feature vectors and the trace of the variance matrix, multiplied by an exponential function whose base is the total number of positive training samples and whose exponent is the negated quotient of a parameter variable and the dimension; the parameter variable has a preset value range.
11. An electronic device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510006021.9A CN104537252B (en) | 2015-01-05 | 2015-01-05 | User Status list disaggregated model training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510006021.9A CN104537252B (en) | 2015-01-05 | 2015-01-05 | User Status list disaggregated model training method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104537252A (en) | 2015-04-22 |
CN104537252B (en) | 2019-09-17 |
Family
ID=52852778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510006021.9A Active CN104537252B (en) | 2015-01-05 | 2015-01-05 | User Status list disaggregated model training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104537252B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529110A (en) * | 2015-09-09 | 2017-03-22 | 阿里巴巴集团控股有限公司 | Classification method and equipment of user data |
CN106934413B (en) * | 2015-12-31 | 2020-10-13 | 阿里巴巴集团控股有限公司 | Model training method, device and system and sample set optimization method and device |
CN105812174A (en) * | 2016-03-06 | 2016-07-27 | 刘健文 | Network data determining model training method and apparatus |
CN108021998B (en) * | 2016-10-31 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Method and device for predicting answer duration of network questionnaire |
CN108388563B (en) * | 2017-02-03 | 2022-11-08 | 北京京东尚科信息技术有限公司 | Information output method and device |
WO2019232723A1 (en) * | 2018-06-06 | 2019-12-12 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for cleaning data |
CN109167816B (en) * | 2018-08-03 | 2021-11-16 | 广州虎牙信息科技有限公司 | Information pushing method, device, equipment and storage medium |
CN109388674B (en) * | 2018-08-31 | 2022-11-15 | 创新先进技术有限公司 | Data processing method, device, equipment and readable storage medium |
CN110942081B (en) * | 2018-09-25 | 2023-08-18 | 北京嘀嘀无限科技发展有限公司 | Image processing method, device, electronic equipment and readable storage medium |
CN110806733B (en) * | 2019-10-30 | 2021-09-21 | 中国神华能源股份有限公司国华电力分公司 | Thermal power plant equipment monitoring method and device and electronic equipment |
CN111598189B (en) * | 2020-07-20 | 2020-10-30 | 北京瑞莱智慧科技有限公司 | Generative model training method, data generation method, device, medium, and apparatus |
CN113438375B (en) * | 2021-05-24 | 2022-09-27 | 商客通尚景科技(上海)股份有限公司 | Method for maintaining seat state |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN103500342A (en) * | 2013-09-18 | 2014-01-08 | 华南理工大学 | Human behavior recognition method based on accelerometer |
CN103530540A (en) * | 2013-09-27 | 2014-01-22 | 西安交通大学 | User identity attribute detection method based on man-machine interaction behavior characteristics |
CN103745201A (en) * | 2014-01-06 | 2014-04-23 | Tcl集团股份有限公司 | Method and device for program recognition |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009205464A (en) * | 2008-02-28 | 2009-09-10 | Gifu Univ | Medical information processor, medical information processing method, and medical information processing program |
WO2011152072A1 (en) * | 2010-06-04 | 2011-12-08 | パナソニック株式会社 | Content output device, content output method, program, program storage medium and content output integrated circuit |
Non-Patent Citations (2)
Title |
---|
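Extracting the globally and locally adaptive; Zhang X et al.; PLoS ONE; 20141231; 1-9 |
Research on network user classification based on multi-dimensional features (根据多维特征的网络用户分类研究); 窦伊男; China Doctoral Dissertations Full-text Database, Information Science and Technology series; 20101115; full text |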
Also Published As
Publication number | Publication date |
---|---|
CN104537252A (en) | 2015-04-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||