Summary of the invention
The present invention seeks to solve the above problems of the prior art. It proposes a big data method, based on partial label learning, for predicting the shop in which a user is located, so that users can obtain a more accurate personalized notification service and a better shopping experience. The technical solution is as follows:
A big data method for predicting the shop in which a user is located, based on partial label learning, comprising the following steps:
101. Preprocess the user's location behavior data, including cleaning abnormal samples and filling in missing Wi-Fi information;
102. Construct a partial-label dataset from the candidate shop set corresponding to each sample. Each sample in the dataset is one shopping state of some user, and different shopping states of a user correspond to different candidate shop sets. The candidate shop set of each sample is built according to a fixed rule, which can be summarized in three steps: 1. find, by distance, the 10 shops nearest to the user's current shopping state; 2. optimize a newly proposed convex quadratic programming problem to solve for the importance of each of these 10 shops to the user's current shopping state; 3. select the shops whose importance is greater than the threshold 0.4 as the candidate shop set, thereby constructing the partial-label dataset;
103. Perform feature extraction on the partial-label dataset: extract Wi-Fi distance-strength feature vectors to form the feature space. This feature vector resembles a one-hot feature vector; each dimension holds the distance-strength value, under the user's current shopping state, of one Wi-Fi network that appears in the dataset;
104. Construct a similarity graph from the feature space, specifically:
For each sample x_i in the dataset, repeat the same operations: 1. take x_i as a node of the similarity graph; 2. take x_i as a center point, compute the Euclidean distance between the Wi-Fi distance-strength feature vector of x_i and that of every other sample in the dataset, and choose the 10 samples with the smallest Euclidean distance to x_i. x_i is then regarded as the central sample of these 10 samples, and its node is connected to each of their nodes by an edge in the similarity graph;
105. Perform probability propagation on the similarity graph. For each sample x_i in the dataset, repeat the same operations: 1. Initialization: compute the optimal parameters from the likelihood function (formula (6)), and from them compute the probability that each candidate shop in the candidate shop set of x_i may be interacted with; this probability distribution is used as the initial probability distribution over the candidate shops of x_i. 2. For the t-th iteration of the probability propagation algorithm: obtain the probability distribution over the candidate shops of x_i at iteration t from the formula based on the similarity graph. Evaluating this formula is itself one round of probability propagation, and propagation can take place only between the two nodes joined by an edge of the similarity graph. Because propagation may give a non-zero interaction probability to shops that are not in the candidate shop set of x_i, the interaction probabilities of all shops with respect to x_i must undergo disambiguation normalization: a. the interaction probability of every shop outside the candidate shop set is set to 0; b. the interaction probabilities of the shops in the candidate shop set are min-max normalized.
106. Using the probabilities to which the probability propagation of step 105 converges, predict, from the candidate shop sets of the partial-label dataset, the shop with which the user will interact in the future.
Further, the specific steps by which step 101 preprocesses the user's shopping state data are:
1011. Cleaning of abnormal samples: the abnormal confidence of each sample is first calculated, according to formula (1), from the longitude and latitude in the original dataset and the Wi-Fi strength information of the current shopping state, where the variables of formula (1) (including λ_i and τ_i) are the longitude, latitude and current-state Wi-Fi strength of the user corresponding to the i-th sample, and m is the number of samples in the dataset. If the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged to be abnormal and is filtered out of the original dataset;
1012. Filling of missing Wi-Fi information: first find the 10 samples whose longitude and latitude are most similar to those of the sample with missing Wi-Fi strength information and whose own Wi-Fi strength information is known. The similarity between two samples is calculated according to formula (2), where the coordinate terms (including λ_a and λ_b) are the longitudes and latitudes of the users corresponding to sample a and sample b, and the two variance terms are the variances of longitude and latitude over the entire dataset. The missing Wi-Fi strength information of the sample is then filled from these 10 samples according to formula (3), where sample a is the sample to be filled, a_i (i = 1, 2, ..., 10) are the 10 neighbour samples of sample a, and the Wi-Fi term attached to each a_i is its Wi-Fi strength information.
Further, the specific steps by which step 102 constructs the partial-label dataset from the candidate shop set corresponding to each sample are:
For each sample in the original data, repeat the following operations to construct the partial-label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop, where the shop coordinate terms (including λ_A) are the longitude and latitude of shop A and the sample coordinate terms (including λ_a) are the longitude and latitude of sample a; (2) according to the calculated distance d, select the 10 shops nearest to the sample; (3) using the longitude and latitude of the 10 nearest shops of this sample, optimize the quadratic programming equation of formula (4) to solve for the weight of each of the 10 shops with respect to this sample, where the sample coordinate terms (including λ_a) are the longitude and latitude of the user corresponding to sample a, ω_{a,i} (i = 1, 2, ..., 10) is the weight, with respect to sample a, of shop i among the 10 shops nearest to sample a, and the shop coordinate terms are the longitudes and latitudes of those 10 nearest shops. If the weight calculated for a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
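A minimal Python sketch of this candidate-set construction is given below. The distance formula and the quadratic program of formula (4) are not reproduced in this text, so the sketch assumes plain Euclidean distance on (longitude, latitude) and receives the quadratic-program solver as a hypothetical callable `solve_weights`; only the neighbour count 10 and the threshold 0.4 come from the text.

```python
import math

def nearest_shops(user, shops, k=10):
    """Return the ids of the k shops closest to the user.

    user:  (lon, lat) of the sample.
    shops: dict mapping shop id -> (lon, lat).
    ASSUMPTION: Euclidean distance on raw coordinates.
    """
    return sorted(shops, key=lambda s: math.dist(user, shops[s]))[:k]

def candidate_set(user, shops, solve_weights, k=10, threshold=0.4):
    """Build the candidate shop set of one sample.

    solve_weights(user, coords) is a HYPOTHETICAL solver for formula (4);
    it must return one weight per coordinate, in the same order.
    """
    near = nearest_shops(user, shops, k)
    weights = solve_weights(user, [shops[s] for s in near])
    return {s for s, w in zip(near, weights) if w > threshold}
```

Passing the solver as a parameter keeps the sketch independent of the specific objective and constraints of formula (4).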
Further, step 103 performs feature extraction on the partial-label dataset, specifically comprising:
Wi-Fi distance strength: first, the Wi-Fi names are discretized into a 1000-dimensional feature vector whose values are the Wi-Fi strengths of the corresponding Wi-Fi networks; then, according to conversion formula (5), the discretized Wi-Fi strength feature vector is converted into the Wi-Fi distance-strength feature vector. In formula (5), the two vector terms are the 1000-dimensional Wi-Fi distance-strength feature vector of the i-th sample and the 1000-dimensional Wi-Fi strength feature vector of the i-th sample, |Y_i| is the size of the candidate shop set of the i-th sample, the shop coordinate terms are the longitude and latitude of candidate shop A_j of that sample, and the user coordinate terms (including λ_a) are the longitude and latitude of the user corresponding to that sample.
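The discretization step and the distance term that enters formula (5) can be sketched in Python as follows. Conversion formula (5) itself is not reproduced in this text, so the sketch stops short of combining the two quantities. The mapping from Wi-Fi names to dimensions (sorted order, first 1000 names) and all function names are assumptions made for illustration.

```python
import math

def build_index(all_names, dim=1000):
    """Map each distinct Wi-Fi name to one dimension.
    ASSUMPTION: the first `dim` names in sorted order are kept."""
    return {name: i for i, name in enumerate(sorted(set(all_names))[:dim])}

def strength_vector(observed, index, dim=1000):
    """Build the discretized Wi-Fi strength vector of one sample.

    observed: dict mapping Wi-Fi name -> strength seen in this shopping state.
    Dimensions of networks not observed (or not indexed) stay at 0.0.
    """
    vec = [0.0] * dim
    for name, strength in observed.items():
        if name in index:
            vec[index[name]] = strength
    return vec

def mean_candidate_distance(user, candidate_coords):
    """Average distance from the user to the |Y_i| candidate shops.
    ASSUMPTION: Euclidean distance on raw (lon, lat) coordinates."""
    return sum(math.dist(user, c) for c in candidate_coords) / len(candidate_coords)
```

How the strength vector and the average distance are combined into the distance-strength vector is defined by formula (5) and is deliberately left out here.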
Further, the specific steps by which step 104 constructs the similarity graph from the feature space are:
To construct the similarity graph <V, E, ω_e> based on the feature space, the nodes V, the edges E and the edge weights ω_e of the similarity graph must each be defined;
1041. Definition of the nodes of the similarity graph: each sample in the partial-label dataset is regarded as a node of the similarity graph;
1042. Definition of the edges of the similarity graph: for each sample in the partial-label dataset, that is, each node of the similarity graph, select as its associated objects the 10 samples other than itself whose Wi-Fi distance-strength vectors are nearest to its own in Euclidean distance, and connect the corresponding pairs of nodes in the similarity graph; these connections are the edges of the similarity graph;
1043. Definition of the edge weights of the similarity graph: the value similar(a, b) given by formula (2) is used as the weight of edge (a, b), where a and b are the two samples in the partial-label dataset that correspond to the two nodes of the edge.
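The edge definition of step 1042 amounts to a k-nearest-neighbour graph, which can be sketched in Python as follows. The function name is hypothetical, and treating edges as undirected pairs is an assumption; the text only says that the two corresponding nodes are connected.

```python
import math

def knn_edges(features, k=10):
    """Return the edge set of a k-nearest-neighbour similarity graph.

    features: list of equal-length feature vectors, one per sample.
    Each sample is linked to the k other samples with the smallest
    Euclidean distance. Edges are stored as (i, j) with i < j
    (ASSUMPTION: the graph is undirected).
    """
    edges = set()
    for i, x in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(x, features[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

For example, `knn_edges([[0.0], [1.0], [10.0]], k=1)` links sample 0 with sample 1 and sample 2 with sample 1. This brute-force version compares every pair of samples, so a spatial index would be needed for a large dataset.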
Further, the specific steps by which step 105 performs probability propagation on the similarity graph are:
1051. Initialization of the probabilities: for each sample, first assume that the probability that a shop appears in the sample's candidate shop set equals the proportion with which that shop appears in the entire dataset; that is, the probability that the shop appears in the dataset is used as prior knowledge of the probability that it appears in the candidate shop set of the sample. Further assume that, given the Wi-Fi distance strength of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. From the existing partial-label dataset the likelihood function of formula (6) is then constructed, where p(y ∈ S_i | x_i, θ) is the probability that, given the Wi-Fi distance-strength vector of the i-th sample, the true label lies in the candidate shop set of the sample, n_y is the number of times shop y appears in the entire dataset, π_{i,y} is the probability that shop y appears in the candidate shop set, and p(y | x_i, θ) is the probability that shop y is the true label given the Wi-Fi distance-strength vector of the i-th sample. This likelihood function formalizes the known fact that the true label of every sample in the dataset lies in its candidate shop set, and the parameter θ can be estimated by maximum likelihood estimation. The resulting estimate of p(y | x_i, θ) is the probability that, given the Wi-Fi distance-strength feature vector of the sample, shop y will be interacted with in the future by the user corresponding to the sample; it serves as the initial probability for probability propagation;
1052. Propagation of the probabilities: in the t-th round of probability propagation, from the probability matrix F_{t-1} of the previous round and the initial probability matrix P = [p(y_i = j | x_i, θ)]_{m×q}, a new probability matrix F_t, updated by propagation from neighbouring samples, is obtained according to formula (7), where W ∈ R^{m×m} is the sample-to-sample similarity matrix. Probability propagation runs for 50 rounds in total. In each round, the shop interaction probabilities of each sample are propagated to its neighbour samples according to the similarity between samples, and each sample updates its own interaction probability for a shop from the interaction probabilities that its 10 neighbour samples assign to that shop.
Further, in the partial label learning problem, each round of iteration must apply a disambiguation operation to the updated probability matrix: for each sample, the interaction probabilities of shops outside its candidate shop set are set to 0, and the interaction probabilities of the shops in its candidate shop set are normalized according to formula (8).
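One propagation round followed by disambiguation can be sketched in Python as follows. Formulas (7) and (8) are not reproduced in this text, so the update rule used here, a mix of the neighbour-weighted average of F_{t-1} and the initial matrix P controlled by a hypothetical parameter `alpha`, is an assumption in the style of classical label propagation and not the source's formula. The min-max normalization follows the text; the handling of ties is an added assumption.

```python
def disambiguate(row, candidates):
    """Zero every non-candidate entry and min-max normalize the candidates."""
    vals = [row[j] for j in candidates]
    lo, hi = min(vals), max(vals)
    out = [0.0] * len(row)
    for j in candidates:
        # ASSUMPTION: if all candidate scores tie (or there is only one
        # candidate), each candidate receives 1.0.
        out[j] = 1.0 if hi == lo else (row[j] - lo) / (hi - lo)
    return out

def propagate(P, W, candidate_sets, alpha=0.5, rounds=50):
    """Run probability propagation with per-round disambiguation.

    P: m x q initial probabilities (list of lists).
    W: m x m similarity weights, 0 where two samples share no edge.
    candidate_sets: one non-empty set of shop column indices per sample.
    ASSUMED update: F_t = alpha * rownorm(W) @ F_{t-1} + (1 - alpha) * P.
    """
    m, q = len(P), len(P[0])
    F = [row[:] for row in P]
    for _ in range(rounds):
        new_F = []
        for i in range(m):
            total = sum(W[i]) or 1.0  # avoid dividing by zero for isolated nodes
            row = [
                alpha * sum(W[i][n] * F[n][j] for n in range(m)) / total
                + (1 - alpha) * P[i][j]
                for j in range(q)
            ]
            new_F.append(disambiguate(row, candidate_sets[i]))
        F = new_F  # synchronous update: every row uses the previous round
    return F
```

The 50 rounds match the text. Because min-max normalization maps the weakest candidate of each sample to 0 in every round, the rows of the result are scores rather than distributions that sum to 1.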
Further, the specific steps by which step 106 uses the converged propagated probabilities to predict, from the candidate shop sets of the partial-label dataset, the shop with which the user will interact in the future are:
From the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop with which the user corresponding to each sample is most likely to interact is obtained according to formula (9).
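The prediction step can be sketched in Python as follows. Formula (9) is not reproduced in this text; the sketch assumes it selects, for each sample, the candidate shop with the highest converged probability. The function and parameter names are hypothetical.

```python
def predict(F, candidate_sets, shop_ids):
    """Pick, for each sample, the candidate shop with the largest score.

    F: m x q converged probability matrix (list of lists).
    candidate_sets: one non-empty set of column indices per sample.
    shop_ids: list mapping column index -> shop identifier.
    Ties are broken in favour of the lowest column index.
    """
    return [
        shop_ids[max(sorted(cands), key=lambda j: F[i][j])]
        for i, cands in enumerate(candidate_sets)
    ]
```

Restricting the arg-max to the candidate set is redundant after disambiguation has zeroed the other columns, but it keeps the prediction valid even if F is passed in without that step.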
The advantages and beneficial effects of the present invention are as follows:
1. For the shop positioning application itself, the most common prediction approach is an ordinary multi-class machine learning method. Multi-class methods consume a large amount of resources. Moreover, the possible labels of each sample should be a subset of all labels, that is, the true label of a sample can only be one of a few labels, whereas a multi-class method treats every label as a possible true label, which leaves its precision insufficient. This patent therefore innovatively treats the shop positioning application as a partial label learning problem, which makes full use of the label information of the few shops with which each sample can possibly interact and greatly improves the precision of the model;
2. In the abnormal-sample cleaning step, taking into account that all samples in the dataset lie in the same business district, this patent innovatively creates an abnormal confidence that depends on the longitude and latitude of the user corresponding to a sample and on the Wi-Fi strength of the current shopping state, and removes the samples whose confidence is too high or too low relative to the average confidence in the dataset.
3. In the shop positioning application, the more similar the longitudes and latitudes of the users corresponding to two samples, the more similar their shopping states should be. This patent innovatively creates a similarity formula based on this principle to express the degree of similarity between samples. The similarity serves two purposes in this patent: (1) the missing information of a sample lacking Wi-Fi information is filled from the 10 samples most similar to it; (2) the similarity is used as the weight of the edges between samples in the similarity graph.
4. When constructing the partial-label dataset, the conventional approach simply takes the 10 shops nearest to the user corresponding to the sample, which introduces excessive noise into the partial-label dataset, so the 10 nearest shops must be screened further. This patent innovatively creates a quadratic programming equation in the shop longitudes and latitudes and the longitude and latitude of the user corresponding to the sample, in which the interaction weight of each shop with respect to the sample is the variable to be solved. From the optimal solution of this quadratic programming equation, the shops that are closest to the user's current shopping state in relative distance (relative to the other 9 shops) can be selected as far as possible, which greatly reduces the noise caused by an overly large candidate label set in the partial-label dataset.
5. In the feature extraction operation, this patent exploits the fact that, in the shop positioning application, the average distance between the user corresponding to a sample and the shops in its candidate shop set separates the candidate shops of the sample from the non-candidate shops very well. Since the average distance alone cannot distinguish well among the shops inside the candidate shop set, the Wi-Fi strength of each sample is combined with the average distance, and the Wi-Fi distance-strength vector feature is innovatively proposed. It distinguishes candidate shops from non-candidate shops while preserving the discrimination among the shops in the candidate shop set.
6. In the probability propagation process, this patent modifies the classical label propagation algorithm. The classical algorithm considers only whether a candidate shop appears or not and ignores the latent probability distribution over the candidate shop set, so it cannot reach satisfactory performance. This patent keeps the framework of label propagation. On that basis, the proposed probability propagation algorithm uses maximum likelihood estimation under a logistic distribution to mine the probability distribution over the candidate shop set of each sample, places the estimated distribution into the label propagation framework, and innovatively proposes disambiguation normalization (a. the interaction probability of every shop outside the candidate shop set is set to 0; b. the interaction probabilities of the shops in the candidate shop set are min-max normalized) to remove the non-zero probabilities that non-candidate shops acquire during propagation. In essence, the probability propagation algorithm overcomes the shortcoming that label propagation can mine only the surface of the data, and substantially improves the prediction results of partial label learning.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution by which the present invention solves the above technical problem is as follows:
Referring to Fig. 1, Fig. 1 is a flow chart of the big data method, provided by embodiment one of the present invention, for predicting the shop in which a user is located based on partial label learning. The method specifically comprises:
101. Preprocess the user's shopping state data, specifically as follows. 1011. Cleaning of abnormal samples: the abnormal confidence of each sample is first calculated according to formula (1) from the longitude and latitude of the user corresponding to the sample in the original dataset and the Wi-Fi strength information of the current state. If the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged to be abnormal and is filtered out of the original dataset. 1012. Filling of missing Wi-Fi information: owing to factors beyond control, the Wi-Fi strength information of some samples cannot be obtained accurately. Following the idea that samples with similar longitude and latitude should also have similar Wi-Fi strength information, first find the 10 samples whose user longitude and latitude are most similar to those of the sample with missing Wi-Fi strength information and whose own Wi-Fi strength information is known. The similarity between two samples is calculated according to formula (2), and the missing Wi-Fi strength information of the sample is then filled from these 10 samples according to formula (3).
102. Construct the partial-label dataset from the candidate shop set corresponding to each user, specifically as follows. For each sample in the original data, repeat the following operations: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop (where the shop coordinate terms, including λ_A, are the longitude and latitude of shop A, and the user coordinate terms, including λ_a, are the longitude and latitude of user a); (2) according to the calculated distance d, select the 10 shops nearest to the sample; (3) using the longitude and latitude of the 10 nearest shops of this sample, optimize the quadratic programming equation (formula (4)) to solve for the weight of each of the 10 shops with respect to this sample. If the weight calculated for a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
103. Perform feature extraction on the partial-label dataset from the user longitude and latitude and the Wi-Fi strength information, specifically as follows: first, the Wi-Fi names are discretized into a 1000-dimensional feature vector whose values are the Wi-Fi strengths of the corresponding Wi-Fi networks; then, according to conversion formula (5), the discretized Wi-Fi strength feature vector is converted into the Wi-Fi distance-strength feature vector.
104. Construct the similarity graph from the feature space, specifically as follows: to construct the similarity graph <V, E, ω_e> based on the feature space (see Fig. 2), the nodes V, the edges E and the edge weights ω_e of the similarity graph must each be defined.
1041. Definition of the nodes of the similarity graph: each sample in the partial-label dataset is regarded as a node of the similarity graph.
1042. Definition of the edges of the similarity graph: for each sample in the partial-label dataset (each node of the similarity graph), select as its associated objects the 10 samples (nodes) other than itself whose Wi-Fi distance-strength vectors are nearest to its own in Euclidean distance, and connect the corresponding pairs of nodes in the similarity graph; these connections are the edges of the similarity graph.
1043. Definition of the edge weights of the similarity graph: the value similar(a, b) given by formula (2) is used as the weight of edge (a, b), where a and b are the two samples in the partial-label dataset that correspond to the two nodes of the edge.
105. Perform probability propagation on the similarity graph, specifically as follows:
1051. Initialization of the probabilities: for each sample, first assume that the probability that a shop appears in the sample's candidate shop set equals the proportion with which that shop appears in the entire dataset; that is, the probability that the shop appears in the dataset is used as prior knowledge of the probability that it appears in the candidate shop set of the sample. Further assume that, given the Wi-Fi distance strength of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. From the existing partial-label dataset the likelihood function of formula (6) is then constructed. This likelihood function formalizes the known fact that the true label of every sample in the dataset lies in its candidate shop set, and the parameter θ can be estimated by maximum likelihood estimation. The resulting estimate of p(y | x_i, θ) is the probability that, given the Wi-Fi distance-strength feature vector of the sample, shop y will be interacted with in the future by the user corresponding to the sample; it serves as the initial probability for probability propagation.
1052. Propagation of the probabilities: in the t-th round of probability propagation, from the probability matrix F_{t-1} of the previous round and the initial probability matrix P = [p(y_i = j | x_i, θ)]_{m×q}, a new probability matrix F_t, updated by propagation from neighbouring samples, is obtained according to formula (7). Probability propagation runs for 50 rounds in total. In each round, the shop interaction probabilities of each sample are propagated to its neighbour samples according to the similarity between samples, and each sample updates its own interaction probability for a shop from the interaction probabilities that its 10 neighbour samples assign to that shop. In the partial label learning problem, each round of iteration must apply a disambiguation operation to the updated probability matrix: for each sample, the interaction probabilities of shops outside its candidate shop set are set to 0, and the interaction probabilities of the shops in its candidate shop set are normalized as shown in formula (8).
106. Using the converged propagated probabilities, predict, from the candidate shop sets of the partial-label dataset, the shop with which the user will interact in the future, specifically as follows: from the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop with which the user corresponding to each sample is most likely to interact is obtained as shown in formula (9). The probability propagation method based on partial labels enables users to obtain a more accurate personalized notification service, improves their shopping experience, and provides an effective means of prediction when labels are difficult to obtain. The overall framework of the practical application of the partial label learning model to big data positioning of the shop in which a user is located is shown in Fig. 3.
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the content recorded herein, a person skilled in the art may make various changes or modifications to the present invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.