CN110060102A

CN110060102A - Big data prediction method for locating the shop a user is in, based on partial label learning

Info

Publication number: CN110060102A
Application number: CN201910313789.9A
Authority: CN
Inventors: 王进; 闵子剑; 孙开伟; 许景益; 邓欣; 刘彬
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Beijing Xinsuo Consulting Co.,Ltd.
Priority date: 2019-04-18
Filing date: 2019-04-18
Publication date: 2019-07-26
Anticipated expiration: 2039-04-18
Also published as: CN110060102B

Abstract

The present invention claims a big data prediction method, based on partial label learning, for locating the shop a user is in, comprising: 101, preprocessing the user's shopping status data; 102, constructing a partial label dataset from the candidate shop set corresponding to each sample; 103, performing feature extraction on the partial label dataset; 104, constructing a similarity graph from the feature space; 105, performing probability propagation over the similarity graph; 106, using the converged propagated probabilities to predict, from the candidate shop sets of the partial label dataset, the shop with which the user will interact in the future. The invention preprocesses the user's historical data, extracts features, converts the data into a partial label dataset, and builds a partial label learning model. Based on the partial label dataset of the user's location behavior, it predicts from each user's candidate shop set the shop with which the user will interact in the future, so that users receive more accurate personalized notification services and a better shopping experience.

Description

Big data prediction method for locating the shop a user is in, based on partial label learning

Technical field

The invention belongs to the technical field of partial label learning and big data processing, and in particular relates to big data prediction, based on a probability propagation model, for locating the shop a user is in.

Background technique

Partial label learning is a form of weakly supervised learning in which the output space is associated with a set of candidate labels: exactly one label in the candidate set is the true label, and the remaining labels are treated as noise labels. During partial label training, the true label of each training sample is hidden within its candidate label set, so a mapping from the input space to the output space cannot be learned directly from the dataset as in strongly supervised learning. In practice, however, datasets with accurate, unique label information are increasingly difficult to obtain. We therefore face the serious problem of how to learn from datasets whose labels are neither unique nor unambiguous. Recently, partial label learning has provided many effective methods for such problems and has been widely used in practical applications, with notable progress on the problem of locating the shop a user is in.

With the rapid spread of mobile internet payment, we enjoy ever more conveniences brought by intelligent positioning. For example, when a customer walks into a restaurant in a shopping mall, the phone automatically displays a coupon for that restaurant; when the customer walks into a clothing store, the phone automatically recommends clothes in that store that the customer may like; when the customer passes a jewelry store, the phone can automatically notify the customer that a diamond ring the customer has long wanted is in stock; when the customer leaves the mall's parking lot, the phone can, with the customer's permission, pay the parking fee automatically. These services all depend on big data mining and machine learning behind the scenes. Analysis of the shop a customer is in quietly brings the customer an artificial-intelligence experience while making it easier for the user to learn about shops of interest, which indirectly increases the customer's purchasing. How to deliver the most effective service to the user at the right time and in the right place is a new challenge for intelligent services in the big data era.

Summary of the invention

The present invention aims to solve the above problems of the prior art. It proposes a big data prediction method, based on partial label learning, for locating the shop a user is in, which enables users to obtain more accurate personalized notification services and improves their shopping experience. The technical solution of the invention is as follows:

A big data prediction method, based on partial label learning, for locating the shop a user is in, comprising the following steps:

101. Preprocess the user's location behavior data, including cleaning abnormal samples and filling in missing Wi-Fi information;

102. Construct a partial label dataset from the candidate shop set corresponding to each sample. Each sample in the dataset is one shopping status of some user, and different shopping statuses of a user correspond to different candidate shop sets. The candidate shop set of each sample is built by a rule that can be summarized in three steps: 1, find by distance the 10 shops nearest to the user's current shopping status; 2, optimize a novel convex quadratic programming problem to solve for the importance of these 10 shops to the user's current shopping status; 3, select the shops whose importance is greater than the threshold 0.4 as the candidate shop set, thereby constructing the partial label dataset;

103. Perform feature extraction on the partial label dataset: extract the Wi-Fi distance-strength feature vector to form the feature space. This feature vector is similar to a one-hot feature vector: each dimension represents the distance-strength value, under the user's current shopping status, of one Wi-Fi network that appears in the dataset;

104. Construct a similarity graph from the feature space, specifically:

For each sample x_i in the dataset, repeat the same operations: 1, take x_i as a node of the similarity graph; 2, take x_i as a center point and, according to the Euclidean distance between the Wi-Fi distance-strength feature vector of x_i and those of the other samples in the dataset, choose the 10 samples with the smallest Euclidean distance to x_i. x_i can then be regarded as the central sample of these 10 samples, and its node is connected by edges to their corresponding nodes in the similarity graph;

105. Perform probability propagation over the similarity graph. For each sample x_i in the dataset, repeat the same operations: 1, initialization: compute the optimal parameters from the likelihood function (formula (6)), and from them the probability that each candidate shop in the candidate shop set of x_i will be interacted with; this probability distribution serves as the initial probability distribution over the candidate shops of x_i; 2, in the t-th iteration of the probability propagation algorithm: obtain the probability distribution over the candidate shops of x_i at iteration t from the formula based on the similarity graph. Evaluating this formula is itself one round of probability propagation, and propagation occurs only between the two nodes joined by each edge of the similarity graph. Because propagation may leave shops outside the candidate shop set of x_i with nonzero interaction probability, the interaction probabilities of all shops with respect to x_i are disambiguated and normalized: a, the interaction probability of every shop outside the candidate shop set is set to 0; b, the interaction probabilities of the shops in the candidate shop set are min-max normalized.

106. Using the probabilities to which the propagation of step 105 has converged, predict from the candidate shop sets of the partial label dataset the shop with which the user will interact in the future.

Further, the specific steps of preprocessing the user's shopping status data in step 101 are:

1011. Cleaning of abnormal samples: from the longitude and latitude in the original dataset and the Wi-Fi strength information of the current shopping status, according to formula (1)

the abnormal confidence of each sample is calculated, where the inputs are the longitude λ_i, the latitude, and the current-state Wi-Fi strength τ_i of the user corresponding to the i-th sample, and m is the number of samples in the dataset. If the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged abnormal and filtered out of the original dataset;

1012. Filling of missing Wi-Fi information: first find the 10 samples whose longitude and latitude are most similar to those of the sample with missing Wi-Fi strength information and whose own Wi-Fi strength information is known. The similarity between two samples is calculated according to formula (2)

where λ_a and λ_b, together with the corresponding latitude symbols, denote the longitude and latitude of the users corresponding to samples a and b, and the remaining two symbols denote the variances of longitude and latitude over the entire dataset. These 10 samples are then used, according to formula (3),

to fill in the Wi-Fi strength information that the sample lacks, where sample a is the sample to be filled, a_i (i = 1, 2, ..., 10) are the 10 neighbor samples of sample a, and the remaining symbol denotes the Wi-Fi strength information of sample a_i.
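Step 1012 can be illustrated with a minimal Python sketch. Formulas (2) and (3) are not reproduced in this text, so the sketch makes two labeled assumptions: the similarity is taken to be a Gaussian-style kernel over variance-normalized longitude and latitude differences, and the fill is taken to be a similarity-weighted average over the most similar neighbors. The names `similarity` and `fill_wifi` are hypothetical.

```python
import math

def similarity(a, b, var_lon, var_lat):
    """ASSUMED stand-in for formula (2): kernel over variance-normalized
    squared longitude/latitude differences. a and b are (lon, lat) pairs."""
    d = (a[0] - b[0]) ** 2 / var_lon + (a[1] - b[1]) ** 2 / var_lat
    return math.exp(-d)

def fill_wifi(target_pos, known, var_lon, var_lat, k=10):
    """ASSUMED stand-in for formula (3): similarity-weighted average of the
    Wi-Fi strengths of the k most similar samples.

    known: list of ((lon, lat), {wifi_name: strength}) with complete Wi-Fi info.
    A Wi-Fi name absent from a neighbor contributes nothing to the average.
    """
    scored = sorted(
        ((similarity(target_pos, pos, var_lon, var_lat), wifi) for pos, wifi in known),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0.0:
        return {}  # no usable neighbor; leave the sample unfilled
    filled = {}
    for s, wifi in scored:
        for name, strength in wifi.items():
            filled[name] = filled.get(name, 0.0) + s * strength / total
    return filled
```

For two equally similar neighbors reporting strengths of -50 and -70 for the same Wi-Fi network, this sketch fills in -60.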

Further, the specific steps of constructing the partial label dataset from the candidate shop set of each sample in step 102 are:

For each sample in the original data, repeat the following operations to construct the partial label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop, where λ_A and its latitude counterpart denote the longitude and latitude of shop A, and λ_a and its latitude counterpart denote the longitude and latitude of sample a; (2) according to the calculated distance d, select the 10 shops nearest to the sample; (3) using the longitudes and latitudes of these 10 nearest shops, optimize the following quadratic programming problem, formula (4):

to solve for the weights of the 10 shops relative to the sample, where λ_a and its latitude counterpart denote the longitude and latitude of the user corresponding to sample a, ω_a,i (i = 1, 2, ..., 10) denotes the weight, relative to sample a, of shop i among the 10 shops nearest to sample a, and the remaining symbols denote the longitudes and latitudes of those 10 nearest shops. If the computed weight of a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
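The selection logic around the quadratic program can be sketched in Python as follows. The distance formula and formula (4) are not reproduced in this text, so the sketch assumes plain Euclidean distance on (longitude, latitude) pairs and takes the solved weights as an input rather than solving the quadratic program. The function names `nearest_shops` and `candidate_set` are hypothetical.

```python
import math

def nearest_shops(user_pos, shops, k=10):
    """Return the ids of the k shops nearest to the user.

    shops: {shop_id: (lon, lat)}. Euclidean distance on raw coordinates is an
    ASSUMPTION; the patent's own distance formula is not shown in the text.
    """
    return sorted(shops, key=lambda s: math.dist(user_pos, shops[s]))[:k]

def candidate_set(weights, threshold=0.4):
    """Keep the shops whose solved weight exceeds the threshold (0.4 in the text).

    weights: {shop_id: weight}, assumed to come from solving formula (4).
    """
    return {shop for shop, w in weights.items() if w > threshold}
```

Note that the comparison is strict, matching the text's "greater than 0.4": a shop with weight exactly 0.4 is excluded.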

Further, step 103 performs feature extraction on the partial label dataset, specifically:

Wi-Fi distance strength: first discretize the Wi-Fi names into a 1000-dimensional feature vector whose values are the Wi-Fi strengths of the corresponding Wi-Fi networks, then apply conversion formula (5):

to convert the discretized Wi-Fi strength feature vector into the Wi-Fi distance-strength feature vector, where the two vector symbols denote, for the i-th sample, its 1000-dimensional Wi-Fi distance-strength feature vector and its 1000-dimensional Wi-Fi strength feature vector, |Y_i| is the size of the candidate shop set of the i-th sample, the shop coordinate symbols denote the longitude and latitude of candidate shop A_j of the sample, and λ_a and its latitude counterpart denote the longitude and latitude of the user corresponding to the sample.
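A minimal Python sketch of the two ingredients named above may help. Conversion formula (5) is not reproduced in this text, so the sketch computes the discretized Wi-Fi strength vector and the mean user-to-candidate-shop distance separately and does not attempt to combine them. The vocabulary mapping `vocab`, both function names, and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def wifi_strength_vector(readings, vocab, dim=1000):
    """Discretize Wi-Fi names into a fixed-length vector of strengths.

    readings: {wifi_name: strength} for one sample.
    vocab: {wifi_name: index}, a hypothetical name-to-dimension mapping.
    Names outside the vocabulary are ignored; unseen dimensions stay 0.
    """
    vec = [0.0] * dim
    for name, strength in readings.items():
        idx = vocab.get(name)
        if idx is not None and idx < dim:
            vec[idx] = strength
    return vec

def mean_candidate_distance(user_pos, candidate_positions):
    """Average distance from the user to the |Y_i| candidate shops
    (Euclidean distance on coordinates is an ASSUMPTION)."""
    return sum(math.dist(user_pos, p) for p in candidate_positions) / len(candidate_positions)
```

The sparse, mostly zero vector is what makes the feature resemble a one-hot encoding, as the summary of step 103 describes.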

Further, the specific steps of constructing the similarity graph from the feature space in step 104 are:

To construct the similarity graph <V, E, ω_e> based on the feature space, the nodes V, the edges E, and the edge weights ω_e of the similarity graph must each be defined;

1041. Definition of the nodes of the similarity graph: each sample in the partial label dataset is regarded as a node of the similarity graph;

1042. Definition of the edges of the similarity graph: for each sample in the partial label dataset, that is, each node of the similarity graph, select as associated samples the 10 samples other than itself whose Wi-Fi distance-strength vectors are nearest in Euclidean distance, and connect the corresponding pairs of nodes as edges of the similarity graph;

1043. Definition of the edge weights of the similarity graph: the similarity similar(a, b) of formula (2) is used as the weight of edge (a, b), where a and b are the two samples in the partial label dataset corresponding to the two nodes of the edge.
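The node and edge definitions of steps 1041 and 1042 amount to a k-nearest-neighbor graph, which can be sketched in Python as below. The function name `knn_graph` and the adjacency-dictionary representation are illustrative choices, not taken from the patent; edge weights (step 1043) are omitted because formula (2) is not reproduced in this text.

```python
import math

def knn_graph(features, k=10):
    """Build the neighbor lists of a k-nearest-neighbor similarity graph.

    features: list of equal-length feature vectors, one per sample (node).
    Returns {node_index: [indices of its k nearest other samples]}, using
    Euclidean distance between feature vectors as stated in step 1042.
    """
    graph = {}
    for i, x in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: math.dist(x, features[j]))
        graph[i] = others[:k]
    return graph
```

This brute-force version compares every pair of samples, which is quadratic in the dataset size; a spatial index would be needed at larger scale.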

Further, the specific steps of probability propagation over the similarity graph in step 105 are:

1051. Initialization of probabilities: for each sample, first assume that the probability of a shop appearing in the sample's candidate shop set equals the proportion of occurrences of that shop in the entire dataset; that is, the probability that a shop occurs in the dataset is taken as prior knowledge of the probability that it appears in the sample's candidate shop set. Further assume that, given the Wi-Fi distance-strength vector of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. The likelihood function, formula (6), is then constructed from the existing partial label dataset:

where p(y ∈ S_i | x_i, θ) is the probability that, given the Wi-Fi distance-strength vector of the i-th sample, the true label is present in the sample's candidate shop set; n_y is the number of times shop y occurs in the entire dataset; π_i,y is the probability that shop y appears in the candidate shop set; and p(y | x_i, θ) is the probability that, given the Wi-Fi distance-strength vector of the i-th sample, shop y is the true label. This likelihood function formalizes the known fact that the true label of every sample in the dataset is present in its candidate shop set. The parameter θ can be estimated by maximum likelihood; the resulting p(y | x_i, θ) is the probability that, given the sample's Wi-Fi distance-strength feature vector, the user corresponding to the sample will interact with shop y in the future, and it serves as the initial probability for probability propagation;

1052. Propagation of probabilities: in the t-th iteration of probability propagation, from the probability matrix F_t-1 of the previous iteration and the initial probability matrix P = [p(y_i = j | x_i, θ)]_m×q, a new probability matrix F_t influenced by propagation from neighboring samples is obtained by formula (7):

where W ∈ R^m×m is the similarity matrix between samples. Probability propagation runs for 50 iterations in total. In each round, the shop interaction probabilities of each sample are propagated to its neighbor samples according to the similarity between samples, and each sample updates its own interaction probability for a shop according to the interaction probabilities of its 10 neighbor samples for that shop.

Further, in the partial label learning problem, each iteration must apply a disambiguation operation to the updated probability matrix: the interaction probabilities of shops outside each sample's candidate shop set are set to 0, and the interaction probabilities of shops in the candidate shop set are normalized according to formula (8):
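The propagate-then-disambiguate loop can be sketched in Python as follows. Formulas (7) and (8) are not reproduced in this text, so two stand-ins are assumed and labeled as such: the update is taken to be the common label-propagation form F_t = α·W·F_t-1 + (1 − α)·P with a hypothetical mixing parameter `alpha`, and the normalization rescales each sample's candidate probabilities to sum to 1 (the summary of step 105 mentions min-max normalization, which may differ). The function names are illustrative.

```python
def disambiguate(row, candidates):
    """Zero out non-candidate shops, then rescale candidates to sum to 1.

    Sum-to-one rescaling is an ASSUMED stand-in for formula (8).
    """
    kept = [p if j in candidates else 0.0 for j, p in enumerate(row)]
    total = sum(kept)
    if total == 0.0:
        # No mass on any candidate: fall back to a uniform candidate distribution.
        return [1.0 / len(candidates) if j in candidates else 0.0
                for j in range(len(row))]
    return [p / total for p in kept]

def propagate(P, W, candidate_sets, alpha=0.5, rounds=50):
    """Iterate an ASSUMED update F_t = alpha * W @ F_{t-1} + (1 - alpha) * P.

    P: m x q initial probabilities; W: m x m sample similarity matrix;
    candidate_sets: one set of shop indices per sample. 50 rounds as in the text.
    """
    m, q = len(P), len(P[0])
    F = [disambiguate(P[i], candidate_sets[i]) for i in range(m)]
    for _ in range(rounds):
        new_F = []
        for i in range(m):
            row = [alpha * sum(W[i][k] * F[k][j] for k in range(m))
                   + (1 - alpha) * P[i][j] for j in range(q)]
            new_F.append(disambiguate(row, candidate_sets[i]))
        F = new_F
    return F
```

Because disambiguation runs inside every round, a shop outside a sample's candidate set can receive probability from neighbors during the update but never retains it.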

Further, the specific steps of step 106, predicting from the candidate shop sets of the partial label dataset the shop with which the user will interact in the future using the converged propagated probabilities, are:

From the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop with which the user corresponding to each sample is most likely to interact is obtained by formula (9):
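Formula (9) is not reproduced in this text. Read literally, "the shop most likely to be interacted with" suggests taking, for each sample, the candidate shop with the highest converged probability. The following Python sketch makes that assumption; `predict_shop` is a hypothetical name.

```python
def predict_shop(prob_row, candidates):
    """Return the candidate shop index with the highest converged probability.

    prob_row: one row of the converged probability matrix F_t.
    candidates: the sample's candidate shop indices.
    Restricting the argmax to candidates is an ASSUMED reading of formula (9).
    """
    return max(candidates, key=lambda j: prob_row[j])
```

Restricting the search to the candidate set keeps the prediction consistent with the premise that the true label always lies among the candidates.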

The advantages and beneficial effects of the present invention are as follows:

1, for the shop positioning application itself, the most common prediction approach is basic multi-class machine learning. Multi-class methods consume substantial resources, and the possible labels of each sample should be a subset of all labels: the true label of each sample can only be among a few labels, whereas multi-class methods treat every label as a possible true label, which leaves their accuracy insufficient. This patent therefore innovatively treats the shop positioning application as a partial label learning problem, which makes full use of the label information of the few shops each sample could possibly interact with and greatly improves the accuracy of the model;

2, in the abnormal sample cleaning step, considering that all samples in the dataset lie within the same business district, this patent innovatively defines an abnormal confidence related to the longitude and latitude of the user corresponding to a sample and the Wi-Fi strength of the current shopping status, and cleans out samples that deviate too far above the dataset's average confidence or whose own confidence is too low.

3, based on the principle that, in the shop positioning application, the more similar the longitudes and latitudes of the users corresponding to two samples, the more similar their shopping statuses should be, this patent innovatively defines a similarity formula to express the degree of similarity between samples. This similarity serves two purposes in this patent: (1) the missing information of a sample lacking Wi-Fi information is filled from the 10 samples most similar to it; (2) the similarity is used as the edge weight between samples in the similarity graph.

4, when constructing a partial label dataset, the conventional approach merely finds the 10 shops nearest to the user corresponding to the sample, which introduces excessive noise into the partial label dataset, so the 10 nearest shops must be screened further. This patent innovatively defines a quadratic programming problem over the shop coordinates and the coordinates of the user corresponding to the sample, whose variables are the interaction weights of each shop relative to the sample. From the optimal solution of this quadratic program, the shops relatively closest (relative to the other 9 shops) to the user's current shopping status can be selected as far as possible, which substantially reduces the noise caused by an overly large candidate label set in the partial label dataset.

5, in feature extraction, this patent exploits the fact that, in the shop positioning application, the average distance between the user corresponding to each sample and the shops in its candidate shop set distinguishes the sample's candidate shops from its non-candidate shops well. Since the average distance alone cannot distinguish among the shops within the candidate shop set, the Wi-Fi strength of each sample is combined with the average distance, yielding the novel Wi-Fi distance-strength feature vector, which separates candidate from non-candidate shops while preserving discrimination among the shops within the candidate shop set.

6, in probability propagation, this patent adapts the classical label propagation algorithm. Classical label propagation considers only whether a candidate shop appears or not, and ignores the underlying probability distribution over the candidate shop set, so it cannot achieve satisfactory performance. Using the framework of label propagation, the probability propagation algorithm proposed here mines the probability distribution over each sample's candidate shop set by maximum likelihood estimation based on the logistic distribution, places the estimated distribution into the label propagation framework, and innovatively introduces disambiguation normalization (a, the interaction probability of shops outside the candidate shop set is set to 0; b, the interaction probabilities of shops in the candidate shop set are min-max normalized) to remove the nonzero probabilities that non-candidate shops acquire during propagation. In essence, the probability propagation algorithm overcomes label propagation's limitation of mining only the surface of the data, and substantially improves the prediction results of partial label learning.

Brief description of the drawings

Fig. 1 is a flowchart of the big data prediction method, based on partial label learning, for locating the shop a user is in, according to a preferred embodiment of the present invention.

Fig. 2 is the sample similarity graph used in the big data prediction method, based on partial label learning, for locating the shop a user is in, according to a preferred embodiment of the present invention.

Fig. 3 is the overall framework diagram of the practical application of the partial label learning model in the big data prediction method, based on partial label learning, for locating the shop a user is in, according to a preferred embodiment of the present invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The technical solution that the present invention solves above-mentioned technical problem is:

With reference to Fig. 1, which is a flowchart of the big data prediction method, based on partial label learning, for locating the shop a user is in, provided by Embodiment 1 of the present invention, the method specifically includes:

101. Preprocess the user's shopping status data, as follows. 1011. Cleaning of abnormal samples: first, from the longitude and latitude of the user corresponding to each sample in the original dataset and the Wi-Fi strength information of the current state, calculate the abnormal confidence of each sample according to formula (1); if the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged abnormal and filtered out of the original dataset. 1012. Filling of missing Wi-Fi information: owing to factors beyond control, the Wi-Fi strength information of some samples cannot be obtained accurately. On the idea that samples with similar longitude and latitude should also have similar Wi-Fi strength information, first find the 10 samples whose user longitude and latitude are most similar to those of the sample lacking Wi-Fi strength information and whose own Wi-Fi strength information is known; the similarity between two samples is calculated according to formula (2), and the Wi-Fi strength information that the sample lacks is then filled from these 10 samples according to formula (3).

102. Construct the partial label dataset from the candidate shop set corresponding to each user, as follows. For each sample in the original data, repeat the following operations to construct the partial label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop (where λ_A and its latitude counterpart denote the longitude and latitude of shop A, and λ_a and its latitude counterpart denote those of user a); (2) according to the calculated distance d, select the 10 shops nearest to the sample; (3) using the longitudes and latitudes of these 10 nearest shops, optimize the quadratic programming problem (formula (4)) to solve for the weights of the 10 shops relative to the sample; if the computed weight of a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.

103. Perform feature extraction on the partial label dataset from the user longitude and latitude and the Wi-Fi strength information, as follows: first discretize the Wi-Fi names into a 1000-dimensional feature vector whose values are the corresponding Wi-Fi strengths, then convert the discretized Wi-Fi strength feature vector into the Wi-Fi distance-strength feature vector according to conversion formula (5).

104. Construct the similarity graph from the feature space, as follows: to construct the similarity graph <V, E, ω_e> based on the feature space (see Fig. 2), the nodes V, the edges E, and the edge weights ω_e of the similarity graph must each be defined.

1041. Definition of the nodes of the similarity graph: each sample in the partial label dataset is regarded as a node of the similarity graph.

1042. Definition of the edges of the similarity graph: for each sample in the partial label dataset (each node of the similarity graph), select as associated samples the 10 samples (nodes) other than itself whose Wi-Fi distance-strength vectors are nearest in Euclidean distance, and connect the corresponding pairs of nodes as edges of the similarity graph.

105. Perform probability propagation over the similarity graph, as follows:

1051. Initialization of probabilities: for each sample, first assume that the probability of a shop appearing in the sample's candidate shop set equals the proportion of occurrences of that shop in the entire dataset; that is, the probability that a shop occurs in the dataset is taken as prior knowledge of the probability that it appears in the sample's candidate shop set. Further assume that, given the Wi-Fi distance-strength vector of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. The likelihood function, formula (6), is then constructed from the existing partial label dataset; it formalizes the known fact that the true label of every sample in the dataset is present in its candidate shop set. The parameter θ can be estimated by maximum likelihood, and the resulting p(y | x_i, θ), the probability that, given the sample's Wi-Fi distance-strength feature vector, the user corresponding to the sample will interact with shop y in the future, serves as the initial probability for probability propagation.

1052. Propagation of probabilities: in the t-th iteration of probability propagation, from the probability matrix F_t-1 of the previous iteration and the initial probability matrix P = [p(y_i = j | x_i, θ)]_m×q, a new probability matrix F_t influenced by propagation from neighboring samples is obtained by formula (7). Probability propagation runs for 50 iterations in total. In each round, the shop interaction probabilities of each sample are propagated to its neighbor samples according to the similarity between samples, and each sample updates its own interaction probability for a shop according to the interaction probabilities of its 10 neighbor samples for that shop. In the partial label learning problem, each iteration must apply a disambiguation operation to the updated probability matrix: the interaction probabilities of shops outside each sample's candidate shop set are set to 0, and those of shops in the candidate shop set are normalized as shown in formula (8).

106. Using the converged propagated probabilities, predict from the candidate shop sets of the partial label dataset the shop with which the user will interact in the future, as follows: from the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop with which the user corresponding to each sample is most likely to interact is given by formula (9). The probability propagation method based on partial labels enables users to obtain more accurate personalized notification services and improves their shopping experience, providing an effective means of prediction under today's conditions where labels are difficult to obtain. The overall framework of the practical application of the partial label learning model to big data shop positioning is shown in Fig. 3.

The above embodiments should be understood as merely illustrating the present invention rather than limiting its scope of protection. After reading the contents recorded herein, a person skilled in the art can make various changes or modifications to the present invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the present invention.

Claims

1. A big data method, based on partial label learning, for predicting the shop in which a user is located, characterized by comprising the following steps:

101. performing preprocessing operations on the user's position behavior data, including abnormal sample cleaning and filling of missing Wi-Fi information;

102. constructing a partial label data set from the candidate shop set corresponding to each sample: each sample in the data set is one shopping state of some user, and different shopping states of each user correspond to different candidate shop sets; the candidate shop set of each sample is determined by a fixed rule which, for each sample, can be summarized in three steps: 1. find, by distance, the 10 shops nearest to the user's current shopping state; 2. optimize a novel convex quadratic programming problem to solve for the importance of each of these 10 shops to the user's current shopping state; 3. select the shops whose importance is greater than the threshold 0.4 as the candidate shop set;

103. performing feature extraction on the partial label data set: the Wi-Fi distance-strength feature vectors are extracted to form the feature space; this feature vector is similar to a one-hot feature vector, each dimension representing the distance-strength value, in the user's current shopping state, of one of the Wi-Fi networks appearing in the data set;

104. constructing a similarity graph from the feature space: for each sample x_i in the data set, the same operation is repeated: 1. x_i is taken as a node of the similarity graph; 2. x_i is regarded as a center point, and the 10 samples with the smallest Euclidean distance to x_i, measured between Wi-Fi distance-strength feature vectors, are chosen from the other samples in the data set; x_i can then be regarded as the central sample of these 10 samples, and its node is connected by an edge to each of their corresponding nodes in the similarity graph;

105. performing probability propagation on the similarity graph: for each sample x_i in the data set, the same operations are repeated: 1. initialization: the optimal parameters are computed from the likelihood function (formula (6)) and used to calculate the probability that each candidate shop in the candidate shop set of x_i will be interacted with; this probability distribution serves as the initial probability distribution over the candidate shops of x_i; 2. for the t-th iteration of the probability propagation algorithm: the probability distribution over the candidate shops of x_i at the t-th iteration is obtained from the formula based on the similarity graph; evaluating this formula is itself one round of probability propagation, and propagation takes place only between the two nodes joined by each edge of the similarity graph; because propagation may give a non-zero interaction probability to shops that are not in the candidate shop set of x_i, the interaction probabilities of all shops with respect to x_i are disambiguated and normalized: a. the interaction probability of each shop outside the candidate shop set is set to 0; b. the interaction probabilities of the shops in the candidate shop set are min-max normalized;

106. predicting, from the candidate shop sets of the partial label data set, the shop with which the user will interact in the future, using the probabilities to which the probability propagation of step 105 converges.
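The disambiguation named in step 105 (zeroing non-candidate shops, then min-max normalizing the candidates) can be sketched as below. The handling of a row whose candidates all have the same probability is an assumed convention, since the claim does not address that case.

```python
import numpy as np

def disambiguate_minmax(F, candidate_mask):
    """Zero non-candidate shops and min-max normalise each row's candidates."""
    F = np.asarray(F, dtype=float)
    candidate_mask = np.asarray(candidate_mask, dtype=bool)
    out = np.zeros_like(F)  # non-candidate entries stay at 0
    for i in range(F.shape[0]):
        cand = np.flatnonzero(candidate_mask[i])
        if cand.size == 0:
            continue
        vals = F[i, cand]
        lo, hi = vals.min(), vals.max()
        if hi > lo:
            out[i, cand] = (vals - lo) / (hi - lo)
        else:
            # All candidates tie; assumed convention: give each the value 1.
            out[i, cand] = 1.0
    return out
```

One consequence of min-max scaling is that the least probable candidate of each sample is mapped to exactly 0 and the most probable to 1, so the rows are rescaled scores rather than distributions that sum to 1.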

2. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 1, characterized in that the specific steps by which step 101 preprocesses the user's shopping state data are:

1011. Abnormal sample cleaning: abnormal samples are cleaned by first taking the longitude, the latitude, and the Wi-Fi strength information of the current shopping state from the original data set and, according to the formula

calculating the abnormal confidence of each sample, where λ_i and τ_i are the longitude and the current-state Wi-Fi strength of the user corresponding to the i-th sample, the user's latitude is the remaining input, and m is the number of samples in the data set; if the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged to be an abnormal sample and is filtered out of the original data set;

1012. Filling of missing Wi-Fi information: first, the 10 samples whose longitude and latitude are most similar to those of the sample with missing Wi-Fi strength information are found, the Wi-Fi strength information of these 10 samples being known; the similarity between two samples is calculated according to the formula

where λ_a and λ_b are the longitudes of the users corresponding to sample a and sample b, taken together with the corresponding latitudes, and the two variance terms are the variances of longitude and latitude over the entire data set; the Wi-Fi strength information missing from the sample is then filled from these 10 samples according to the formula

where sample a is the sample to be filled, a_i (i = 1, 2, ..., 10) are the 10 neighbor samples of this sample, and the Wi-Fi term indexed by a_i is the Wi-Fi strength information corresponding to sample a_i.
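The filling procedure of step 1012 can be sketched under explicit assumptions. The similarity is assumed here to be a Gaussian-style function of the variance-scaled longitude and latitude differences, and the fill value is assumed to be the similarity-weighted mean of the neighbors' Wi-Fi strengths; neither formula is confirmed by the text, and a single scalar strength per sample is used only to keep the example small.

```python
import numpy as np

def fill_missing_wifi(lonlat, wifi, k=10):
    """Fill NaN Wi-Fi strengths from the k most similar samples with known values.

    lonlat: (m, 2) longitude/latitude per sample.
    wifi:   (m,) Wi-Fi strength per sample, NaN where missing.
    Assumes at least one sample has a known Wi-Fi strength.
    """
    lonlat = np.asarray(lonlat, dtype=float)
    wifi = np.asarray(wifi, dtype=float).copy()
    known = np.flatnonzero(~np.isnan(wifi))    # only originally known samples are donors
    var = lonlat.var(axis=0)                   # variances of longitude and latitude
    var[var == 0] = 1.0                        # guard against constant coordinates
    for a in np.flatnonzero(np.isnan(wifi)):
        d2 = ((lonlat[known] - lonlat[a]) ** 2 / var).sum(axis=1)
        sim = np.exp(-d2)                      # assumed similarity function
        top = np.argsort(-sim)[:k]             # the k most similar known samples
        weights = sim[top]
        wifi[a] = np.dot(weights, wifi[known[top]]) / weights.sum()
    return wifi
```

Because donors are restricted to samples whose strength was known from the start, a filled value never feeds into another fill.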

3. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 2, characterized in that the specific steps by which step 102 constructs the partial label data set from the candidate shop set corresponding to each sample are:

For each sample in the original data, the following operations are repeated to construct the partial label data set: (1) the distance d between the sample and each shop is calculated from the user's longitude and latitude and the shop's longitude and latitude in the original data set, where λ_A and λ_a are the longitudes of shop A and of sample a, taken together with the corresponding latitudes; (2) according to the calculated distance d, the 10 shops nearest to the sample are selected; (3) using the longitudes and latitudes of the 10 nearest shops corresponding to this sample, the following quadratic programming problem is optimized:

to solve for the weights of the 10 shops with respect to this sample, where λ_a, together with the corresponding latitude, denotes the longitude and latitude of the user corresponding to sample a, ω_{a,i} (i = 1, 2, ..., 10) denotes the weight, relative to sample a, of shop i among the 10 shops nearest to sample a, and the shop coordinate terms denote the longitudes and latitudes of the 10 nearest shops corresponding to sample a; if the weight calculated for a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
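The selection logic around the quadratic program can be sketched as follows. The sketch assumes that the distance d is the plain Euclidean distance between longitude/latitude pairs and takes the shop weights as an input, since the quadratic programming problem itself is not restated here; the function names and data layout are illustrative.

```python
import math

def nearest_shops(sample_lonlat, shops, k=10):
    """Return the ids of the k shops nearest to the sample.

    shops: dict mapping shop id -> (longitude, latitude).
    Distance is assumed to be Euclidean in longitude/latitude.
    """
    def dist(shop_id):
        lon, lat = shops[shop_id]
        return math.hypot(lon - sample_lonlat[0], lat - sample_lonlat[1])
    return sorted(shops, key=dist)[:k]

def candidate_set(shop_ids, weights, threshold=0.4):
    """Keep the shops whose weight (from the quadratic program) exceeds the threshold."""
    return {s for s, w in zip(shop_ids, weights) if w > threshold}
```

For example, with shops `{"A": (0, 0), "B": (1, 1), "C": (5, 5)}` and a sample at `(0.1, 0.1)`, the two nearest shops are A and B; if the solver returned weights 0.7 and 0.3 for them, only A would enter the candidate shop set.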

4. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 3, characterized in that step 103 performs feature extraction on the partial label data set, specifically comprising the step:

Wi-Fi distance strength: first, the Wi-Fi names are discretized into a 1000-dimensional feature vector whose feature values are the Wi-Fi strengths corresponding to the respective Wi-Fi networks; then, according to the conversion formula:

the discretized Wi-Fi strength feature vector is converted into the Wi-Fi distance-strength feature vector, where the two vector symbols denote, respectively, the 1000-dimensional Wi-Fi distance-strength feature vector and the 1000-dimensional Wi-Fi strength feature vector of the i-th sample, |Y_i| is the size of the candidate shop set corresponding to the i-th sample, the shop coordinate terms denote the longitude and latitude of the sample's candidate shop A_j, and λ_a, together with the corresponding latitude, denotes the longitude and latitude of the user corresponding to the sample.
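The first stage of this feature extraction, discretizing Wi-Fi names into a fixed-length strength vector, can be sketched as below. The dictionary-per-sample input format, the sorted ordering of dimensions, and the fill value 0.0 for Wi-Fi networks a sample did not observe are all assumptions made for illustration; the subsequent conversion to distance strength is left out of the sketch.

```python
def build_wifi_vocabulary(samples):
    """Assign every Wi-Fi name seen in the data set a fixed dimension index.

    samples: list of dicts mapping Wi-Fi name -> observed strength.
    """
    names = sorted({name for sample in samples for name in sample})
    return {name: idx for idx, name in enumerate(names)}

def to_feature_vector(sample, vocabulary, missing=0.0):
    """Place each observed strength at its Wi-Fi network's dimension."""
    vec = [missing] * len(vocabulary)  # unobserved networks keep the assumed fill value
    for name, value in sample.items():
        vec[vocabulary[name]] = value
    return vec
```

Every sample is thereby mapped to a vector of the same length, so Euclidean distances between samples are well defined when the similarity graph is built.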

5. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 4, characterized in that the specific steps by which step 104 constructs the similarity graph from the feature space are:

In order to construct the similarity graph <V, E, ω_e> based on the feature space, the nodes V of the similarity graph, the edges E of the similarity graph, and the edge weights ω_e of the similarity graph must each be defined;

1041. Definition of the nodes of the similarity graph: each sample in the partial label data set is regarded as a node of the similarity graph;

1042. Definition of the edges of the similarity graph: for each sample in the partial label data set, i.e. each node of the similarity graph, the 10 samples other than itself that are nearest to it in Euclidean distance between Wi-Fi distance-strength vectors are selected as its associated objects, and the corresponding pairs of nodes are connected in the similarity graph to form its edges;

1043. Definition of the edge weights of the similarity graph: the value similar(a, b) given by formula (2) is used as the weight of edge (a, b) of the similarity graph, where a and b are the two samples in the partial label data set corresponding to the two nodes of the edge.
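The edge definition of step 1042 amounts to a k-nearest-neighbor search in the feature space, which can be sketched as below. The function name and the dictionary output are illustrative; edge weights are not computed here, since they come from the similarity of formula (2).

```python
import numpy as np

def knn_edges(X, k=10):
    """For each sample, list the k other samples nearest in Euclidean distance.

    X: (m, d) matrix of Wi-Fi distance-strength feature vectors.
    Returns a dict: node index -> list of neighbour indices.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    k = min(k, m - 1)                       # a sample cannot be its own neighbour
    diff = X[:, None, :] - X[None, :, :]    # pairwise differences, shape (m, m, d)
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # exclude self-matches
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return {i: neighbours[i].tolist() for i in range(m)}
```

The dense pairwise computation needs memory proportional to m² × d, so for a large data set a spatial index or a batched search would be the more practical choice.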

6. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 5, characterized in that the specific steps by which step 105 performs probability propagation on the similarity graph are:

1051. Initialization of probabilities: for each sample, it is first assumed that the probability that a shop appears in the sample's candidate shop set equals the proportion of that shop's occurrences in the entire data set, so that the shop's frequency in the data set serves as prior knowledge of the probability that the shop appears in the sample's candidate shop set; it is further assumed that, given the Wi-Fi distance strength of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution; the likelihood function is then constructed from the existing partial label data set:

where p(y ∈ S_i | x_i, θ) is the probability, given the Wi-Fi distance-strength vector of the i-th sample, that the true label lies in the sample's candidate shop set; n_y is the number of times shop y occurs in the entire data set; π_{i,y} is the probability that shop y appears in the sample's candidate shop set; and p(y | x_i, θ) is the probability, given the Wi-Fi distance-strength vector of the i-th sample, that shop y is the true label. This likelihood function formalizes the known fact that the true label of every sample in the entire data set lies in its candidate shop set. The parameter θ can be estimated by maximum likelihood estimation; the resulting estimated probability is the probability, given the Wi-Fi distance-strength feature vector of the sample, that the user corresponding to the sample will interact with shop y in the future, and it is used as the initial probability for probability propagation;

1052. Probability propagation: in the t-th iteration of probability propagation, a new probability matrix F_t, reflecting the propagation influence of neighboring samples, is obtained from the previous iteration's probability matrix F_{t-1} and the initial probability matrix P = [p(y_i = j | x_i, θ)]_{m×q}:

where W ∈ R^{m×m} is the similarity matrix between samples; probability propagation runs for 50 iterations in total; in each iteration, the shop interaction probabilities of each sample are propagated to its neighboring samples according to the similarity between samples, and each sample updates its own interaction probability for a given shop from the corresponding interaction probabilities of its 10 neighbor samples.
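The two ingredients of the initialization in step 1051 can be illustrated with a small sketch. It assumes that the prior of a shop is its share of all candidate-set occurrences in the data set, and that the logistic model takes the usual multinomial (softmax) form with a hypothetical parameter matrix `theta`; the maximum likelihood fitting of θ is not shown.

```python
from collections import Counter

import numpy as np

def shop_frequencies(candidate_sets):
    """Share of candidate-set occurrences per shop (assumed form of the prior)."""
    counts = Counter(shop for cand in candidate_sets for shop in cand)
    total = sum(counts.values())
    return {shop: n / total for shop, n in counts.items()}

def softmax_probabilities(X, theta):
    """p(y | x_i, theta) under an assumed multinomial logistic model.

    X: (m, d) feature matrix; theta: (d, q) hypothetical parameter matrix.
    """
    scores = np.asarray(X, dtype=float) @ np.asarray(theta, dtype=float)
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)
```

The rows of the returned matrix sum to 1 and would play the role of the initial probability matrix P before the first propagation round.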

7. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 6, characterized in that, in the partial label learning problem, each iteration must perform a disambiguation operation on the updated probability matrix, i.e. the interaction probabilities of shops outside each sample's candidate shop set are set to 0 and the interaction probabilities of the shops in the candidate shop set are normalized:

8. The big data method, based on partial label learning, for predicting the shop in which a user is located according to claim 6 or 7, characterized in that the specific steps by which step 106 predicts, from the candidate shop sets of the partial label data set and using the probabilities to which propagation converges, the shop with which the user will interact in the future are:

From the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop with which the user corresponding to each sample is most likely to interact is obtained:
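One plausible reading of this prediction rule, offered only as an assumption consistent with the surrounding definitions (S_i being the candidate shop set of the i-th sample and F_t the converged probability matrix), is that the predicted shop is the candidate with the largest converged probability:

```latex
\hat{y}_i = \arg\max_{j \in S_i} F_t(i, j)
```

Here \hat{y}_i is an illustrative symbol for the predicted shop of the i-th sample and F_t(i, j) denotes the entry of F_t for sample i and shop j.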