CN110060102A - Big data prediction method, based on partial label learning, for locating the shop a user is in - Google Patents

Big data prediction method, based on partial label learning, for locating the shop a user is in

Info

Publication number
CN110060102A
Authority
CN
China
Prior art keywords
sample
retail shop
probability
user
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910313789.9A
Other languages
Chinese (zh)
Other versions
CN110060102B (en
Inventor
王进
闵子剑
孙开伟
许景益
邓欣
刘彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinsuo Consulting Co.,Ltd.
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201910313789.9A priority Critical patent/CN110060102B/en
Publication of CN110060102A publication Critical patent/CN110060102A/en
Application granted granted Critical
Publication of CN110060102B publication Critical patent/CN110060102B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0261 Targeted advertisements based on user location

Abstract

The present invention discloses a big data prediction method, based on partial label learning, for locating the shop a user is in, comprising: 101, preprocessing the users' shopping status data; 102, constructing a partial label dataset from the candidate shop set corresponding to each sample; 103, extracting features from the partial label dataset; 104, constructing a similarity graph from the feature space; 105, propagating probabilities over the similarity graph; 106, using the probabilities to which propagation converges to predict, from the candidate shop sets of the partial label dataset, the shop the user will interact with in the future. The invention preprocesses users' historical data, extracts features, converts the data into a partial label dataset, and builds a partial label learning model. From the partial label dataset of users' location behavior, it predicts, out of the candidate shop set corresponding to each user, the shop the user will interact with in the future, so that users receive more accurate personalized notification services and a better shopping experience.

Description

Big data prediction method, based on partial label learning, for locating the shop a user is in
Technical field
The invention belongs to the technical field of partial label learning and big data processing, and in particular relates to big data prediction, based on a probability propagation model, of the shop a user is in.
Background art
Partial label learning is a form of weakly supervised learning in which the output of each training sample is associated with a set of candidate labels. Exactly one label in the candidate set is the true label, and the remaining labels are regarded as interfering noise labels. During partial label training, the true label of each training sample is buried in its candidate label set, so a mapping from the input space to the output space cannot be learned directly from the dataset as in strongly supervised learning. In real life, however, datasets with accurate, unique label information are increasingly hard to obtain. We therefore face the serious problem of how to learn from datasets whose labels are neither unique nor unambiguous. Recently, partial label learning has provided many effective methods for such problems and has been widely used in many practical applications, with notable progress in particular on the problem of locating the shop a user is in.
With the rapid spread of internet mobile payment, we enjoy more and more of the convenience that intelligent positioning brings to daily life. For example, when a customer walks into a restaurant in a shopping mall, the phone automatically pops up that restaurant's coupons; when the customer walks into a clothing store in the mall, the phone automatically recommends clothes in that store that the customer likes; when the customer passes a jewelry store in the mall, the phone automatically notifies the customer that a diamond ring they have long wanted is in stock; when the customer leaves the mall's parking lot, the phone, with the customer's permission, pays the parking fee automatically. The thoughtful services these customers enjoy all depend on big data mining and machine learning behind the scenes. Analysis of the shop a customer is in implicitly brings the customer an artificial intelligence experience and makes it easier for the user to learn about shops of interest, which indirectly increases the customer's purchasing power. How to give users the most effective service at the right time and in the right place is a new challenge for intelligent expansion in the big data era.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a big data prediction method, based on partial label learning, for locating the shop a user is in, which enables users to obtain more accurate personalized notification services and improves their shopping experience. The technical solution of the invention is as follows:
A big data prediction method, based on partial label learning, for locating the shop a user is in, comprising the following steps:
101. Preprocess the users' location behavior data, including cleaning abnormal samples and filling in missing Wi-Fi information;
102. Construct a partial label dataset from the candidate shop set corresponding to each sample. Each sample in the dataset is one shopping status of some user, and different shopping statuses of a user correspond to different candidate shop sets. The candidate shop set of each sample is built by a rule that can be summarized in three steps: 1, find the 10 shops nearest to the user's current shopping status by distance; 2, solve a novel convex quadratic programming problem to obtain the importance of these 10 shops for the user's current shopping status; 3, select the shops whose importance is greater than the threshold 0.4 as the candidate shop set;
103. Extract features from the partial label dataset: the Wi-Fi distance-intensity feature vector is extracted to form the feature space. This feature vector resembles a one-hot feature vector; each dimension holds the distance-intensity value, under the user's current shopping status, of one of the Wi-Fi networks that appear in the dataset;
104. Construct a similarity graph from the feature space, specifically:
For each sample x_i in the dataset, repeat the same operation: 1, take x_i as a node of the similarity graph; 2, take x_i as a center point and, using the Euclidean distance between the Wi-Fi distance-intensity feature vector of x_i and those of the other samples in the dataset, choose the 10 samples with the smallest Euclidean distance to x_i. x_i can then be regarded as the central sample point of these 10 samples, and its node is connected to their nodes by edges in the similarity graph;
105. Propagate probabilities over the similarity graph. For each sample x_i in the dataset, repeat the same operation: 1, initialization: compute the optimal parameters from the likelihood function (formula (6)), and from them the probability that each candidate shop in the candidate shop set of x_i is interacted with; this probability distribution is the initial probability distribution over the candidate shops of x_i; 2, for the t-th iteration of the probability propagation algorithm: obtain the probability distribution over the candidate shops of x_i at iteration t from the formula based on the similarity graph. Computing this formula is itself one round of probability propagation, and it propagates only between the two nodes joined by each edge of the similarity graph. Because propagation can give a nonzero interaction probability to shops outside the candidate shop set of x_i, the interaction probabilities of all shops relative to x_i must be disambiguated and normalized: a, the interaction probability of shops outside the candidate shop set is set to 0; b, the interaction probabilities of shops in the candidate shop set are min-max normalized;
106. Using the probabilities to which the propagation of step 105 converges, predict, from the candidate shop sets of the partial label dataset, the shop the user will interact with in the future.
Further, the specific steps by which step 101 preprocesses the users' shopping status data are:
1011. Abnormal sample cleaning: first, from the longitude and latitude in the original dataset and the Wi-Fi intensity of the current shopping status, calculate the abnormal confidence c_i of each sample according to formula (1), whose inputs are the longitude λ_i, the latitude, and the current-state Wi-Fi intensity τ_i of the user corresponding to the i-th sample, and where m is the number of samples in the dataset. If the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged abnormal and filtered out of the original dataset;
1012. Filling in missing Wi-Fi information: first find the 10 samples whose longitude and latitude are most similar to those of the sample with missing Wi-Fi intensity information and whose own Wi-Fi intensity information is known. The similarity between two samples is calculated according to formula (2), whose inputs are the longitudes λ_a, λ_b and the latitudes of the users corresponding to samples a and b, together with the variances of longitude and latitude over the entire dataset. The missing Wi-Fi intensity information of the sample is then filled in from these 10 samples according to formula (3), where sample a is the sample to be filled, a_i (i = 1, 2, ..., 10) are the 10 neighbor samples of a, and the Wi-Fi intensity information of each a_i is the known value used for filling.
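The filling step in 1012 can be sketched as follows. Formulas (2) and (3) are not reproduced in this text, so the sketch makes two labeled assumptions: the similarity is a Gaussian kernel on variance-scaled coordinate differences, and the fill value is the similarity-weighted mean over the 10 neighbors. The function names `similarity` and `fill_missing_wifi` are hypothetical.

```python
import numpy as np

def similarity(lon_a, lat_a, lon_b, lat_b, var_lon, var_lat):
    # ASSUMPTION: Gaussian kernel on variance-scaled differences.
    # The patent's formula (2) is not available here and may differ.
    d2 = (lon_a - lon_b) ** 2 / var_lon + (lat_a - lat_b) ** 2 / var_lat
    return np.exp(-d2)

def fill_missing_wifi(lon, lat, wifi, k=10):
    """Fill NaN entries of `wifi` from the k most similar samples with known values."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    filled = np.asarray(wifi, dtype=float).copy()
    known = np.flatnonzero(~np.isnan(filled))      # samples with known intensity
    missing = np.flatnonzero(np.isnan(filled))     # samples to be filled
    var_lon, var_lat = lon.var(), lat.var()        # dataset-wide variances
    for a in missing:
        sim = similarity(lon[a], lat[a], lon[known], lat[known], var_lon, var_lat)
        top = np.argsort(-sim)[:k]                 # k most similar known samples
        weights = sim[top]
        # ASSUMPTION: similarity-weighted mean stands in for formula (3).
        filled[a] = np.dot(weights, filled[known[top]]) / weights.sum()
    return filled
```

Only samples whose intensity was known in the input are used as neighbors, so a filled value never feeds into another fill.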
Further, the specific steps by which step 102 constructs the partial label dataset from the candidate shop set corresponding to each sample are:
For each sample in the original data, repeat the following operations to construct the partial label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop, using the longitude λ_A and latitude of shop A and the longitude λ_a and latitude of sample a; (2) from the calculated distance d, select the 10 shops nearest to the sample; (3) from the longitudes and latitudes of the 10 nearest shops of this sample, optimize the quadratic programming equation of formula (4)
to solve for the weights of these 10 shops relative to the sample, where the inputs are the longitude λ_a and latitude of the user corresponding to sample a and the longitudes and latitudes of the 10 shops nearest to sample a, and ω_{a,i} (i = 1, 2, ..., 10) is the weight, relative to sample a, of shop i among those 10 shops. If the weight computed for a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
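The three operations above (nearest 10 shops, weighting, 0.4 threshold) can be sketched as below. The quadratic program of formula (4) is not reproduced in this text, so the weighting is passed in as a function; `inverse_distance_weights` is a hypothetical stand-in used only to make the sketch runnable, and plain Euclidean distance on longitude and latitude is likewise an assumption.

```python
import numpy as np

def nearest_shops(user_xy, shop_xy, k=10):
    """Indices and distances of the k shops closest to the user.
    ASSUMPTION: Euclidean distance on (longitude, latitude)."""
    diff = np.asarray(shop_xy, dtype=float) - np.asarray(user_xy, dtype=float)
    dist = np.linalg.norm(diff, axis=1)
    idx = np.argsort(dist)[:k]
    return idx, dist[idx]

def inverse_distance_weights(dist, eps=1e-12):
    # HYPOTHETICAL stand-in for the quadratic program of formula (4):
    # the nearest shop gets weight 1, farther shops proportionally less.
    inv = 1.0 / (dist + eps)
    return inv / inv.max()

def candidate_shop_set(user_xy, shop_xy, weight_fn=inverse_distance_weights,
                       k=10, threshold=0.4):
    """Keep the shops among the k nearest whose weight exceeds the threshold."""
    idx, dist = nearest_shops(user_xy, shop_xy, k)
    weights = weight_fn(dist)
    return [int(i) for i, w in zip(idx, weights) if w > threshold]
```

Swapping in a real solver for formula (4) only requires a `weight_fn` that returns one weight per nearest shop.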
Further, step 103 extracts features from the partial label dataset, specifically:
Wi-Fi distance-intensity: first discretize the Wi-Fi names into a 1000-dimensional feature vector whose feature values are the Wi-Fi intensities of the corresponding Wi-Fi networks, then apply conversion formula (5)
to convert the discretized Wi-Fi intensity feature vector into the Wi-Fi distance-intensity feature vector. Formula (5) relates the 1000-dimensional Wi-Fi distance-intensity feature vector of the i-th sample to its 1000-dimensional Wi-Fi intensity feature vector, where |Y_i| is the size of the candidate shop set of the i-th sample, and the remaining inputs are the longitude and latitude of each candidate shop A_j of the sample and the longitude λ_a and latitude of the user corresponding to the sample.
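The discretization half of this step can be sketched as below: each Wi-Fi name gets a fixed position in a 1000-dimensional vector, and the value at that position is the observed intensity. The conversion of formula (5) is not reproduced in this text and is deliberately left out. The input layout (one dict of name to intensity per sample), the value 0 for unobserved networks, and the function names are assumptions.

```python
import numpy as np

def build_wifi_index(samples, dim=1000):
    """Map every Wi-Fi name seen in the dataset to a fixed vector position."""
    names = sorted({name for sample in samples for name in sample})
    if len(names) > dim:
        raise ValueError("more distinct Wi-Fi names than feature dimensions")
    return {name: i for i, name in enumerate(names)}

def wifi_intensity_vector(sample, index, dim=1000):
    """Dense intensity vector for one sample.
    ASSUMPTION: networks not observed in the sample are encoded as 0."""
    vec = np.zeros(dim)
    for name, intensity in sample.items():
        vec[index[name]] = intensity
    return vec
```

For example, with samples `[{"a": -50, "b": -70}, {"b": -60}]` the second sample has a single nonzero entry, at the position assigned to `"b"`.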
Further, the specific steps by which step 104 constructs the similarity graph from the feature space are:
To construct the similarity graph <V, E, ω_e> based on the feature space, the nodes V, the edges E, and the edge weights ω_e of the similarity graph must each be defined:
1041. Definition of the nodes of the similarity graph: each sample in the partial label dataset is regarded as a node of the similarity graph;
1042. Definition of the edges of the similarity graph: for each sample in the partial label dataset, that is, each node of the similarity graph, select as associated objects the 10 samples other than itself with the smallest Euclidean distance to it in Wi-Fi distance-intensity, and connect the corresponding pairs of nodes in the similarity graph to form its edges;
1043. Definition of the edge weights of the similarity graph: the value similar(a, b) from formula (2) is the weight of edge (a, b) of the similarity graph, where a and b are the two samples in the partial label dataset that correspond to the two nodes of the edge.
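Steps 1041 to 1043 can be sketched as a weighted k-nearest-neighbor graph. The sketch assumes the pairwise similar(a, b) values of formula (2) are already available as a matrix `sim`; how that matrix is computed is outside the sketch. The function name and the dense O(m^2) distance matrix are illustrative choices, not the patent's implementation.

```python
import numpy as np

def knn_similarity_graph(features, sim, k=10):
    """Weighted adjacency matrix W of the similarity graph.

    features: (m, d) Wi-Fi distance-intensity vectors, one row per sample.
    sim:      (m, m) precomputed sample similarities (assumed given).
    W[i, j] = sim[i, j] if j is among the k nearest neighbors of i, else 0.
    """
    X = np.asarray(features, dtype=float)
    sim = np.asarray(sim, dtype=float)
    m = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                        # a sample is not its own neighbor
    W = np.zeros((m, m))
    for i in range(m):
        nbrs = np.argsort(d2[i])[:k]                    # k closest other samples
        W[i, nbrs] = sim[i, nbrs]
    return W
```

The resulting matrix is directed (row i lists the neighbors of sample i); symmetrizing it, if wanted, is a separate design choice the source does not specify.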
Further, the specific steps by which step 105 propagates probabilities over the similarity graph are:
1051. Probability initialization: for each sample, first assume that the probability that a shop appears in the sample's candidate shop set equals the proportion of the entire dataset in which that shop appears; that is, the frequency with which a shop occurs in the dataset serves as prior knowledge of the probability that it appears in the sample's candidate shop set. Further assume that, conditioned on the Wi-Fi distance-intensity of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. The likelihood function of formula (6) is then constructed from the existing partial label dataset,
where p(y ∈ S_i | x_i, θ) is the probability, conditioned on the Wi-Fi distance-intensity vector of the i-th sample, that the true label lies in the sample's candidate shop set; n_y is the number of times shop y occurs in the entire dataset; π_{i,y} is the probability that shop y appears in the sample's candidate shop set; and p(y | x_i, θ) is the probability, conditioned on the Wi-Fi distance-intensity vector of the i-th sample, that shop y is the true label. This likelihood function formalizes the known fact that the true label of every sample in the dataset lies in its candidate shop set. The parameter θ can be estimated by maximum likelihood, and the resulting estimate of the probability that, conditioned on the sample's Wi-Fi distance-intensity feature vector, the user corresponding to the sample will interact with shop y in the future serves as the initial probability for propagation;
1052. Probability propagation: in the t-th iteration, from the probability matrix F_{t-1} of the previous iteration and the initial probability matrix P = [p(y_i = j | x_i, θ)]_{m×q}, a new probability matrix F_t, influenced by propagation from neighboring samples, is obtained according to formula (7),
where W ∈ R^{m×m} is the similarity matrix between samples. Propagation runs for 50 iterations in total. In each round, the shop interaction probabilities of each sample are propagated to its neighbor samples according to the similarity between samples, and each sample updates its own interaction probability for a shop from the interaction probabilities of its 10 neighbor samples for that shop.
Further, in the partial label learning problem, each iteration must disambiguate the updated probability matrix: the interaction probabilities of shops outside each sample's candidate shop set are set to 0, and the interaction probabilities of shops inside the candidate shop set are normalized according to formula (8):
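The propagation loop with per-round disambiguation can be sketched as below. Formulas (7) and (8) are not reproduced in this text, so the sketch substitutes the standard label propagation update F_t = α·H·F_{t-1} + (1 − α)·P, with H the row-normalized similarity matrix, and rescales each row over its candidate shops so that it sums to 1. The mixing weight `alpha`, the row-sum rescaling, and the function names are all assumptions; only the 50 rounds and the zeroing of non-candidate shops come from the description.

```python
import numpy as np

def disambiguate(F, candidate_mask):
    """Zero out non-candidate shops, then rescale each row over its candidates.
    ASSUMPTION: row-sum rescaling stands in for formula (8).
    Every sample is assumed to have at least one candidate shop."""
    F = np.where(candidate_mask, F, 0.0)
    row_sum = F.sum(axis=1, keepdims=True)
    uniform = candidate_mask / candidate_mask.sum(axis=1, keepdims=True)
    # Fall back to a uniform distribution if propagation left no candidate mass.
    return np.where(row_sum > 0, F / np.where(row_sum > 0, row_sum, 1.0), uniform)

def propagate(W, P, candidate_mask, alpha=0.95, n_iter=50):
    """Iterate F_t = alpha * H @ F_{t-1} + (1 - alpha) * P with disambiguation.
    ASSUMPTION: this update stands in for formula (7); alpha is illustrative."""
    W = np.asarray(W, dtype=float)
    mask = np.asarray(candidate_mask, dtype=bool)
    row_sum = W.sum(axis=1, keepdims=True)
    H = W / np.where(row_sum > 0, row_sum, 1.0)   # each sample averages its neighbors
    P = disambiguate(np.asarray(P, dtype=float), mask)
    F = P.copy()
    for _ in range(n_iter):
        F = alpha * (H @ F) + (1.0 - alpha) * P
        F = disambiguate(F, mask)
    return F
```

In a three-sample chain where the middle sample has a single candidate shop, its outer neighbor that shares that candidate ends up assigning it most of its probability, which is the disambiguating effect the propagation is meant to produce.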
Further, the specific steps by which step 106 uses the converged probabilities to predict, from the candidate shop sets of the partial label dataset, the shop the user will interact with in the future are:
From the probability matrix F_t obtained when the propagation of step 105 converges, the predicted shop that the user corresponding to each sample is most likely to interact with is obtained according to formula (9):
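Formula (9) is not reproduced in this text. A minimal sketch of the prediction, assuming it selects for each sample the candidate shop with the highest converged probability, is shown below; the function name `predict_shops` and the boolean `candidate_mask` layout are hypothetical.

```python
import numpy as np

def predict_shops(F, candidate_mask):
    """Index of the most probable candidate shop for each sample.
    ASSUMPTION: an argmax restricted to the candidate set stands in for formula (9)."""
    mask = np.asarray(candidate_mask, dtype=bool)
    scores = np.where(mask, np.asarray(F, dtype=float), -np.inf)
    return scores.argmax(axis=1)
```

Restricting the argmax to the candidate set keeps the prediction valid even if the probability matrix passed in has not been disambiguated.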
The advantages and beneficial effects of the present invention are as follows:
1. For the shop positioning application itself, the most common prediction methods are basic multi-class machine learning methods, which consume a large amount of resources. Moreover, the possible labels of each sample should be a subset of all labels, that is, the true label of each sample can only lie among a few labels, whereas multi-class methods treat every label as a possible true label, which leaves them with insufficient precision. This patent therefore innovatively treats shop positioning as a partial label learning problem, so that prediction makes full use of the label information of the few shops each sample could possibly interact with, which greatly improves the precision of the model;
2. In the abnormal sample cleaning step, considering that all samples in the dataset lie in the same business district, this patent innovatively defines an abnormal confidence based on the longitude and latitude of the user corresponding to a sample and the Wi-Fi intensity of the current shopping status, and cleans out samples whose confidence is too high or too low relative to the average confidence of the dataset.
3. In shop positioning, the more similar the longitudes and latitudes of the users corresponding to different samples, the more similar their shopping statuses should be. Based on this principle, this patent innovatively defines a similarity formula to express the degree of similarity between samples. This similarity serves two purposes in this patent: (1) the missing information of a sample with missing Wi-Fi information is filled in from the 10 samples most similar to it; (2) the similarity serves as the edge weight between samples in the similarity graph.
4. When constructing a partial label dataset, the conventional approach simply takes the 10 shops nearest to the user corresponding to the sample, which introduces excessive noise into the partial label dataset, so the 10 nearest shops must be screened further. This patent innovatively defines a quadratic programming equation over the shop longitudes and latitudes and the longitude and latitude of the user corresponding to the sample, with the interaction weight of each shop relative to the sample as the variable to be solved. The optimal solution of this quadratic program selects, as far as possible, the shops that are relatively closest (relative to the other 9 shops) to the user's current shopping status, which substantially reduces the noise caused by an overly large candidate label set in the partial label dataset.
5. In feature extraction, this patent exploits the fact that, in shop positioning, the average distance between the user corresponding to a sample and the shops in its candidate shop set distinguishes the sample's candidate shops from its non-candidate shops well. Because the average distance cannot distinguish well among the shops inside the candidate shop set, the Wi-Fi intensity of each sample is combined with the average distance, yielding the innovatively proposed Wi-Fi distance-intensity vector feature, which separates candidate from non-candidate shops while preserving discrimination among the shops inside the candidate shop set.
6. In probability propagation, this patent adapts the classical label propagation algorithm. The classical algorithm considers only whether a candidate shop appears or not and ignores the latent probability distribution over the candidate shop set, so it cannot reach satisfactory performance. This patent keeps the framework of label propagation. On that basis, the proposed probability propagation algorithm uses maximum likelihood estimation based on the logistic distribution to mine the probability distribution over the candidate shop set of each sample, places the estimated distribution into the label propagation framework, and innovatively proposes disambiguation normalization (a, the interaction probability of shops outside the candidate shop set is set to 0; b, the interaction probabilities of shops in the candidate shop set are min-max normalized) to resolve the problem that non-candidate shops acquire nonzero probability during propagation. In essence, the probability propagation algorithm overcomes the shortcoming that label propagation can only mine the surface of the data, and substantially improves the prediction results of partial label learning.
Description of the drawings
Fig. 1 is a flow chart of the big data prediction method, based on partial label learning, for locating the shop a user is in, provided by a preferred embodiment of the present invention.
Fig. 2 is the sample similarity graph used in the big data prediction method, based on partial label learning, for locating the shop a user is in, provided by a preferred embodiment of the present invention.
Fig. 3 is the overall framework diagram of the practical application of the partial label learning model in the big data prediction method, based on partial label learning, for locating the shop a user is in, provided by a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention.
The technical solution by which the present invention solves the above technical problems is as follows:
Referring to Fig. 1, Fig. 1 is a flow chart of the big data prediction method, based on partial label learning, for locating the shop a user is in, provided by Embodiment 1 of the present invention. The method specifically includes:
101. Preprocess the users' shopping status data, as follows. 1011. Abnormal sample cleaning: first, from the longitude and latitude of the user corresponding to each sample in the original dataset and the Wi-Fi intensity information of the current state, calculate the abnormal confidence of each sample according to formula (1). If the abnormal confidence c_i of a sample is lower than 0.15 or higher than 0.85, the sample is judged abnormal and filtered out of the original dataset. 1012. Filling in missing Wi-Fi information: owing to factors beyond control, the Wi-Fi intensity information of some samples cannot be obtained accurately. Following the idea that samples with similar longitude and latitude should also have similar Wi-Fi intensity information, first find the 10 samples whose user longitude and latitude are most similar to those of the sample with missing Wi-Fi intensity information and whose own Wi-Fi intensity information is known. The similarity between two samples is calculated according to formula (2), and the missing Wi-Fi intensity information of the sample is then filled in from these 10 samples according to formula (3).
102. Construct the partial label dataset from the candidate shop set corresponding to each user, as follows. For each sample in the original data, repeat the following operations to construct the partial label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, calculate the distance d between the sample and each shop, using the longitude λ_A and latitude of shop A and the longitude λ_a and latitude of user a; (2) from the calculated distance d, select the 10 shops nearest to the sample; (3) from the longitudes and latitudes of the 10 nearest shops of this sample, optimize the quadratic programming equation (formula (4)) to solve for the weights of these 10 shops relative to the sample. If the weight computed for a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
103. Extract features from the partial label dataset using the user longitude and latitude and the Wi-Fi intensity information, as follows: first discretize the Wi-Fi names into a 1000-dimensional feature vector whose feature values are the Wi-Fi intensities of the corresponding Wi-Fi networks, then convert the discretized Wi-Fi intensity feature vector into the Wi-Fi distance-intensity feature vector according to conversion formula (5).
104. Construct the similarity graph from the feature space, as follows: to construct the similarity graph <V, E, ω_e> based on the feature space (see Fig. 2), the nodes V, the edges E, and the edge weights ω_e of the similarity graph must each be defined.
1041. Definition of the nodes of the similarity graph: each sample in the partial label dataset is regarded as a node of the similarity graph.
1042. Definition of the edges of the similarity graph: for each sample in the partial label dataset (each node of the similarity graph), select as associated objects the 10 samples (nodes) other than itself with the smallest Euclidean distance to it in Wi-Fi distance-intensity, and connect the corresponding pairs of nodes in the similarity graph to form its edges.
1043. Definition of the edge weights of the similarity graph: the value similar(a, b) from formula (2) is the weight of edge (a, b) of the similarity graph, where a and b are the two samples in the partial label dataset that correspond to the two nodes of the edge.
105. Probability propagation is carried out on the similarity graph, as follows:
1051. Probability initialization: for each sample, the probability that a shop appears in the sample's candidate shop set is first assumed to equal the proportion of occurrences of that shop in the entire dataset; that is, the frequency with which the shop occurs in the dataset serves as prior knowledge of the probability that it appears in the sample's candidate shop set. It is further assumed that, conditioned on the Wi-Fi distance-intensity feature vector of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution. From the existing partial-label dataset, the likelihood function of formula (6) is then constructed; this likelihood function formalizes the known fact that the true label of every sample in the dataset is contained in its candidate shop set. The parameter $\theta$ can be estimated by maximum likelihood estimation, and the resulting probability that shop y will interact with the user corresponding to the sample, given the sample's Wi-Fi distance-intensity feature vector, is used as the initialization probability for probability propagation.
1052. Probability propagation: in the t-th iteration of probability propagation, the new probability matrix $F_t$, which reflects the influence of propagation from neighboring samples, is obtained by formula (7) from the probability matrix $F_{t-1}$ of the previous iteration and the initialization probability matrix $P = [p(y_i = j \mid x_i, \theta)]_{m \times q}$. Probability propagation runs for 50 iterations in total. In each iteration, the shop interaction probabilities of each sample are propagated to its neighboring samples according to the similarity between samples, and each sample updates its own interaction probability for a shop according to the interaction probabilities of its 10 neighboring samples for that shop. In the partial label learning problem, each iteration must also perform a disambiguation operation on the updated probability matrix: for each sample, the interaction probabilities of shops outside its candidate shop set are set to 0, and the interaction probabilities of the shops in its candidate shop set are normalized as shown in formula (8).
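The iteration described in step 1052 can be sketched as below. Formulas (7) and (8) are not reproduced in this text, so the update rule here is an assumption: a convex combination of the neighbour average and the initial matrix, with a hypothetical mixing parameter `alpha`, followed by sum-to-one normalization over the candidate set. The function name `propagate` and the boolean `candidates` mask are likewise illustrative.

```python
import numpy as np

def propagate(W, P, candidates, alpha=0.5, n_iter=50):
    """W: (m, m) edge weights; P: (m, q) initial probabilities;
    candidates: (m, q) boolean mask, True where shop j is a candidate of sample i."""
    W = np.asarray(W, dtype=float)
    P = np.asarray(P, dtype=float)
    mask = np.asarray(candidates, dtype=bool)
    # Row-normalise W so each sample takes a weighted average over its neighbours.
    row_sum = W.sum(axis=1, keepdims=True)
    H = np.divide(W, row_sum, out=np.zeros_like(W), where=row_sum > 0)
    F = P.copy()
    for _ in range(n_iter):
        # ASSUMPTION: stand-in for formula (7); alpha is a hypothetical parameter.
        F = alpha * (H @ F) + (1.0 - alpha) * P
        # Disambiguation: zero out shops outside each sample's candidate set.
        F = np.where(mask, F, 0.0)
        # ASSUMPTION: stand-in for formula (8); rescale candidates to sum to 1.
        total = F.sum(axis=1, keepdims=True)
        F = np.divide(F, total, out=np.zeros_like(F), where=total > 0)
    return F
```

Keeping the initial matrix `P` in every update prevents the propagated probabilities from drifting arbitrarily far from the maximum-likelihood initialization, which is one common reason for this design in label propagation.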
106. Using the converged probabilities obtained by propagation, the shop with which the user will interact in the future is predicted from the candidate shop set of the partial-label dataset, as follows: from the probability matrix $F_t$ obtained at convergence of the propagation in step 105, the predicted shop with which the user of each sample is most likely to interact is obtained as shown in formula (9). The probability propagation method based on partial labels allows users to receive more accurate personalized notification services and improves their shopping experience, providing a way to make effective predictions when labels are difficult to obtain. Fig. 3 shows the overall framework of the practical application of the big-data partial label learning model for locating the shop where the user is.
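A minimal sketch of the prediction in step 106, assuming that formula (9), which is not reproduced in this text, selects for each sample the candidate shop with the largest converged probability. The names `predict_shops` and `shop_ids` are hypothetical.

```python
import numpy as np

def predict_shops(F, candidates, shop_ids):
    """F: (m, q) converged probabilities; candidates: (m, q) boolean mask;
    shop_ids: length-q sequence of shop identifiers."""
    F = np.asarray(F, dtype=float)
    mask = np.asarray(candidates, dtype=bool)
    # Non-candidate shops get -inf so they can never win the argmax.
    scores = np.where(mask, F, -np.inf)
    best = np.argmax(scores, axis=1)
    return [shop_ids[j] for j in best]
```

Masking before the argmax guarantees that the prediction always comes from the sample's own candidate shop set, even if the matrix passed in was not disambiguated.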
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the contents of the present invention, a person skilled in the art may make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the present invention.

Claims (8)

1. A big data prediction method based on partial label learning for locating the shop where a user is, characterized by comprising the following steps:
101. Performing preprocessing operations on the user's location behavior data, including cleaning abnormal samples and filling in missing Wi-Fi information;
102. Constructing a partial-label dataset from the candidate shop set corresponding to each sample: each sample in the dataset is one shopping state of a user, different shopping states of a user correspond to different candidate shop sets, and the candidate shop set of each sample is generated according to a rule that, for each sample, can be summarized in three steps: (1) finding, by distance, the 10 shops nearest to the user's current shopping state; (2) solving for the importance of these 10 shops to the user's current shopping state by optimizing a novel convex quadratic programming problem; (3) selecting the shops whose importance is greater than the threshold 0.4 as the candidate shop set;
103. Performing feature extraction on the partial-label dataset, extracting Wi-Fi distance-intensity feature vectors to form the feature space; the feature vector is similar to a one-hot feature vector, each dimension representing the distance-intensity value, under the user's current shopping state, of one of the Wi-Fi networks appearing in the dataset;
104. Constructing a similarity graph from the feature space, specifically comprising:
for each sample $x_i$ in the dataset, repeating the same operations: (1) taking $x_i$ as a node of the similarity graph; (2) taking $x_i$ as a center point and, according to the Euclidean distance between the Wi-Fi distance-intensity feature vector of $x_i$ and those of the other samples in the dataset, choosing the 10 samples with the smallest Euclidean distance to $x_i$, so that $x_i$ can be regarded as the central sample of these 10 samples, and connecting the corresponding nodes in the similarity graph by edges;
105. Carrying out probability propagation on the similarity graph: for each sample $x_i$ in the dataset, repeating the same operations: (1) initialization: computing the optimal parameters from the likelihood function (formula (6)), thereby computing the probability that each candidate shop in the candidate shop set of $x_i$ may be interacted with, and taking this probability distribution as the initial probability distribution over the candidate shops of $x_i$; (2) for the t-th iteration of the probability propagation algorithm: obtaining the probability distribution over the candidate shops of $x_i$ at the t-th iteration from the formula based on the similarity graph, thereby realizing the probability propagation of the t-th iteration; evaluating this formula is itself a process of probability propagation, and this propagation can take place only between the two nodes joined by each edge of the similarity graph; because propagation may give a non-zero interaction probability to shops that are not in the candidate shop set of $x_i$, the interaction probabilities of all shops with respect to $x_i$ are disambiguated and normalized: (a) the interaction probabilities of shops outside the candidate shop set are set to 0; (b) the interaction probabilities of shops in the candidate shop set are min-max normalized;
106. Using the converged probabilities obtained by the probability propagation of step 105, predicting, from the candidate shop set of the partial-label dataset, the shop with which the user will interact in the future.
2. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 1, characterized in that the specific steps by which step 101 preprocesses the user's shopping state data are:
1011. Cleaning of abnormal samples: using the longitude and latitude and the Wi-Fi intensity information of the current shopping state in the original dataset, according to the formula
the anomaly confidence of each sample is computed, where $\lambda_i$, the latitude and $\tau_i$ are respectively the longitude, the latitude and the current-state Wi-Fi intensity of the user corresponding to the i-th sample, and m denotes the number of samples in the dataset; if the anomaly confidence $c_i$ of a sample is lower than 0.15 or higher than 0.85, the sample is judged to be an abnormal sample and is filtered out of the original dataset;
1012. Filling of missing Wi-Fi information: first, the 10 samples whose longitude and latitude are most similar to those of the sample with missing Wi-Fi intensity information, and whose own Wi-Fi intensity information is known, are found; the similarity between two samples is computed according to the formula
where $\lambda_a$, $\lambda_b$ and the corresponding latitudes are the longitude and latitude of the user corresponding to sample a and of the user corresponding to sample b, and the variances are those of longitude and latitude over the entire dataset; these 10 samples are then used, according to the formula
to fill in the Wi-Fi intensity information missing from the sample, where sample a is the sample to be filled, $a_i$ (i = 1, 2, ..., 10) are the 10 neighboring samples of sample a, and the formula uses the Wi-Fi intensity information corresponding to each sample $a_i$.
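The filling procedure of step 1012 can be illustrated with the sketch below. Neither the similarity formula nor the filling formula is reproduced in this text, so both are assumptions: similarity is taken as a Gaussian kernel on variance-scaled longitude and latitude differences, and the fill value is the similarity-weighted mean of the neighbours. The function name `fill_missing_wifi` and the use of one scalar intensity per sample are also simplifications made for illustration.

```python
import numpy as np

def fill_missing_wifi(lon, lat, wifi, k=10):
    """lon, lat: (m,) coordinates; wifi: (m,) intensities with np.nan where missing.
    Returns a copy of wifi with the missing entries filled in."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    wifi = np.asarray(wifi, dtype=float)
    filled = wifi.copy()
    known = np.flatnonzero(~np.isnan(wifi))
    if known.size == 0:
        return filled  # nothing to borrow from
    # Variances of longitude and latitude over the whole dataset.
    var_lon = max(lon.var(), 1e-12)
    var_lat = max(lat.var(), 1e-12)
    for a in np.flatnonzero(np.isnan(wifi)):
        d2 = (lon[known] - lon[a]) ** 2 / var_lon + (lat[known] - lat[a]) ** 2 / var_lat
        nearest = np.argsort(d2)[:k]          # the k most similar known samples
        sim = np.exp(-d2[nearest])            # ASSUMPTION: stand-in similarity
        if sim.sum() > 0:
            weights = sim / sim.sum()
        else:
            weights = np.full(nearest.size, 1.0 / nearest.size)
        # ASSUMPTION: similarity-weighted mean as the fill value.
        filled[a] = np.sum(weights * wifi[known[nearest]])
    return filled
```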
3. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 2, characterized in that the specific steps by which step 102 constructs the partial-label dataset from the candidate shop set corresponding to each sample are:
For each sample in the original data, the following operations are repeated to construct the partial-label dataset: (1) from the user longitude and latitude and the shop longitude and latitude in the original dataset, the distance d between the sample and each shop is calculated, where $\lambda_A$ and the corresponding latitude denote the longitude and latitude of shop A, and $\lambda_a$ and the corresponding latitude denote the longitude and latitude of sample a; (2) according to the calculated distance d, the 10 shops nearest to the sample are selected; (3) using the longitudes and latitudes of the 10 nearest shops corresponding to the sample, the following quadratic programming problem is optimized:
to solve for the weights of the 10 shops corresponding to the sample relative to that sample, where $\lambda_a$ and the corresponding latitude denote the longitude and latitude of the user corresponding to sample a, $\omega_{a,i}$ (i = 1, 2, ..., 10) denotes the weight, relative to sample a, of shop i among the 10 shops nearest to sample a, and the remaining symbols denote the longitudes and latitudes of those 10 nearest shops; if the computed weight of a shop is greater than 0.4, the shop is added to the candidate shop set of the sample.
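The three operations of this claim can be sketched as below. The distance formula and the quadratic program are not reproduced in this text, so the sketch assumes plain Euclidean distance on longitude and latitude and leaves the weight computation to a caller-supplied `weight_fn`. The helper `inverse_distance_weights` is only a placeholder so the example runs; it is not the quadratic program of the claim, and all names here are hypothetical.

```python
import numpy as np

def inverse_distance_weights(user, shops, eps=1e-9):
    """PLACEHOLDER weights (normalised inverse distance), not the claimed QP."""
    d = np.sqrt(np.sum((shops - user) ** 2, axis=1))
    inv = 1.0 / (d + eps)
    return inv / inv.sum()

def candidate_shops(user_lonlat, shop_lonlat, shop_ids, weight_fn, k=10, threshold=0.4):
    """Return the ids of the shops forming the candidate set of one sample."""
    user = np.asarray(user_lonlat, dtype=float)
    shops = np.asarray(shop_lonlat, dtype=float)
    # (1) Distance from the sample to every shop (ASSUMPTION: Euclidean on lon/lat).
    d = np.sqrt(np.sum((shops - user) ** 2, axis=1))
    # (2) Indices of the k nearest shops.
    nearest = np.argsort(d)[:k]
    # (3) Weights of those shops; weight_fn stands in for the QP solver.
    w = np.asarray(weight_fn(user, shops[nearest]), dtype=float)
    # Keep only shops whose weight exceeds the threshold.
    return [shop_ids[j] for j, wj in zip(nearest, w) if wj > threshold]
```

Passing the solver in as a function keeps the nearest-shop selection and the 0.4 threshold, which the claim does state, separate from the optimization step, which it does not show.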
4. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 3, characterized in that step 103 performs feature extraction on the partial-label dataset, specifically comprising the step:
Wi-Fi distance intensity: the Wi-Fi names are first discretized into a 1000-dimensional feature vector whose feature values are the Wi-Fi intensities corresponding to each Wi-Fi network; then, according to the conversion formula:
the discretized Wi-Fi intensity feature vector is converted into the Wi-Fi distance-intensity feature vector, where the symbols of the formula denote the 1000-dimensional Wi-Fi distance-intensity feature vector of the i-th sample and the 1000-dimensional Wi-Fi intensity feature vector of the i-th sample, $|Y_i|$ is the size of the candidate shop set of the i-th sample, the shop coordinates are the longitude and latitude of candidate shop $A_j$ of the sample, and $\lambda_a$ and the corresponding latitude denote the longitude and latitude of the user corresponding to the sample.
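The discretization half of this claim, mapping Wi-Fi names to positions of a fixed-length vector, can be sketched as follows. The conversion to distance intensity is omitted because its formula is not reproduced in this text. The names `build_wifi_index` and `wifi_intensity_vector`, the use of 0.0 for networks not observed, and the alphabetical ordering of the index are all assumptions made for illustration.

```python
import numpy as np

def build_wifi_index(all_names, dim=1000):
    """Assign each distinct Wi-Fi name one dimension (at most dim names kept)."""
    return {name: j for j, name in enumerate(sorted(set(all_names))[:dim])}

def wifi_intensity_vector(readings, wifi_index, dim=1000, missing=0.0):
    """readings: dict mapping Wi-Fi name to intensity for one sample.
    Returns a length-dim vector with the intensity at each known name's slot."""
    vec = np.full(dim, missing, dtype=float)  # ASSUMPTION: 0.0 marks 'not seen'
    for name, strength in readings.items():
        j = wifi_index.get(name)
        if j is not None and j < dim:
            vec[j] = strength  # names outside the index are ignored
    return vec
```

For example, with an index built from the names `"ap_a"` and `"ap_b"`, a sample that observes only `"ap_b"` gets its intensity in dimension 1 and the `missing` value everywhere else.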
5. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 4, characterized in that the specific steps by which step 104 constructs the similarity graph from the feature space are:
In order to construct the similarity graph $\langle V, E, \omega_e \rangle$ based on the feature space, the nodes V, the edges E and the edge weights $\omega_e$ of the similarity graph must each be defined;
1041. Definition of the nodes of the similarity graph: each sample in the partial-label dataset is regarded as a node of the similarity graph;
1042. Definition of the edges of the similarity graph: for each sample in the partial-label dataset, that is, each node of the similarity graph, the 10 samples other than itself that are nearest to it in Euclidean distance between Wi-Fi distance-intensity feature vectors are selected as its associated objects, and the corresponding pairs of nodes are connected in the similarity graph to form its edges;
1043. Definition of the edge weights of the similarity graph: the value similar(a, b) given by formula (2) is used as the weight of the edge (a, b), where a and b are the two samples in the partial-label dataset that correspond to the two nodes in the similarity graph.
6. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 5, characterized in that the specific steps by which step 105 carries out probability propagation on the similarity graph are:
1051. Probability initialization: for each sample, the probability that a shop appears in the sample's candidate shop set is first assumed to equal the proportion of occurrences of that shop in the entire dataset, that is, the frequency with which the shop occurs in the dataset serves as prior knowledge of the probability that it appears in the sample's candidate shop set; it is further assumed that, conditioned on the Wi-Fi distance intensity of the i-th sample, the probability that a shop in the candidate set is the true label follows a logistic distribution; from the existing partial-label dataset, the following likelihood function is then constructed:
where $p(y \in S_i \mid x_i, \theta)$ is the probability, given the Wi-Fi distance-intensity vector of the i-th sample, that the true label is in the candidate shop set of the sample; $n_y$ denotes the number of times shop y occurs in the entire dataset; $\pi_{i,y}$ is the probability that shop y appears in the candidate shop set; and $p(y \mid x_i, \theta)$ is the probability, given the Wi-Fi distance-intensity vector of the i-th sample, that shop y is the true label; this likelihood function formalizes the known fact that the true label of every sample in the dataset is contained in its candidate shop set; the parameter $\theta$ can be estimated by maximum likelihood estimation, and the resulting probability that shop y will interact with the user corresponding to the sample, given the sample's Wi-Fi distance-intensity feature vector, is used as the initialization probability for probability propagation;
1052. Probability propagation: in the t-th iteration of probability propagation, the new probability matrix $F_t$, which reflects the influence of propagation from neighboring samples, is obtained from the probability matrix $F_{t-1}$ of the previous iteration and the initialization probability matrix $P = [p(y_i = j \mid x_i, \theta)]_{m \times q}$:
where $W \in R^{m \times m}$ is the similarity matrix between samples; probability propagation runs for 50 iterations in total; in each iteration, the shop interaction probabilities of each sample are propagated to its neighboring samples according to the similarity between samples, and each sample updates its own interaction probability for a shop according to the interaction probabilities of its 10 neighboring samples for that shop.
7. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 6, characterized in that, in the partial label learning problem, each iteration must perform a disambiguation operation on the updated probability matrix, that is, for each sample the interaction probabilities of shops outside its candidate shop set are set to 0 and the interaction probabilities of shops in its candidate shop set are normalized:
8. The big data prediction method based on partial label learning for locating the shop where a user is according to claim 6 or 7, characterized in that the specific steps by which step 106 uses the converged probabilities obtained by propagation to predict, from the candidate shop set of the partial-label dataset, the shop with which the user will interact in the future are:
From the probability matrix $F_t$ obtained at convergence of the propagation in step 105, the predicted shop with which the user of each sample is most likely to interact is obtained:

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910313789.9A CN110060102B (en) 2019-04-18 2019-04-18 Bias label learning-based method for predicting positioning big data of shops where users are located

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910313789.9A CN110060102B (en) 2019-04-18 2019-04-18 Bias label learning-based method for predicting positioning big data of shops where users are located

Publications (2)

Publication Number Publication Date
CN110060102A true CN110060102A (en) 2019-07-26
CN110060102B CN110060102B (en) 2022-05-03

Family

ID=67319420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910313789.9A Active CN110060102B (en) 2019-04-18 2019-04-18 Bias label learning-based method for predicting positioning big data of shops where users are located

Country Status (1)

Country Link
CN (1) CN110060102B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581467A (en) * 2020-05-15 2020-08-25 北京交通大学 Bias label learning method based on subspace representation and global disambiguation method
CN111581466A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method for characteristic information with noise

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8339316B1 (en) * 2010-08-13 2012-12-25 Google Inc. Smart GPS use
CN105044662A (en) * 2015-05-27 2015-11-11 南京邮电大学 Fingerprint clustering multi-point joint indoor positioning method based on WIFI signal intensity
CN106525052A (en) * 2016-12-14 2017-03-22 广东工业大学 Graph model constrained indoor positioning method
CN107086935A (en) * 2017-06-16 2017-08-22 重庆邮电大学 Flow of the people distribution forecasting method based on WIFI AP
CN107845260A (en) * 2017-10-26 2018-03-27 杭州东信北邮信息技术有限公司 A kind of recognition methods of user's bus trip mode
US20180144209A1 (en) * 2016-11-22 2018-05-24 Lunit Inc. Object recognition method and apparatus based on weakly supervised learning
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109089314A (en) * 2018-09-30 2018-12-25 哈尔滨工业大学(深圳) A kind of indoor orientation method of the wifi sequence assistant GPS based on proposed algorithm
CN109242552A (en) * 2018-08-22 2019-01-18 重庆邮电大学 A kind of retail shop's localization method based on big data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RAMAZAN GOKBERK CINBIS,ETC: "Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE(VOLUME:39,ISSUE:1,JAN.1 2017)》 *
张敏灵: "偏标记学习研究综述", 《数据采集与处理》 *
杜成喜等: "基于XGBoost的用户定位与商铺推荐", 《无线互联科技》 *
王进等: "基于MPI的近邻距离加权偏标记学习算法之并行实现", 《江苏大学学报(自然科学版)》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581467A (en) * 2020-05-15 2020-08-25 北京交通大学 Bias label learning method based on subspace representation and global disambiguation method
CN111581466A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method for characteristic information with noise
CN111581466B (en) * 2020-05-15 2024-02-27 北京交通大学 Partial multi-mark learning method for characteristic information noise
CN111581467B (en) * 2020-05-15 2024-04-02 北京交通大学 Partial mark learning method based on subspace representation and global disambiguation method

Also Published As

Publication number Publication date
CN110060102B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
US20230162071A1 (en) Autonomous learning platform for novel feature discovery
Shrivastava et al. Failure prediction of Indian Banks using SMOTE, Lasso regression, bagging and boosting
Liu et al. Visualizing and exploring POI configurations of urban regions on POI-type semantic space
CN110363282B (en) Network node label active learning method and system based on graph convolution network
Cheng et al. A non-linear case-based reasoning approach for retrieval of similar cases and selection of target credits in LEED projects
JP2009076042A (en) Learning user's activity preference from gps trace and known nearby venue
CN105532030A (en) Apparatus, systems, and methods for analyzing movements of target entities
CN107330734B (en) Co-location mode and ontology-based business address selection method
Majewska et al. Cluster-mapping procedure for tourism regions based on geostatistics and fuzzy clustering: example of Polish districts
CN110060102A (en) Retail shop where user based on inclined label study positions big data prediction technique
CN109993184A (en) A kind of method and data fusion equipment of data fusion
Rodrigues et al. Automatic classification of points-of-interest for land-use analysis
US11783436B2 (en) Magellan: a context-aware itinerary recommendation system built only using card-transaction data
CN113868537B (en) Recommendation method based on multi-behavior session graph fusion
Pramanik et al. Deep learning based resource availability prediction for local mobile crowd computing
Zhuang et al. Integrating a deep forest algorithm with vector‐based cellular automata for urban land change simulation
CN117217779A (en) Training method and device of prediction model and information prediction method and device
Hsieh et al. Estimating potential customers anywhere and anytime based on location-based social networks
Li et al. Multi-modal representation learning for successive poi recommendation
CN115935079A (en) Graph collaborative filtering recommendation method based on clusters
Manca et al. Fuzzy analysis for modeling regional delineation and development: the case of the Sardinian Mining Geopark
Bose Data mining in tourism
Tang et al. Discovering urban functional zones from biased and sparse points of interests and sparse human activities
Shin et al. Hybrid model–based motion recognition for smartphone users
Xiao et al. A dynamic transfer ensemble model for customer churn prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230111

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Yami Technology (Guangzhou) Co.,Ltd.

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

TR01 Transfer of patent right

Effective date of registration: 20230804

Address after: Room 205-03, 2nd Floor, Building 2, No.1 and No.3, Qinglong Hutong A, Dongcheng District, Beijing, 100010

Patentee after: Beijing Xinsuo Consulting Co.,Ltd.

Address before: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee before: Yami Technology (Guangzhou) Co.,Ltd.