CN110442662A - Method for determining user attribute information and information push method - Google Patents

Method for determining user attribute information and information push method

Info

Publication number
CN110442662A
Authority
CN
China
Prior art keywords
data
region
user
attribute information
node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910611619.9A
Other languages
Chinese (zh)
Other versions
CN110442662B (en)
Inventor
王凯平
李萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201910611619.9A
Publication of CN110442662A
Application granted
Publication of CN110442662B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0277 Online advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal
    • H04L67/55 Push-based network services
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/021 Services related to particular areas, e.g. point of interest [POI] services, venue services or geofences
    • H04W4/029 Location-based management or tracking services
    • H04W4/12 Messaging; Mailboxes; Announcements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for determining user attribute information based on regional POI data and personal spatio-temporal data. The method comprises: acquiring POI data from an open map platform for each preset region and preprocessing it; counting, from the personal spatio-temporal data of mobile terminals, the number of people entering and leaving each region at different times, and clustering these counts; parsing the clustering result and the preprocessed POI data with the DMR topic model algorithm to determine regional function attribute information; generating, from the personal spatio-temporal data of mobile terminals, daily trajectory data for each user, the trajectory data comprising a trip chain and the nodes on the trip chain; and determining the user attribute information from the trip chain, each node, and the regional function attribute information corresponding to each node. The invention makes it possible to determine user attributes accurately and to push targeted information according to them.

Description

Method for determining user attribute information and information push method
Technical field
The present invention relates to the field of information technology, and in particular to a method for determining user attribute information and an information push method.
Background art
Traditionally, POI data is acquired by field surveyors, who use precise surveying instruments to obtain the longitude and latitude of a point of interest and then annotate it. To enrich the information, POI data also includes the name, category, classification and recommendation information of the point of interest, but this information is static. Besides map navigation, it is currently also used to push POI information that matches keywords entered by the user.
Most smartphones now have built-in GPS that can collect the user's geographic location in real time, and many applications have been developed that recommend information based on the user's current location or on a location the user enters.
Current Internet advertising, both in China and abroad, mainly matches ads by using the keywords an audience member enters or browses as cues, and the order in which ads are pushed is decided by bidding, ranked by how much each advertiser pays.
These push and recommendation methods share a defect: they cannot accurately judge the relevance of the advertising information to the audience or how well it fits them.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a technical solution that overcomes them or at least partially solves them. One aspect of the invention provides a method for determining user attribute information based on regional POI data and personal spatio-temporal data, the method comprising:
Acquiring POI data from an open map platform for each preset region, and preprocessing it;
Counting, from the personal spatio-temporal data of mobile terminals, the number of people entering and leaving each region at different times, and clustering the counts;
Parsing the clustering result and the preprocessed POI data with the DMR topic model algorithm to determine regional function attribute information;
Generating, from the personal spatio-temporal data of mobile terminals, daily trajectory data for each user, the trajectory data comprising a trip chain and the nodes on the trip chain;
Determining the user attribute information from the trip chain, each node, and the regional function attribute information corresponding to each node.
Optionally, the regional function attribute information includes the division of the region into function categories, the key POI types corresponding to each function category, and/or the probability distribution over the function categories.
Optionally, acquiring the open map platform POI data by region comprises:
Reading the region list;
Traversing the region list and performing the following for each region:
Obtaining the region's longitude and latitude;
Creating a thread pool;
Generating request URLs in separate threads, and crawling each category of POI data for each region.
Optionally, acquiring the open map platform POI data by region further comprises:
Creating a queue that transfers each category of crawled POI data to the cloud for preprocessing.
Optionally, preprocessing the acquired data with a cloud computing platform comprises converting the coordinates of each category of crawled POI data.
Optionally, clustering the counts comprises: creating, on the cloud computing platform, an ODPS class for processing each user's trip trajectory data;
Within this class, generating an index whose rows are the time series;
Selecting the data whose time lies after the start time and before the end time, and creating a statistics array;
Counting each node ID and the total for each group;
Reading the region list file and, with multiple threads in task order, counting the passenger flow of each region both as the region entered and as the region left.
Optionally, preprocessing the acquired data with the cloud computing platform further comprises:
Creating an array to hold the clustering results;
Clustering each node's user inflow and outflow separately into four groups: weekday inflow, weekend inflow, weekday outflow and weekend outflow.
Optionally, parsing the clustering result and the preprocessed POI data with the DMR topic model algorithm comprises:
Organizing the POI data and the clustering file of each node into a file of a predetermined format;
Loading the predetermined-format file;
Running the DMR algorithm to obtain, for each region, its division into function categories and the POI distribution corresponding to each function category.
Optionally, determining the user attribute information from the trip chain, each node, and the regional function attribute information corresponding to each node comprises:
Estimating, with an HMM, the user's activity at each node from the spatio-temporal information in each user's daily trajectory data and from the regional function attribute information;
Determining the user attribute information from the user's activities.
The present invention also provides a method for pushing information to a user using the user attribute information determined above, the method comprising:
According to the user attribute information, matching information and POI data within a certain range centered on each node of the trajectory, and pushing the matching results to the user; and/or
According to the user attribute information, matching relevant news, articles or books, and pushing the matching results to the user.
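The first push option (matching POIs within a range around each trajectory node) can be sketched as below. This is an illustrative sketch, not the patented implementation: it assumes the user attributes have already been reduced to a set of interest categories, uses a 500 m radius borrowed from the region size used elsewhere in this description, and the tuple layouts for nodes and POIs are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS-style coordinates, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_pois(nodes: list[tuple[float, float]],
               pois: list[tuple[str, float, float, str]],
               interests: set[str],
               radius_m: float = 500.0) -> list[str]:
    """Names of POIs near any trajectory node whose category the user is
    assumed to be interested in (hypothetical schema: name, lat, lon, category)."""
    hits = []
    for name, lat, lon, category in pois:
        if category not in interests:
            continue  # skip categories that do not match the user's attributes
        if any(haversine_m(lat, lon, nlat, nlon) <= radius_m for nlat, nlon in nodes):
            hits.append(name)
    return hits
```

A brute-force scan is enough to show the idea; with many POIs, a spatial index would normally replace the inner `any(...)` loop.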
The technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages: within small areas, the invention determines regional function attribute information from the clustering of users' travel trajectories, and then mines implicit user behavior information from the users' trajectories and the regional function attribute information, so that user attribute information can be determined automatically and targeted information push can be achieved.
The above description is only an overview of the technical solution of the present invention. Specific embodiments are given below so that the technical means of the invention can be understood more clearly and implemented according to this specification, and so that the above and other objects, features and advantages of the invention become more apparent.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 is a flowchart of the method proposed by the present invention for determining user attribute information based on regional POI data and personal spatio-temporal data;
Fig. 2 shows the process of computing regional passenger-flow statistics from the acquired user spatio-temporal data;
Fig. 3 shows the basic logical data relationships of the DMR (Dirichlet-multinomial regression) algorithm;
Fig. 4 shows the results of analyzing open map platform POI data and regions with the Dirichlet-multinomial regression model;
Fig. 5 shows the trip-chain-based HMM.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, it should be understood that the disclosure can be implemented in various forms and is not limited by the embodiments set forth here. Rather, these embodiments are provided for a more thorough understanding of the invention and to convey its scope fully to those skilled in the art.
As a city develops, different functional areas gradually form. Each functional area contains multiple POIs, and a region may have one function attribute or several, determined by the categories of its POIs. A region's function attributes in turn strongly influence crowd mobility.
Because a region's function attributes are closely related mainly to the POI attributes around the region and to its crowd flow patterns, determining a region's function attributes mainly requires considering the POI data and crowd flows around it, for the following two reasons:
(1) POI data. On the one hand, POI data characterizes the surroundings of a bus station; for example, a region containing mostly schools probably belongs to the education sector. On the other hand, a region generally contains various POIs and therefore has several functions rather than just one; for example, a region may be both a shopping district and an entertainment district.
(2) Crowd mobility. A region's function is closely related to the people who visit it. Crowd mobility reveals a region's function at two levels: when people arrive and when they leave, and where they come from and where they go. On weekdays, people usually leave residential areas in the morning and return at night, while on holidays or weekday evenings they mainly go to entertainment areas. A region's function is also related to why crowds flow there. For example, people arriving at an entertainment area may have come from a work area after work on a weekday, or from a residential area on a holiday. Therefore, if people from functionally similar regions go to two other regions, or leave those two regions for similar places, the two regions are more likely to have similar functions.
Building on regional function attributes, the present invention attempts to make personalized recommendations to users by combining them with user trajectory data.
Recommending from user trajectory data means considering the trajectory's spatio-temporal attributes, the temporal ordering of locations, and transitions between movement states, because trajectory data itself records the moving object's position, speed and direction at different times. This differs from earlier location-based recommendation, which mostly considered only the spatial attributes of locations and, even when GPS data was available, ignored the influence of the order between spatial items. Such methods mostly presented results as discrete points or point sets, for example a set of POIs representing places the user might be interested in next, with no ordering among the POIs that matches an individual user's sequence of movements.
We define a user trajectory as a trip chain: the series of trips a user makes in one day, using different means of transport, leaving home in the morning and returning home at night. It describes the series of travel behaviors a user carries out in the city at different times and for different purposes. If the locations are connected in time order, the trip chain appears on the city map as a complete polyline. A trip chain can take forms such as home → workplace → restaurant (dinner) → home, or home → workplace, and so on.
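The trip-chain definition above can be sketched as follows, assuming each user's day has already been reduced to time-ordered stay points labeled with a place. The `Stop` type and the place labels are hypothetical, and how stay points are extracted from raw GPS fixes is not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """A stay point on one user's day (hypothetical schema)."""
    hour: float  # arrival time, hours since midnight
    place: str   # semantic node label, e.g. "home", "work"


def build_trip_chain(stops: list[Stop]) -> list[tuple[str, str]]:
    """Return the day's trips as (origin, destination) pairs.

    Stops are ordered by time and consecutive stops at the same place are
    merged, so the remaining places are the nodes of the trip chain and each
    adjacent pair of nodes is one trip.
    """
    nodes: list[str] = []
    for stop in sorted(stops, key=lambda s: s.hour):
        if not nodes or nodes[-1] != stop.place:
            nodes.append(stop.place)
    return list(zip(nodes, nodes[1:]))


def is_home_based(trips: list[tuple[str, str]]) -> bool:
    """True if the chain leaves home first and ends back at home."""
    return bool(trips) and trips[0][0] == "home" and trips[-1][1] == "home"
```

For example, stops at home (7:30), work (9:00), work (12:00), restaurant (19:00) and home (22:00) give the chain home → work → restaurant → home.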
Correspondingly, the user's activity at each node is defined as the underlying (latent) attribute of the trip chain.
We focus on each user's time-varying coordinates obtained from device GPS or mobile-phone GPS positioning, with each device ID corresponding to one anonymous user. We mine the trip chains each user makes daily on public transport, treat their travel behavior and the PoI categories of regional attributes as hidden state information, and reorganize these as model-learning parameters for the next stage, which models the dynamic transition attributes of regions.
As shown in Figure 1, the present invention proposes that one kind determines customer attribute information based on region POI data, personal space-time data Method, this method comprises:
S1. map open platform POI data is obtained by predeterminable area, and it is pre-processed;
S2. the personal space-time data based on mobile terminal counts each region in the discrepancy number of different time, and to system The number of meter is clustered;
S3. cluster result, pretreated POI data are parsed using DMR topic model algorithm, determines region function It can attribute information;
S4. the personal space-time data based on mobile terminal generates the daily track data of each user, the track data packet Include the node in Trip chain and Trip chain;
S5. it determines and uses according to the Trip chain, each node and regional function attribute information corresponding with each node Family attribute information.
In step S1, because open map platforms are open to the public, the present invention acquires POI data by web crawling.
The above method is carried out by a cloud platform (a server or server cluster) executing a computer program; this platform may be called the hyper-local platform.
In a preferred embodiment, a preset region (for example 500 m × 500 m) is used as the data unit, and all categories of POI data within a 500-meter radius centered on a given region are crawled. POI data is itself classified into first-level and second-level categories, and each category has a corresponding industry code and name, which makes the collected information easy to record and distinguish. The crawl level must be first-level or second-level, and the crawl parameters are assembled into the crawl request.
POI data is crawled as follows: read the region list; traverse it and, for each region, obtain the region's longitude and latitude, create a thread pool, and generate request URLs in separate threads; then crawl the first-level POI data within a certain area centered on the region, followed by the second-level POI data within that area.
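The crawl loop above can be sketched in Python as follows. The endpoint `maps.example.com`, its query parameters and the injected `fetch` callable are hypothetical placeholders, not any real platform's API. For simplicity, one thread pool is created once and reused across regions instead of being created per region.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.parse import urlencode

# Hypothetical endpoint; a real open map platform defines its own URL,
# query parameters and access-key scheme.
BASE_URL = "https://maps.example.com/place/search"


def build_request_url(lat: float, lon: float, category: str, radius_m: int = 500) -> str:
    """Build one crawl request for a POI category around a region centre."""
    params = {"query": category, "location": f"{lat},{lon}", "radius": radius_m, "output": "json"}
    return f"{BASE_URL}?{urlencode(params)}"


def crawl_regions(regions: dict[str, tuple[float, float]],
                  categories: list[str],
                  fetch: Callable[[str], dict],
                  max_workers: int = 8) -> dict[str, list[dict]]:
    """Traverse the region list and fetch every category URL in a thread pool."""
    results: dict[str, list[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for region_id, (lat, lon) in regions.items():
            urls = [build_request_url(lat, lon, c) for c in categories]
            # pool.map preserves input order, so results line up with categories
            results[region_id] = list(pool.map(fetch, urls))
    return results
```

Injecting `fetch` keeps the sketch testable offline; in practice it would wrap an HTTP client together with rate limiting and retries.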
Taking the Baidu Map open platform as an example, all coordinates in Baidu Map are Baidu coordinates and must be converted by calling Baidu's coordinate conversion API. The converted data is then parsed, and the first-level and second-level POI data of each region is crawled in turn according to the region list.
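The text converts coordinates by calling Baidu's coordinate conversion API. As an offline alternative (not the method described here), the approximate BD-09 ↔ GCJ-02 formulas that circulate widely in open-source code can be sketched as below. They are approximations, they do not produce WGS-84, and they should be checked against the platform's own API before use.

```python
import math

X_PI = math.pi * 3000.0 / 180.0  # constant used by the widely circulated formula


def bd09_to_gcj02(bd_lon: float, bd_lat: float) -> tuple[float, float]:
    """Approximate conversion from Baidu BD-09 to GCJ-02 (lon, lat)."""
    x, y = bd_lon - 0.0065, bd_lat - 0.006
    z = math.hypot(x, y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def gcj02_to_bd09(lon: float, lat: float) -> tuple[float, float]:
    """Approximate inverse: GCJ-02 to Baidu BD-09 (lon, lat)."""
    z = math.hypot(lon, lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lon) + 0.000003 * math.cos(lon * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006
```

The round trip is not exact but agrees to within a few millionths of a degree, well under the 500 m region size used here.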
The key code statements of a specific implementation are as follows:
During crawling, a queue is created that transfers each category of crawled POI data to the cloud for parsing.
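The crawl-and-transfer queue can be sketched as a producer/consumer pipeline using only the standard library. This is an assumption about structure, not the text's implementation: each producer thread stands in for one crawler, and the `upload` callable is a hypothetical stand-in for writing a batch to cloud storage.

```python
import queue
import threading
from typing import Callable

_SENTINEL = object()  # tells the uploader that crawling has finished


def run_pipeline(batches: list[list[dict]], upload: Callable[[list[dict]], None]) -> None:
    """Crawler threads put POI batches on a queue; one uploader drains it."""
    q: queue.Queue = queue.Queue(maxsize=100)

    def uploader() -> None:
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            upload(item)  # e.g. send the batch to the cloud for preprocessing

    consumer = threading.Thread(target=uploader)
    consumer.start()
    # Each producer thread stands in for one crawler putting its results.
    producers = [threading.Thread(target=q.put, args=(batch,)) for batch in batches]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(_SENTINEL)  # enqueued last, so every batch is uploaded first
    consumer.join()
```

The bounded queue (`maxsize=100`) applies back-pressure, so fast crawlers block instead of exhausting memory when uploads are slow.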
In the present invention, the data is preferably preprocessed on the Alibaba Cloud MaxCompute cloud computing platform into data that the DMR (Dirichlet-multinomial Regression) algorithm can use directly. A region function attribute analysis model (Discovers Stations of different Function, DSoF) can then be built to analyze the function attributes of each region. The function attribute information includes the division of the region into function categories, the key POI types corresponding to each function category, and/or the probability distribution over the functions.
In step s 2, the personal space-time data based on mobile terminal counts each region in the discrepancy people of different time Number, and the number of statistics is clustered.Counting user trip track, determines group's trip information by region, due to each area There may be 5,500,000 or so personal track records in domain daily, system needs to handle one month data, therefore conduct A kind of preferred embodiment, the present invention calculate service MaxCompute, while also referred to as ODPS using the big data of Alibaba (open data process service) is a kind of quick, the TB/PB grade data warehouse solution of complete trustship. MaxCompute has provided a user perfect data import plan and a variety of distributed computing platforms, can be faster User's mass data computational problem is solved, calculating cost is effectively reduced, and ensure data safety.
As shown in Fig. 2, step S2 comprises the following specific steps:
S21. The cloud computing platform creates an ODPS class;
S22. Within this class, an index whose rows are the time series is generated;
S23. Data whose time lies after the start time and before the end time is selected, and a statistics array is created;
S24. The region serial numbers and the total for each group are counted;
S25. The region list file is read, and multiple threads count, in task order, the passenger flow of each region both as the region entered and as the region left.
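In the text, steps S21 to S25 run as ODPS/MaxCompute jobs. The counting logic itself can be illustrated with a single-machine Python sketch. The record layout `(user_id, timestamp, region_id)` is an assumption, and a transition is counted at the timestamp of the first record in the new region, which only approximates the true departure time.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

Record = tuple[str, datetime, str]  # (user_id, timestamp, region_id), assumed layout


def count_region_flows(records: Iterable[Record], start: datetime, end: datetime):
    """Count entries into and exits out of each region within (start, end).

    Returns (inflow, outflow), two Counters keyed by
    (region_id, "weekday" or "weekend", hour_of_day).
    """
    inflow: Counter = Counter()
    outflow: Counter = Counter()
    last_region: dict[str, str] = {}
    window = (r for r in records if start < r[1] < end)  # strict bounds, as in S23
    for user, ts, region in sorted(window, key=lambda r: (r[0], r[1])):
        prev = last_region.get(user)
        if prev is not None and prev != region:
            day = "weekend" if ts.weekday() >= 5 else "weekday"
            outflow[(prev, day, ts.hour)] += 1
            inflow[(region, day, ts.hour)] += 1
        last_region[user] = region
    return inflow, outflow
```

The hourly, weekday/weekend keys produce the four flow groups (weekday/weekend × inflow/outflow) that are clustered next.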
The passenger-flow statistics for each region are implemented as follows:
This completes the calculation of user inflow and outflow for each region. Computing one month of data on the MaxCompute platform takes about five days.
With the regional passenger-flow statistics (that is, the statistics of crowd transfer patterns) complete, these patterns must next be clustered, and the categories obtained from clustering are used as feature data in the DMR model.
The clustering is divided into weekday outflow, weekday inflow, weekend outflow and weekend inflow: each node's user inflow and outflow are clustered separately into weekday inflow, weekend inflow, weekday outflow and weekend outflow clusters.
During clustering, step S2 further comprises:
Creating an array to hold the clustering results;
Clustering each node's user inflow and outflow separately into weekday inflow, weekend inflow, weekday outflow and weekend outflow clusters.
The key code for clustering the travel-pattern data is as follows:
The statistics fall into four parts: the weekday flow of people into a region, the weekday flow of people out of a region, the weekend flow of people out of a region, and the weekend flow of people into a region. These four types of data are used to cluster weekdays and weekends.
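The text does not name the clustering algorithm. As an assumption, the sketch below uses plain k-means on each node's hourly flow profile, normalized to proportions so that the shape of the daily curve rather than its volume drives the grouping. It clusters each of the four groups (weekday/weekend × inflow/outflow) separately, and the resulting labels are the categorical features later fed to DMR.

```python
import math
import random


def normalize(counts: list[float]) -> list[float]:
    """Scale a profile to proportions so clustering compares curve shapes."""
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def kmeans(vectors: list[list[float]], k: int, iters: int = 50, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means; returns a cluster label for each vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_flows(profiles: dict[tuple[str, str], list[float]], k: int) -> dict[tuple[str, str], int]:
    """Cluster each flow group (e.g. "weekday_in") separately.

    profiles maps (node_id, group) to that node's hourly counts for the group.
    """
    labels: dict[tuple[str, str], int] = {}
    for group in sorted({g for _, g in profiles}):
        keys = sorted(key for key in profiles if key[1] == group)
        vectors = [normalize(profiles[key]) for key in keys]
        for key, lab in zip(keys, kmeans(vectors, min(k, len(vectors)))):
            labels[key] = lab
    return labels
```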
The process above prepares the data for the next step, the DMR model analysis.
The DMR model treats a region as a document, POI data (such as restaurants and shopping malls) as the words in the document, and crowd flow patterns as metadata (such as author and keyword information); its output is the distribution of functional intensity within a region.
In step S3, the clustering result and the preprocessed POI data are parsed with the DMR topic model algorithm to determine regional function attribute information, specifically:
Organizing the POI data and the clustering file of each region into a file of a predetermined format;
Loading the predetermined-format file;
Running the DMR algorithm to obtain, for each region, its division into function categories and the POI distribution corresponding to each function category.
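The text does not specify the predetermined file format. A minimal hypothetical layout puts one region per line, with tab-separated region ID, flow-cluster features and POI category tokens, so that the region is the document, its POIs are the words, and its flow clusters are the metadata. It could be produced like this:

```python
def to_dmr_line(region_id: str, cluster_labels: dict[str, int], poi_categories: list[str]) -> str:
    """One region -> one document line: id, metadata features, POI words."""
    # Sorted so the feature order is stable across regions.
    features = " ".join(f"{group}={label}" for group, label in sorted(cluster_labels.items()))
    # Spaces inside category names would split tokens, so replace them.
    words = " ".join(c.replace(" ", "_") for c in poi_categories)
    return f"{region_id}\t{features}\t{words}"


def corpus_lines(regions: dict[str, tuple[dict[str, int], list[str]]]) -> list[str]:
    """All regions as corpus lines, ordered by region ID."""
    return [to_dmr_line(rid, labels, pois) for rid, (labels, pois) in sorted(regions.items())]


def write_corpus(path: str, regions: dict[str, tuple[dict[str, int], list[str]]]) -> None:
    """Write the hypothetical corpus file consumed by the DMR step."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(corpus_lines(regions)) + "\n")
```

A concrete DMR toolkit will expect its own input format, so the tokens here would need adapting to whichever implementation is used.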
The specific analysis process is as follows:
In the key program implementation above, the feature data refers to the different categories obtained from clustering.
In the DMR model, a region is treated as a document and its function attributes as the document's different topics, so a region with multiple function attributes is like a document covering several topics. Specifically, the POI data around a region can be regarded as the words that make up the document, the clustering result of the passenger-flow vectors is analogous to document attributes such as keywords and author information, and the full set of specific POI categories can be regarded as the vocabulary of the corpus.
A DMR-based topic model can flexibly incorporate features that influence topics, in addition to the text itself, into the computation. Compared with other models that incorporate specific features, such as the Author-Topic model and the Topics-over-Time model (members of the supervised-LDA family), DMR allows more flexible feature selection and is simpler and more efficient to compute.
Fig. 3 shows how the data are logically organized in the DMR model, where $\mathcal{N}$ is a Gaussian distribution with hyperparameter $\sigma$, $\lambda_k$ is a vector with the same length as the passenger-flow clustering result, and the category of the $n$-th POI observed in region $r$ is denoted $m_{r,n}$. The other symbols have the same meaning as in the LDA model mentioned earlier, and DMR can likewise be computed with the EM algorithm and Gibbs sampling. Unlike LDA, here the Dirichlet prior $\alpha_r$ differs from region to region according to the region's passenger-flow clustering result [33]. The regional topic distributions computed by DMR are therefore shaped jointly by POI attributes and crowd movement patterns, and the final output is each region's division into function categories and the POI category distribution of each function category.
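In the standard DMR formulation introduced by Mimno and McCallum, each document's Dirichlet prior is $\alpha_{r,k} = \exp(x_r^\top \lambda_k)$ with $\lambda_k \sim \mathcal{N}(0, \sigma^2 I)$, where $x_r$ is the document's feature vector. The sketch below assumes $x_r$ is an intercept plus a one-hot encoding of the region's flow-cluster labels. The λ values in the usage lines are made up for illustration, not learned, and fitting λ (for example by optimizing it between Gibbs sampling sweeps) is omitted.

```python
import math


def one_hot_features(labels: list[int], n_clusters: int) -> list[float]:
    """x_r: intercept followed by one one-hot block per flow group."""
    x = [1.0]  # intercept (default feature)
    for label in labels:
        block = [0.0] * n_clusters
        block[label] = 1.0
        x.extend(block)
    return x


def dmr_alpha(features: list[float], lambdas: list[list[float]]) -> list[float]:
    """alpha_{r,k} = exp(x_r . lambda_k) for every topic k."""
    return [math.exp(sum(x * w for x, w in zip(features, lam))) for lam in lambdas]


def expected_topic_mix(alpha: list[float], topic_counts: list[int]) -> list[float]:
    """Posterior mean of theta_r given how many POIs are assigned to each topic."""
    total = sum(alpha) + sum(topic_counts)
    return [(a + n) / total for a, n in zip(alpha, topic_counts)]


# Usage with two flow groups, two clusters per group and two topics:
x = one_hot_features([1, 0], n_clusters=2)              # [1, 0, 1, 1, 0]
lambdas = [[0.0] * 5, [0.0, 0.0, math.log(3), 0.0, 0.0]]
alpha = dmr_alpha(x, lambdas)                           # topic 2 prior is 3x topic 1
```

Because $\alpha_r$ depends on the flow clusters, two regions with identical POIs but different crowd-flow patterns can still receive different topic mixtures, which is the property the text relies on.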
The results of the DMR analysis are shown in Fig. 4. The first row gives the result for each function category, together with the proportion of the dominant function attribute within that category. In practice, the distinct function attributes, and the regions they correspond to, can be distinguished by color on the map. Below that, the keywords and word frequencies of the concrete functions in each category are listed in descending order. Combined with traffic characteristics such as residents' average travel times on weekdays and days off, the results can be interpreted as follows:
Topic1: enterprises and industrial parks; Topic2: urban residential neighborhoods; Topic3: scientific research and education; Topic4: downtown commercial district; Topic5: scenic spots; Topic6: urban office district; Topic7: transport hubs and public institutions.
With the invention, topic analysis can be performed on a city region by region to determine each region's function attributes and the main POIs under each attribute, which provides a data basis for the next step, the analysis of user behavior.
Hidden Markov model is commonly used in the application of the research of prediction based on time series, especially speech recognition (Hidden Markov Model-HMM) is corresponding by the phoneme signal of observation state and the corresponding vowel of hidden state, using general The state transfer relationship of rate obtains the Statistic analysis models of training data study, to reach the signal of continuous time series Identification.The present invention proposes that the trip mode of mobile subscriber is an approximate hidden Markov model (HMM), and most suitable Share HMM modeling, the definition of the observation state of comparison model and hidden state, it is found that we observe is user one The corresponding observation state sequence (location status) of the hinged node of Trip chain and Trip chain in it, still, as because of state User behavior, which type of behavior user produce in the position of the hinged node of Trip chain on earth, is unknown, i.e. model Hidden status switch.We have seen that HMM model is established in similar having using mobile phone signaling data and taxi car data, predict User behavior pattern and the corresponding relationship of trip mode produce good prediction effect to inhomogeneous user grouping.We That be more concerned about is the corresponding observation state (observable of hinged node in the time series shifted as the stochastic regime of HMM States) shift with hidden state (hidden states) transfer correlation, i.e., the time series of Urban Residential Trip chain and Influence of the behavior sequence to region.
Based on these observations, we propose an HMM built on trip chains (Trip-Chains), as shown in Figure 5:
Hidden states: $S=\{r_1, r_2, r_3, r_4\}$, $N=4$; observation states: $O=\{c_1, c_2, c_3, c_4, c_5, c_6\}$, $M=6$; $A$ is the state transition probability and $B$ is the observation probability.
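With the sizes given here (N = 4 hidden states, M = 6 observation states), an HMM λ = (A, B, π) can be evaluated with the standard forward algorithm. The following is a minimal sketch, assuming A is an N×N row-stochastic matrix over hidden states, B an N×M emission matrix, and observations encoded as indices 0 to M-1; the function name is hypothetical.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(obs | lambda) for an HMM lambda = (A, B, pi).
    pi:  (N,) initial hidden-state probabilities
    A:   (N, N) transitions, A[i, j] = P(r_j at t+1 | r_i at t)
    B:   (N, M) emissions, B[i, k] = P(c_k | r_i)
    obs: sequence of observation indices in 0..M-1"""
    alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        # alpha_{t+1}(j) = sum_i alpha_t(i) * a_ij * b_j(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())
```

For long daily trajectories the products underflow, so a scaled or log-space variant would be needed in practice.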
We divide the city into a grid of 500 m × 500 m cells, which simplifies the modeling to a small-area problem:
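A minimal sketch of assigning a GPS fix to a 500 m × 500 m cell, assuming WGS-84 coordinates, a hypothetical south-west origin for the grid, and a local equirectangular approximation on a spherical Earth of radius 6,371 km. This is adequate over a city-sized area but not over large extents. The (row, column) index matches the node notation, e.g. (1, 2), used in Figure 5.

```python
import math

CELL_M = 500.0               # cell size from the text: 500 m x 500 m
EARTH_RADIUS_M = 6371000.0   # mean Earth radius (spherical-Earth assumption)

def grid_cell(lat, lon, origin_lat, origin_lon, cell_m=CELL_M):
    """Map a WGS-84 point to a (row, col) cell relative to a hypothetical origin,
    using a local equirectangular projection around the origin latitude."""
    dy = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    dx = math.radians(lon - origin_lon) * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    return int(dy // cell_m), int(dx // cell_m)
```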
Problem modeling
To describe travel trajectories, the HMM is built on the grid, as shown in Figure 5. A city resident moves between regions (along grid edges), and one or more edge segments form a Trip-Chain. For example, if at time t the resident is at the diamond-shaped node (1, 2), then at the next time t+1 the resident may be at any other node on the polyline, (1, 0), (1, 1), or (1, 3), or at node (2, 3) or (2, 4); residents move between the nodes of different polylines according to the transition probabilities. If at time t the resident is at a node through which several different polylines pass, then at time t+1 the resident may move to a node on any of those polylines.
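The transition probabilities between nodes can be estimated from observed trip chains by counting consecutive node pairs. The sketch below is a maximum-likelihood estimate, assuming each trip chain is already a time-ordered list of grid nodes; smoothing for unseen transitions is omitted, and the function name is hypothetical.

```python
from collections import defaultdict

def estimate_transitions(trip_chains):
    """Maximum-likelihood estimate of P(n_{t+1} = v | n_t = u) from trip chains,
    each a list of grid nodes such as (1, 2). Returns {u: {v: prob}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for chain in trip_chains:
        for u, v in zip(chain, chain[1:]):   # consecutive nodes = one edge move
            counts[u][v] += 1
    probs = {}
    for u, nxt in counts.items():
        total = sum(nxt.values())
        probs[u] = {v: c / total for v, c in nxt.items()}
    return probs
```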
Determining the user attribute information according to the trip chain, each node, and the regional functional attribute information corresponding to each node comprises:
estimating, with the HMM, the user's activity at each node according to the spatio-temporal information of each user's daily trajectory data and the regional functional attribute information;
determining the user attribute information according to the user's activity.
As shown in Fig. 5, in the model, $n_t$ is defined as the observation state variable, i.e., the node state variable in the grid, and $a_t$ is defined as the hidden state variable, i.e., the activity state variable in the grid.
Transition probability: $A=\{a_{ij}\}$, for example $a_{ij}=P(n_{t+1}=(1,2)\mid n_t=(0,2))$
Initial probability: $\pi=\{\pi_i\}$, for example $\pi_i=P(n_1=(2,2))$
Emission probabilities, for example at node (0, 2):
$\gamma_t(i)=P(a_t=\mathrm{tag}_1\mid n_t=(0,2))$
$\gamma_t(i)=P(a_t=\mathrm{tag}_2\mid n_t=(0,2))$
$\gamma_t(i)=P(a_t=\mathrm{tag}_3\mid n_t=(0,2))$
Following the definition of the forward-backward algorithm, the forward and backward quantities can readily be expressed as follows:
Given the observation sequence $N$ and the hidden Markov model $\lambda$, the probability of being in hidden state $a_i$ at time $t$ and in hidden state $a_j$ at time $t+1$ is defined as:
$\xi_t(i,j)=P(q_t=a_i,\ q_{t+1}=a_j\mid N,\lambda)$
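For reference, the standard forward-backward algorithm computes ξ_t(i, j) as α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(N | λ), where α and β are the forward and backward variables. The sketch below implements that textbook formula in the conventional orientation (transitions A between hidden states, emissions B from hidden states to observations), which is an assumption relative to the node-based definitions above. It does not rescale, so it is only suitable for short sequences; the function name is hypothetical.

```python
import numpy as np

def compute_xi(pi, A, B, obs):
    """xi[t, i, j] = P(q_t = a_i, q_{t+1} = a_j | obs, lambda) via unscaled
    forward-backward. pi: (n,), A: (n, n), B: (n, m), obs: observation indices."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))                     # beta_T(i) = 1
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()               # P(obs | lambda)
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        # alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    return xi / likelihood
```

Summing `xi[t]` over j gives the single-state posterior γ_t(i), which is what Baum-Welch re-estimation uses together with ξ.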
As the user's position shifts between different locations at different times, the user's likely future position and behavior can be predicted from the current location, and the pushed content can be adjusted accordingly.
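One simple reading of this prediction step is to propagate a one-hot distribution over the current node through a node-to-node transition matrix A (rows summing to 1), with nodes indexed 0..K-1. This is a hedged sketch under those assumptions; the text does not say whether the hidden activity state is also used in the prediction, and the function name is hypothetical.

```python
import numpy as np

def predict_next(A, current, steps=1):
    """Distribution over nodes `steps` moves ahead, starting from node index
    `current`, under a row-stochastic node-to-node transition matrix A."""
    p = np.zeros(A.shape[0])
    p[current] = 1.0              # the user is at `current` with certainty
    for _ in range(steps):
        p = p @ A                 # one Markov step
    return p
```

The argmax of the returned vector gives the most likely next node, whose regional functional attribute can then select the content to push.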
In step S5, determining the user attribute information according to the trip chain, each node, and the regional functional attribute information corresponding to each node comprises:
A) estimating, with the HMM, the user's activity at each node according to the spatio-temporal information of each user's daily trajectory data and the regional functional attribute information;
B) determining the user attribute information according to the user's activity.
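Step A) asks for the user's activity at each node. The text introduces ξ from the forward-backward algorithm but does not state the decoding rule. A common swapped-in choice for recovering a single most likely activity sequence is the Viterbi algorithm, sketched here in log space under the conventional HMM orientation (hidden activities emit observed nodes): `pi` has shape (N,), `A` is N×N over activities, and `B` is N×M from activities to node indices. Names are hypothetical.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden activity sequence for an observed node sequence,
    computed in log space to avoid underflow on long trajectories."""
    with np.errstate(divide="ignore"):         # log(0) -> -inf is acceptable
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, n = len(obs), len(log_pi)
    score = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_A          # cand[i, j]: best path to i, then i -> j
        back[t] = cand.argmax(axis=0)          # best predecessor of each state j
        score = cand.max(axis=0) + log_B[:, obs[t]]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```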
The user attributes, also called audience attributes, are determined from the spatio-temporal information of the user's trajectory and behavior. Specifically, classification labels are assigned according to the places the user goes, the user's personal consumption behavior, and the living habits characterized by the times and locations the user visits, so that the user's audience attributes can be labeled with categories along multiple dimensions. This specifically includes: querying, from the GPS coordinates where the user is located, the attributes of the current position, including the POI name, the region in which the position lies, and the functional category to which the position belongs; and determining the user attributes (audience attributes) from the functional attributes of the positions where the user was located over a period of time.
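A minimal sketch of the last sub-step, assuming each GPS fix over the period has already been mapped to the functional category of its region: count how often each category occurs and keep those whose share reaches a threshold. The function name and the 0.3 default threshold are hypothetical; the text does not fix a rule.

```python
from collections import Counter

def audience_attributes(visits, min_share=0.3):
    """visits: list of (timestamp, functional_category) for one user over a period.
    Returns the categories whose share of visits reaches min_share,
    most frequent first."""
    counts = Counter(cat for _ts, cat in visits)
    total = sum(counts.values())
    return [cat for cat, c in counts.most_common() if c / total >= min_share]
```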
As a specific embodiment, consider users who frequently appear at an airport transfer terminal, for example once a week, or business travelers who appear twice a week or fly frequently. Based on the times and frequencies at which the locations of their mobile phones appear at the transfer terminal, the hyperlocal platform defines their user attributes as business-travel white-collar workers, organization staff, or similar. In this way, even after a user leaves the airport and the trip ends, the hyperlocal platform can keep following the user and push matched merchant marketing content and business-travel marketing content according to the user's attributes. Therefore, the user attributes can be determined from the attribute information of the region in which the user's position is acquired and from the user's activity frequency.
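The frequency rule in this example can be sketched as follows, assuming location records of the form (user_id, datetime, cell) and a hypothetical cell identifier for the transfer terminal. "Once a week" is interpreted here, as an assumption, as the number of distinct days seen at the terminal divided by the number of ISO weeks in which the user was observed at all.

```python
from collections import defaultdict
from datetime import datetime
from typing import Hashable, Iterable, Set, Tuple

def frequent_terminal_users(fixes: Iterable[Tuple[str, datetime, Hashable]],
                            terminal_cell: Hashable,
                            min_per_week: float = 1.0) -> Set[str]:
    """Users whose terminal visit days per observed ISO week reach min_per_week."""
    days = defaultdict(set)    # user -> distinct dates seen at the terminal cell
    weeks = defaultdict(set)   # user -> ISO (year, week) pairs with any record
    for user, ts, cell in fixes:
        weeks[user].add(tuple(ts.isocalendar())[:2])
        if cell == terminal_cell:
            days[user].add(ts.date())
    return {u for u in weeks if len(days.get(u, ())) / len(weeks[u]) >= min_per_week}
```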
According to the user attribute information, matching information and POI data within a certain range centered on each node of the trajectory, and pushing the matching result to the user; and/or
according to the user attribute information, matching relevant news, articles, or books, and pushing the matching result to the user.
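The first matching option can be sketched as a radius query around the trajectory nodes, filtered by POI categories relevant to the user's attributes. The attribute-to-category table, the 1 km default radius, and the haversine (spherical Earth) distance are assumptions; the text only says "a certain range".

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

# Hypothetical mapping from user attributes to relevant POI categories.
RELEVANT = {"business traveller": {"hotel", "business center", "restaurant"}}

def match_pois(nodes, pois, attributes, radius_m=1000.0):
    """nodes: [(lat, lon)] along a trajectory; pois: [(name, category, lat, lon)].
    Returns names of POIs within radius_m of any node whose category is
    relevant to at least one of the user's attributes."""
    wanted = set().union(*(RELEVANT.get(a, set()) for a in attributes))
    hits = []
    for name, cat, plat, plon in pois:
        if cat in wanted and any(haversine_m(lat, lon, plat, plon) <= radius_m
                                 for lat, lon in nodes):
            hits.append(name)
    return hits
```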
First, requests are sent periodically to the user's smartphone or other mobile terminal to obtain the user's GPS coordinates and the exact time. From the GPS coordinates where the user is located, the region to which the coordinates belong and that region's functional attribute information are obtained. Local dealers and retailers suited to the user's attributes are then matched to the user, and these dealers and retailers provide different recommendations and discount information according to the audience attributes.
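The lookup chain in this paragraph (GPS coordinates → region → functional attribute → matching local merchants) can be sketched with two hypothetical in-memory tables keyed by grid cell; a real system would query the regional functional attribute results and a merchant database instead. The table contents and function name are illustrative only.

```python
# Hypothetical tables keyed by grid cell (row, col): the functional attribute of
# each region, and local merchants with the audience attributes they target.
REGION_FUNCTION = {(12, 30): "downtown commercial", (12, 31): "office area"}
LOCAL_MERCHANTS = {(12, 30): {"coffee shop A": {"white-collar"},
                              "toy store B": {"parent"}}}

def recommend(cell, user_attributes):
    """Return the region's functional attribute and the local merchants whose
    target audience overlaps the user's attributes (sorted for stable output)."""
    function = REGION_FUNCTION.get(cell, "unknown")
    wanted = set(user_attributes)
    merchants = sorted(name for name, audience in LOCAL_MERCHANTS.get(cell, {}).items()
                       if audience & wanted)
    return function, merchants
```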
Based on the spatio-temporal distribution of users, the hyperlocal platform can determine which products or merchants have high sales in which surrounding functional regions, so as to analyze the influence of crowds and regional functions on sales and provide data support for third-party merchants. For users of the hyperlocal platform this process is hidden and requires no action on their part, yet they can more conveniently find merchants they like, readily see products matching their tastes appear among the recommendations, and easily obtain first-hand discount information.
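The sales analysis mentioned here can be sketched as an aggregation of orders by the functional category of the region where each order occurred. The record layout (cell, product, amount) and the ranking by total amount are assumptions for illustration.

```python
from collections import defaultdict

def sales_by_function(orders, region_function):
    """orders: [(cell, product, amount)]; region_function: {cell: category}.
    Sums sales per (functional category, product) and returns, per category,
    the products ranked by total sales."""
    totals = defaultdict(lambda: defaultdict(float))
    for cell, product, amount in orders:
        totals[region_function.get(cell, "unknown")][product] += amount
    return {cat: sorted(prods, key=prods.get, reverse=True)
            for cat, prods in totals.items()}
```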
The technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages: the invention can associate user data within a region with places having specific functional attributes, thereby achieving precise recommendation.
In the description provided here, numerous specific details are set forth. It is understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims.

Claims (10)

1. A method for determining user attribute information based on regional POI data and personal spatio-temporal data, characterized in that the method comprises:
obtaining POI data from an open map platform by preset region, and preprocessing the POI data;
counting, based on location data acquired by mobile terminals, the numbers of people entering and leaving each region at different times, and clustering the counted numbers;
parsing the clustering results and the preprocessed POI data with the DMR topic model algorithm to determine regional functional attribute information;
generating daily trajectory data of each user based on the personal spatio-temporal data of the mobile terminal, the trajectory data including trip chains and the nodes in the trip chains; and
determining the user attribute information according to the trip chain, each node, and the regional functional attribute information corresponding to each node.
2. The method according to claim 1, further characterized in that the regional functional attribute information includes the functional category division of the region, the key POI types corresponding to functions of different categories, and/or probability distribution information over the functional categories.
3. The method according to claim 1, further characterized in that obtaining the open map platform POI data by region comprises:
reading a region list;
traversing the region list and executing the following process:
obtaining the longitude and latitude of the region;
creating a thread pool;
generating request URLs in separate threads and crawling all categories of POI data in each region.
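A minimal sketch of claim 3, assuming a hypothetical endpoint URL, hypothetical query parameter names, and a fixed list of POI categories; real open map platforms define their own request formats and keys, which the text does not give. The HTTP call is injected as `fetch` so the sketch runs without network access. Unlike the claim, which creates the pool inside the traversal, this version shares one pool across all regions, a design choice that avoids repeated thread start-up.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; a real platform defines its own.
BASE_URL = "https://api.example.com/place/search"
CATEGORIES = ["restaurant", "hotel", "school"]

def build_url(lat, lon, category, radius_m=500):
    """Request URL for one POI category around a region's centre point."""
    return BASE_URL + "?" + urlencode({"location": f"{lat},{lon}",
                                       "query": category, "radius": radius_m})

def crawl_regions(regions, fetch, max_workers=8):
    """regions: [(region_id, lat, lon)] read from the region list.
    fetch: callable(url) -> list of POIs (injected HTTP client).
    Returns {(region_id, category): POIs}."""
    jobs = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for region_id, lat, lon in regions:          # traverse the region list
            for cat in CATEGORIES:
                jobs[(region_id, cat)] = pool.submit(fetch, build_url(lat, lon, cat))
    # Leaving the with-block waits for all submitted requests to finish.
    return {key: fut.result() for key, fut in jobs.items()}
```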
4. The method according to claim 1, further characterized in that obtaining the open map platform POI data by region further comprises:
creating a queue for transferring all categories of crawled POI data to the cloud for data preprocessing.
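Claim 4's queue can be read as a producer-consumer hand-off between the crawler and the upload to the cloud. The sketch assumes a single consumer thread and a hypothetical `upload` callable standing in for the cloud preprocessing step; `None` is used as the stop signal.

```python
import queue
import threading

def upload_worker(q, upload):
    """Consumer: drain POI batches from the queue and hand them to `upload`.
    None is the stop signal."""
    while True:
        batch = q.get()
        try:
            if batch is None:
                return
            upload(batch)
        finally:
            q.task_done()                  # mark every item, including the sentinel

def run_pipeline(batches, upload):
    q = queue.Queue(maxsize=100)           # bounded so crawling cannot outrun uploads
    worker = threading.Thread(target=upload_worker, args=(q, upload))
    worker.start()
    for batch in batches:                  # producer side (e.g. crawler results)
        q.put(batch)
    q.put(None)
    worker.join()
```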
5. The method according to claim 1, further characterized in that the acquired data are preprocessed with a cloud computing platform, comprising: performing coordinate conversion on all categories of crawled POI data.
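Claim 5 does not name the source or target coordinate systems. As one assumed example, the sketch projects WGS-84 longitude/latitude onto Web Mercator (EPSG:3857) metres using the spherical formulas with R = 6,378,137 m, which makes metre-based gridding straightforward. If the platform returns coordinates in a different datum, a different conversion would be needed.

```python
import math

R = 6378137.0   # WGS-84 semi-major axis, used as the sphere radius by Web Mercator

def wgs84_to_web_mercator(lon, lat):
    """Project WGS-84 degrees to Web Mercator metres (valid for |lat| < ~85.05)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```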
6. The method according to claim 1, further characterized in that clustering the counted numbers comprises: the cloud computing platform creating an ODPS class for processing the travel trajectory data of each user;
generating, under this class, an index with the time series as rows;
selecting the data whose time is later than the start time and earlier than the end time, and creating a statistics array;
counting each node ID and the total of each group;
reading the region list file and counting, with multiple threads in task order, the passenger flow volume of each region as an entry region and as an exit region.
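The counting part of claim 6 can be sketched in plain Python, assuming movement records of the form (user_id, origin_region, dest_region, time). The ODPS class and the multithreaded, per-task execution are not reproduced; the sketch only shows the time-window filter (start < time < end) and the per-region entry and exit counts. Skipping moves that stay within one region is an added assumption.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

def count_flows(trips: Iterable[Tuple[str, str, str, datetime]],
                start: datetime, end: datetime) -> Tuple[Counter, Counter]:
    """Keep records with start < time < end and count, per region, entries
    (region as destination) and exits (region as origin)."""
    inflow, outflow = Counter(), Counter()
    for _user, origin, dest, t in trips:
        if start < t < end and origin != dest:   # intra-region moves skipped (assumption)
            outflow[origin] += 1
            inflow[dest] += 1
    return inflow, outflow
```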
7. The method according to any one of claims 1-6, further characterized in that preprocessing the acquired data with the cloud computing platform further comprises:
creating an array for saving the clustering results;
clustering the user inflow and outflow volumes of each node separately into weekday inflow clusters, weekend inflow clusters, weekday outflow clusters, and weekend outflow clusters.
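Claim 7 does not name the clustering algorithm. As an assumption, the sketch applies k-means to each node's hourly flow profile, normalised so that clusters reflect the timing of flows rather than their volume. It would be run four times, once each on the weekday inflow, weekend inflow, weekday outflow, and weekend outflow matrices; the function name is hypothetical.

```python
import numpy as np

def kmeans(profiles, k, iters=50, seed=0):
    """Cluster per-node flow profiles (rows = nodes, columns = time bins).
    Rows are normalised to unit sum so clusters reflect timing, not volume."""
    X = np.asarray(profiles, dtype=float)
    X = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centres
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                              # assign to nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])                    # keep empty clusters' centres
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```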
8. The method according to claim 7, further characterized in that parsing the clustering results and the preprocessed POI data with the DMR topic model algorithm comprises:
organizing the POI data and the cluster file of each node into a file of a predetermined format;
loading the predetermined-format file;
running the DMR algorithm to obtain, for each region, the functional category division of the region and the POI distribution information corresponding to functions of different categories.
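In Dirichlet-multinomial regression (DMR), each document's Dirichlet prior over topics depends on its metadata: alpha[d, k] = exp(x_d · lambda_k). Treating each region as a document of POI-type tokens, with its four flow-cluster labels as metadata, is one plausible reading of claim 8 and is stated here as an assumption. The predetermined file format is not given, so the sketch only shows how such metadata features could be built and mapped to priors; fitting lambda is left to a DMR implementation, and the function names are hypothetical.

```python
import numpy as np

def cluster_features(labels_per_node, n_clusters):
    """One-hot encode each node's cluster labels (e.g. weekday/weekend x
    inflow/outflow) and append a constant 1 as the default (intercept) feature."""
    rows = []
    for labels in labels_per_node:
        row = []
        for lab in labels:
            onehot = [0.0] * n_clusters
            onehot[lab] = 1.0
            row.extend(onehot)
        rows.append(row + [1.0])
    return np.array(rows)

def dmr_alpha(features, lambdas):
    """Document-specific Dirichlet priors alpha[d, k] = exp(x_d . lambda_k).
    features: (D, F) metadata per region; lambdas: (K, F) per-topic weights."""
    return np.exp(np.asarray(features, float) @ np.asarray(lambdas, float).T)
```

With all weights at zero every prior equals 1, i.e. a flat Dirichlet; the intercept weight lets each topic keep a baseline prior independent of the flow clusters.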
9. The method according to claim 1, characterized in that determining the user attribute information according to the trip chain, each node, and the regional functional attribute information corresponding to each node comprises:
estimating, with an HMM, the user's activity at each node according to the spatio-temporal information of each user's daily trajectory data and the regional functional attribute information;
determining the user attribute information according to the user's activity.
10. A method for pushing information to a user based on the user attribute information determined by the method according to any one of claims 1-9, the method comprising:
according to the user attribute information, matching information and POI data within a certain range centered on each node of the trajectory, and pushing the matching result to the user; and/or
according to the user attribute information, matching relevant news, articles, or books, and pushing the matching result to the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910611619.9A CN110442662B (en) 2019-07-08 2019-07-08 Method for determining user attribute information and information push method

Publications (2)

Publication Number Publication Date
CN110442662A 2019-11-12
CN110442662B 2022-05-20

Family

ID=68429852

Country Status (1)

Country Link
CN (1) CN110442662B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant