CN112989143A - Guest group insights method based on geohash address coding - Google Patents
- Publication number
- CN112989143A
- Authority
- CN
- China
- Prior art keywords
- geohash
- user
- clustering
- address coding
- guest group
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a customer group insight method based on geohash address coding, relating to the field of user profiling (user portraits). The method comprises the following steps: step 1, data acquisition and preprocessing; step 2, converting the polygonal area of the selected business district into a geohash list; step 3, querying the users who visited within a specified time period and deduplicating them; step 4, querying each user's source geohashes and user profile; and step 5, clustering the customer group by density to obtain the main customer source locations and the corresponding user profiles. The method has low computational time complexity and is highly extensible.
Description
Technical Field
The invention relates to the technical field of user profiling, and in particular to a customer group insight method based on geohash address coding.
Background
Driven by rising consumption levels among Chinese residents, increasingly diverse modes of transportation and travel, and the continued growth of the tourism industry, business districts in Chinese cities have developed rapidly. In first-tier cities in particular, business districts are densely distributed, core business districts are numerous, businesses are densely packed within each district's coverage area, and transportation is convenient. Commercial districts in Chinese cities have moved beyond the era of traditional department stores and shopping centers into the era of large-scale commercial complexes, forming a multi-center, multi-level, networked distribution pattern. However, urban business districts in China still suffer from severe homogenization of business formats, weak attraction, and low customer loyalty. Important future directions for their development include transformation and upgrading, a focus on differentiated, personalized, and fashion-oriented consumer demand, and the construction of "smart business districts".
Customer group insight performs multi-dimensional analysis of the crowd in a specific area to produce a comprehensive customer group profile and an in-depth professional analysis report. This provides a rigorous data basis for scenario-based, fine-grained marketing and operations, gives strong data support to business decisions, management decisions, and operation planning, and enables a new business district operating model in which marketing campaign information is delivered in real time and precisely to the resident consumers in the surrounding area.
The traditional method computes the spatial relationship between each user's reported location point (Point) and the business district's polygon (Polygon) to determine which users have visited the business district. When residence and workplace information is involved, it must also compute the relationship between each user's points and the polygons of residential areas. Point-in-polygon tests are complex and time-consuming, and in practice both the number of reported points and the number of polygons are very large, so testing every pair one by one has unacceptable time complexity. Even with distributed computing, this cost remains prohibitive.
When residence and workplace information is involved, the polygons of residential and work areas are also difficult to collect, and extending customer group analysis to other cities and regions requires collecting the residential and work-area data of each of them, which is very inconvenient.
Therefore, to address these problems of the conventional method, a geohash-based approach is adopted to quickly count the reported points inside a business district's polygon. The actual polygon shapes of residences and workplaces are ignored; instead, each user's workplace and residence are represented by geohashes, and the final results are aggregated.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is how to count the reported points inside the polygonal area of a business district with low time complexity.
To achieve the above object, the present invention provides a customer group insight method based on geohash address coding, comprising the following steps:
step 1, data acquisition and preprocessing;
step 2, for the selected business district, converting its polygonal area into a geohash list;
step 3, querying the users who visited within a specified time period, and then deduplicating them;
step 4, querying the users' source geohashes and profile features;
and step 5, clustering the customer group by density to obtain the customer source locations and the corresponding profile features.
Further, the preprocessing of step 1 consists of two parts: preprocessing of user residence and workplace information, and preprocessing of user visit information.
Further, preprocessing of user residence and workplace information comprises the following steps:
step a, acquiring the reported point data of all users;
step b, for each user id, determining the user's workplace geohash and residence geohash from the places where the reported points dwell, the time periods, and the dwell durations;
and step c, obfuscating the user id by encryption, and storing the obfuscated user id together with the residence, workplace, and profile information.
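Step b leaves the decision rule open: it only says that the place, time period, and duration of each dwell are used. The sketch below assumes one plausible rule, purely for illustration: the residence is the geohash with the most dwell time at night (21:00-07:00) and the workplace is the geohash with the most dwell time on weekday daytime (09:00-18:00). The time windows, the `infer_home_work` name, and the input layout are hypothetical, and each dwell is attributed by its start hour only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical time windows; the method only states that place, time period and duration are used.
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 7))
WORK_HOURS = set(range(9, 18))


def _argmax(totals):
    """Key with the largest accumulated dwell time, or None if there is no data."""
    return max(totals, key=totals.get) if totals else None


def infer_home_work(stays):
    """Infer (residence_geohash, workplace_geohash) per user.

    stays: iterable of (user_id, geohash, start: datetime, minutes: float),
    one entry per dwell segment derived from a user's reported points.
    """
    night = defaultdict(lambda: defaultdict(float))
    work = defaultdict(lambda: defaultdict(float))
    for user_id, gh, start, minutes in stays:
        if start.hour in NIGHT_HOURS:
            night[user_id][gh] += minutes
        elif start.weekday() < 5 and start.hour in WORK_HOURS:  # Monday-Friday daytime
            work[user_id][gh] += minutes
    users = set(night) | set(work)
    return {u: (_argmax(night.get(u, {})), _argmax(work.get(u, {}))) for u in users}


stays = [
    ("u1", "wtw3sjq", datetime(2021, 4, 12, 22, 0), 480),  # Monday night
    ("u1", "wtw3sjq", datetime(2021, 4, 13, 23, 0), 420),  # Tuesday night
    ("u1", "wtw37hz", datetime(2021, 4, 13, 9, 30), 480),  # Tuesday daytime
    ("u2", "wtw3bcd", datetime(2021, 4, 17, 23, 0), 400),  # Saturday night only
]
print(infer_home_work(stays))
```

A user with no qualifying daytime dwell, such as `u2` here, gets `None` as workplace rather than a guessed value.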
Further, in an actual scenario the geohash precision must be chosen according to the expected polygon size. At city level, if coarser polygon results are desired, the geohash precision is 6 characters (about 1.2 km × 609.4 m) or 7 characters (about 152.9 m × 152.4 m); at the level of a single business district, if detailed results for the surrounding area are desired, it is 8 characters (about 38.2 m × 19 m) or 9 characters (about 4.8 m × 4.8 m). The user id can be any field that uniquely identifies the user, such as a mobile phone number, an identity card number, or an application user number, and the original user id is stored only after irreversible encryption.
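The cell sizes quoted above follow from how a geohash allocates bits: each character carries 5 bits, interleaved between longitude and latitude starting with longitude. The sketch below assumes a spherical Earth (about 111,195 m per degree) and returns the approximate cell size at a given latitude; the function name is illustrative, and published tables such as the one quoted above give equatorial values, so the numbers differ slightly.

```python
import math

# Approximation: spherical Earth with mean radius 6371 km -> 2*pi*R/360 metres per degree.
METERS_PER_DEGREE = 2 * math.pi * 6_371_000 / 360


def geohash_cell_size(length, lat_deg=0.0):
    """Approximate (width_m, height_m) of a geohash cell with `length` characters."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2  # longitude takes the first bit, so it gets the odd one
    lat_bits = bits // 2
    width_deg = 360.0 / (1 << lon_bits)
    height_deg = 180.0 / (1 << lat_bits)
    # East-west extent shrinks with the cosine of latitude; north-south extent does not.
    width_m = width_deg * METERS_PER_DEGREE * math.cos(math.radians(lat_deg))
    height_m = height_deg * METERS_PER_DEGREE
    return width_m, height_m


for n in (6, 7, 8, 9):
    w, h = geohash_cell_size(n)
    print(f"{n} characters: {w:,.1f} m x {h:,.1f} m")
```

At the latitude of a city such as Shanghai (about 31° N) the east-west width is roughly 15% smaller than the equatorial figure, which is worth remembering when matching precision to polygon size.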
Further, preprocessing of user visit information comprises the following steps:
step d, acquiring one day of reported point data for all users;
step e, partitioning the data by geohash and deduplicating the reported points of the same user id within the same geohash;
and step f, obfuscating the user id, and storing the obfuscated ids of visiting users per geohash.
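Steps d-f amount to building an inverted index from (day, geohash cell) to the set of visitor ids. A minimal sketch, assuming each report already carries a geohash at least as long as the chosen precision and that the obfuscation function is supplied by the caller; `build_visit_index` and the tuple layout are hypothetical. Because a geohash prefix names the enclosing coarser cell, truncation is enough to assign a report to its cell, and storing ids in a set performs the per-cell deduplication of step e.

```python
from collections import defaultdict


def build_visit_index(reports, precision, obfuscate):
    """Map (day, geohash cell) -> set of obfuscated visitor ids.

    reports:   iterable of (user_id, day, geohash), geohash length >= precision
    obfuscate: callable turning a raw user id into its stored pseudonym
    """
    index = defaultdict(set)
    for user_id, day, gh in reports:
        cell = gh[:precision]  # truncating a geohash yields its enclosing cell
        index[(day, cell)].add(obfuscate(user_id))  # set: one entry per user per cell per day
    return dict(index)


reports = [
    ("u1", "2021-04-12", "wtw3sjq8"),
    ("u1", "2021-04-12", "wtw3sjq9"),  # same user, same 7-char cell: deduplicated
    ("u2", "2021-04-12", "wtw3sjqb"),
    ("u1", "2021-04-13", "wtw3sjq8"),
]
index = build_visit_index(reports, 7, obfuscate=str.upper)  # str.upper stands in for real obfuscation
```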
Further, step 2 comprises the following steps:
step 2.1, acquiring the GeoJSON of the business district;
and step 2.2, converting the GeoJSON into the geohash list.
Further, step 2.2 can be computed in two ways: one counts a geohash cell only when it lies completely within the polygon, and the other counts a geohash cell whenever any part of it is covered by the polygon.
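Step 2.2 can be sketched without a GIS library: enumerate the geohash cells in the polygon's bounding box and classify each cell against the polygon. A minimal sketch, assuming a simple polygon (one exterior ring, no holes) given as (lon, lat) vertices in GeoJSON order, with planar geometry on raw degrees; touching and collinear boundaries are ignored. `mode="inside"` corresponds to the first calculation method and `mode="partial"` to the second. All names are illustrative, and a geometry library such as Shapely would handle edge cases more robustly.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Standard geohash encoding: alternate lon/lat bisection bits, 5 bits per character."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    out, ch, nbits, even = [], 0, 0, True
    while len(out) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:  # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        ch = (ch << 1) | int(bit)
        nbits += 1
        even = not even
        if nbits == 5:
            out.append(BASE32[ch])
            ch, nbits = 0, 0
    return "".join(out)


def _inside(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (lon, lat) vertices."""
    hit = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit


def _cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect (touching is ignored)."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (orient(q1, q2, p1) * orient(q1, q2, p2) < 0
            and orient(p1, p2, q1) * orient(p1, p2, q2) < 0)


def polygon_to_geohashes(poly, precision, mode="inside"):
    """mode='inside': cells entirely within poly; mode='partial': cells overlapping poly at all."""
    bits = 5 * precision
    w = 360.0 / (1 << ((bits + 1) // 2))  # cell width in degrees of longitude
    h = 180.0 / (1 << (bits // 2))  # cell height in degrees of latitude
    lons, lats = [p[0] for p in poly], [p[1] for p in poly]
    edges = list(zip(poly, poly[1:] + poly[:1]))
    cells = set()
    for i in range(math.floor((min(lons) + 180) / w), math.floor((max(lons) + 180) / w) + 1):
        for j in range(math.floor((min(lats) + 90) / h), math.floor((max(lats) + 90) / h) + 1):
            x0, y0 = -180 + i * w, -90 + j * h
            x1, y1 = x0 + w, y0 + h
            corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
            corner_in = [_inside(x, y, poly) for x, y in corners]
            crossing = any(_cross(a, b, c, d)
                           for a, b in zip(corners, corners[1:] + corners[:1])
                           for c, d in edges)
            if mode == "inside":
                keep = all(corner_in) and not crossing
            else:
                vertex_in = any(x0 < px < x1 and y0 < py < y1 for px, py in poly)
                keep = any(corner_in) or crossing or vertex_in
            if keep:  # the cell centre encodes to exactly this cell's geohash
                cells.add(encode((y0 + y1) / 2, (x0 + x1) / 2, precision))
    return cells


district = [(121.40, 31.15), (121.60, 31.15), (121.60, 31.30), (121.40, 31.30)]
inside = polygon_to_geohashes(district, 5, "inside")
partial = polygon_to_geohashes(district, 5, "partial")
```

The "inside" list is always a subset of the "partial" list; the first undercounts visitors near the boundary, the second overcounts them.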
Further, step 4 specifically obtains the workplace geohash, the residence geohash, and the profile features corresponding to each user id, according to the user ids deduplicated in step 3 and the preprocessed user information.
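Step 4 is a keyed lookup. A minimal sketch, assuming the preprocessed user information is available as a dict from obfuscated id to a record with hypothetical fields `home`, `work`, and `features`; visitors with no stored record are skipped. The lookup only succeeds if both preprocessing pipelines obfuscate ids with the same function and key.

```python
def lookup_sources(visitor_ids, profiles):
    """Return {obfuscated_id: (home_geohash, work_geohash, features)} for known visitors."""
    found = {}
    for uid in visitor_ids:
        rec = profiles.get(uid)
        if rec is None:  # visitor has no residence/workplace record; skip
            continue
        found[uid] = (rec["home"], rec["work"], rec["features"])
    return found


profiles = {
    "U1": {"home": "wtw3sjq", "work": "wtw37hz", "features": {"age": "25-34"}},
    "U2": {"home": "wtw3bcd", "work": None, "features": {"age": "35-44"}},
}
sources = lookup_sources({"U1", "U3"}, profiles)
```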
Further, step 5 comprises the following steps:
step 5.1, adding a random error matched to the chosen geohash precision to each user's residence and workplace, thereby restoring the geohashes to reported point data;
step 5.2, iteratively running OPTICS clustering while varying the minimum number of users per cluster, and computing the final clustering result at different reachability distances;
step 5.3, stopping when the number of clusters obtained falls within the desired interval, and returning the clustering result;
and step 5.4, computing the profile features of each customer group from the user ids in each customer-source cluster of the clustering result.
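Steps 5.2 and 5.3 describe a parameter sweep around OPTICS. The sketch below is a self-contained, O(n²) pure-Python OPTICS (no `max_eps` bound) plus a DBSCAN-style extraction of clusters at a given reachability distance `eps`, following the same rule as scikit-learn's `cluster_optics_dbscan`. The function names, the sweep order (outer loop over minimum cluster size, inner loop over reachability distance), and the assumption that points are already projected to planar metres are illustrative choices, not details given in the method; for real data volumes a library implementation such as scikit-learn's `OPTICS` would be the practical choice.

```python
import heapq
import math


def optics(points, min_pts):
    """Return (ordering, reachability, core_distance) for planar points (O(n^2) sketch)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Core distance: distance to the min_pts-th nearest point, counting the point itself.
    core = [sorted(row)[min_pts - 1] if n >= min_pts else math.inf for row in d]
    reach = [math.inf] * n
    done = [False] * n
    order = []
    for start in range(n):
        if done[start]:
            continue
        heap = [(math.inf, start)]
        while heap:
            _, p = heapq.heappop(heap)
            if done[p]:
                continue  # stale heap entry
            done[p] = True
            order.append(p)
            if math.isinf(core[p]):
                continue  # not a core point: it cannot expand the ordering
            for q in range(n):
                if not done[q]:
                    r = max(core[p], d[p][q])
                    if r < reach[q]:
                        reach[q] = r
                        heapq.heappush(heap, (r, q))
    return order, reach, core


def extract_at(order, reach, core, eps):
    """DBSCAN-style labels at reachability distance eps; -1 marks noise."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > eps:
            if core[p] <= eps:  # far from earlier points but dense: start a new cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels


def sweep(points, min_pts_values, eps_values, k_min, k_max):
    """Return (min_pts, eps, labels) for the first setting giving k_min..k_max clusters."""
    for m in min_pts_values:
        order, reach, core = optics(points, m)  # one OPTICS run per minimum cluster size
        for eps in eps_values:
            labels = extract_at(order, reach, core, eps)
            k = len(set(labels) - {-1})
            if k_min <= k <= k_max:
                return m, eps, labels
    return None  # no setting produced the desired number of clusters


points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (100, 100), (101, 100), (100, 101), (101, 101),
          (50, 0)]
result = sweep(points, [3], [0.5, 5.0], 2, 3)
```

Running OPTICS once per minimum cluster size and re-cutting the same ordering at several reachability distances is what makes the sweep affordable: the extraction is linear in the number of points.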
Further, the random error is the maximum error multiplied by a random number in (-1, 1); adding it to the longitude and latitude of the geohash center point completes the restoration of the reported point data. The minimum number of users and the reachability distance are preset algorithm parameters.
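The method does not define the maximum error for each precision. This sketch assumes it is half the cell's extent on each axis, which is the standard geohash error bound, so every restored point stays inside its original cell. `decode_bbox` and `restore_point` are illustrative names; Python's `random.uniform(-1, 1)` draws from the closed interval, a negligible difference from the open interval (-1, 1).

```python
import random

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_bbox(gh):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell named by geohash `gh`."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    even = True  # bits alternate longitude, latitude, ... starting with longitude
    for c in gh:
        v = BASE32.index(c)
        for shift in range(4, -1, -1):
            bit = (v >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lat_hi, lon_lo, lon_hi


def restore_point(gh, rng=random):
    """Cell centre plus max_error * U(-1, 1), applied separately to latitude and longitude."""
    lat_lo, lat_hi, lon_lo, lon_hi = decode_bbox(gh)
    lat_err, lon_err = (lat_hi - lat_lo) / 2, (lon_hi - lon_lo) / 2  # assumed maximum error
    lat = (lat_lo + lat_hi) / 2 + lat_err * rng.uniform(-1, 1)
    lon = (lon_lo + lon_hi) / 2 + lon_err * rng.uniform(-1, 1)
    return lat, lon


rng = random.Random(42)  # seeded only to make the example reproducible
restored = [restore_point("wtw3sjq", rng) for _ in range(3)]
```

The jitter spreads users who share a geohash across the cell, so density-based clustering does not see many exactly coincident points.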
Compared with the prior art, the invention has the following beneficial effects:
1. Because reported point data is preprocessed on the geohash grid, the number of reported points inside a business district's polygon can be counted quickly, without computing the relationship between every reported point and the polygon each time, which saves computation time.
2. Each user's residence and workplace are represented by geohashes rather than by the real polygons of residential and work areas, so the application can be extended and migrated to other regions simply by aggregating the final results, without needing real regional attribute information.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features, and effects can be fully understood.
Drawings
FIG. 1 is a flowchart of the overall solution of a preferred embodiment of the present invention;
FIG. 2 is a flow chart of data preprocessing according to a preferred embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention; all of these fall within the scope of protection of the present invention.
It should be noted that, where no conflict arises, the embodiments and the features of the embodiments may be combined with each other.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The invention implements geohash-based customer group analysis for a business district. The flow of the customer group analysis in this scheme is shown in FIG. 1. The implementation approach is as follows:
1) data acquisition and preprocessing;
2) for the selected business district, converting its polygonal area into a geohash list;
3) querying the users who visited within the specified time (the visitors of each geohash are precomputed), and then deduplicating them;
4) querying each user's source geohashes and user profile (each user's workplace and residence geohashes are precomputed);
5) clustering the customer group by density to obtain the main customer source locations and the corresponding customer group profiles.
Each step will be described in detail below.
Step 1, data preprocessing, consists of two parts: preprocessing of user residence and workplace information (see the left half of FIG. 2) and preprocessing of user visit information (see the right half of FIG. 2).
Preprocessing of user residence and workplace information:
1) acquiring the reported point data of all users;
2) for each user id, determining the geohashes of the user's workplace and residence from the places where the user's reported points dwell, the time periods, and the dwell durations;
3) obfuscating the id by encryption, and storing the obfuscated user id together with the residence, workplace, and profile information.
A key concern is how many geohash characters to use, since geohashes of different lengths represent grid cells of different sizes. In theory, the finer the geohash grid, the higher the precision and the more accurate the result. In practice, however, the choice depends on the desired polygons: if the polygons of the final result are large, partitioning with a fine geohash grid produces a large number of intermediate results and increases the amount of computation. At city level, if coarser polygon results are desired (e.g., a whole residential compound, factory, or commercial complex), a geohash precision of 6 characters (about 1.2 km × 609.4 m) or 7 characters (about 152.9 m × 152.4 m) is preferred. At the level of a single business district, if detailed results for the surrounding area are desired (e.g., down to individual buildings), a precision of 8 characters (about 38.2 m × 19 m) or 9 characters (about 4.8 m × 4.8 m) is preferred.
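To make the trade-off concrete, consider a hypothetical square business district of 2 km × 2 km (an assumed size, not taken from the method). The number of cells needed to cover it is roughly its area divided by the cell area, using the cell sizes quoted above; boundary cells add somewhat to these counts.

$$
N \approx \frac{A}{w\,h}, \qquad A = 2000\,\text{m} \times 2000\,\text{m} = 4\times10^{6}\,\text{m}^2
$$

$$
\begin{aligned}
N_6 &\approx \frac{4\times10^{6}}{1200 \times 609.4} \approx 5.5 \\
N_7 &\approx \frac{4\times10^{6}}{152.9 \times 152.4} \approx 172 \\
N_8 &\approx \frac{4\times10^{6}}{38.2 \times 19} \approx 5.5\times10^{3} \\
N_9 &\approx \frac{4\times10^{6}}{4.8 \times 4.8} \approx 1.7\times10^{5}
\end{aligned}
$$

Each extra character multiplies the cell count by about $2^5 = 32$, which is why an unnecessarily fine precision inflates the intermediate results that the polygon conversion and the visitor query must process.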
In addition, the user id can be any field that uniquely identifies the user, such as a mobile phone number, an identity card number, or an application user number. To prevent leakage of user information, the original id is stored only after irreversible encryption.
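The method requires irreversible encryption but does not name an algorithm. One reasonable choice, assumed here, is a keyed hash such as HMAC-SHA256: an unkeyed hash of a phone number can be reversed by brute force because the space of valid phone numbers is small, whereas without the key the pseudonym cannot be recomputed. Both preprocessing pipelines must use the same key so that the obfuscated ids still join. The function name, the sample number, and the placeholder key are illustrative.

```python
import hashlib
import hmac


def obfuscate_user_id(raw_id: str, secret_key: bytes) -> str:
    """Keyed one-way pseudonym for a user id (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"replace-with-a-secret-from-a-key-store"  # placeholder; never hard-code a real key
pseudonym = obfuscate_user_id("13800000000", key)
```

The same input and key always give the same pseudonym, which is what allows deduplication and joins on obfuscated ids.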
Preprocessing of user visit information:
1) acquiring one day of reported point data for all users;
2) partitioning by geohash and deduplicating reported points with the same id within the same geohash;
3) obfuscating the id, and storing the obfuscated ids of visiting users per geohash.
Step 2, converting the business district into geohashes:
1) acquiring the GeoJSON of the business district;
2) converting the GeoJSON into a geohash list.
Note that there are two ways to compute the geohash list: one counts a geohash cell only when it lies completely within the polygon, and the other counts a geohash cell whenever any part of it is covered by the polygon. Either can be chosen according to actual requirements.
Step 3, user query and deduplication:
1) querying the visiting users of the geohash list corresponding to the business district within the specified time range;
2) deduplicating the user ids.
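With the per-day, per-cell visitor index precomputed, step 3 reduces to set unions over the business district's geohash list and the requested days, with no point-in-polygon test at query time. A minimal sketch, assuming a hypothetical in-memory index keyed by (date, geohash); a production system would more likely keep it in a database keyed the same way.

```python
from datetime import date, timedelta


def visitors_in_range(index, cells, start, end):
    """Union of obfuscated visitor ids over the district's cells and every day in [start, end]."""
    visitors = set()
    day = start
    while day <= end:
        for cell in cells:
            visitors |= index.get((day, cell), set())  # set union = deduplication
        day += timedelta(days=1)
    return visitors


index = {
    (date(2021, 4, 12), "wtw3sjq"): {"U1", "U2"},
    (date(2021, 4, 13), "wtw3sjq"): {"U1"},
    (date(2021, 4, 13), "wtw3sjr"): {"U3"},
    (date(2021, 4, 14), "wtw3sjq"): {"U4"},
}
visitors = visitors_in_range(index, {"wtw3sjq", "wtw3sjr"}, date(2021, 4, 12), date(2021, 4, 13))
```

The query cost grows with the number of cells times the number of days, not with the number of raw reported points.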
Step 4, querying customer sources and profiles:
Using the user ids obtained after deduplication, query the preprocessed user information to obtain the workplace geohash, residence geohash, and profile features corresponding to each id.
Step 5, density-based clustering of customer sources:
Finally, the users who visited the business district are clustered to obtain customer source and group profile information, as follows:
1) adding a random error matched to the selected geohash precision to each user's residence/workplace, thereby restoring the geohashes to reported point data;
2) iteratively running OPTICS clustering while varying the minimum number of users per cluster, and computing the final clustering result at different reachability distances;
3) stopping when the number of clusters obtained falls within the desired interval, and returning the clustering result;
4) computing the feature profile of each customer group from the ids in each customer-source cluster of the clustering result.
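The last clustering step summarizes each customer-source cluster by the profile features of its members. A minimal sketch, assuming categorical features held in a dict keyed by obfuscated id, `labels` aligned position by position with `user_ids` (as a clustering run would return them), and a hypothetical `age` feature; noise points (label -1) are left out of every group.

```python
from collections import Counter, defaultdict


def cluster_profiles(labels, user_ids, features, key):
    """Per cluster, the share of users with each value of feature `key`; noise (-1) is skipped."""
    counts = defaultdict(Counter)
    for label, uid in zip(labels, user_ids):
        if label != -1:
            counts[label][features[uid][key]] += 1
    return {
        label: {value: n / sum(c.values()) for value, n in c.items()}
        for label, c in counts.items()
    }


labels = [0, 0, 1, -1]
user_ids = ["U1", "U2", "U3", "U4"]
features = {
    "U1": {"age": "25-34"},
    "U2": {"age": "35-44"},
    "U3": {"age": "25-34"},
    "U4": {"age": "18-24"},
}
profile = cluster_profiles(labels, user_ids, features, "age")
```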
Regarding the random error used to restore a geohash into a reported point: geohashes of different precisions correspond to different maximum errors. The maximum error is multiplied by a random number in (-1, 1) to obtain the random error, which is added separately to the longitude and the latitude of the geohash center point, completing the restoration of the reported point data.
The minimum number of users per cluster and the reachability distance are preset algorithm parameters that should be tuned to actual conditions.
The preferred embodiments of the invention have been described in detail above. It should be understood that those skilled in the art can devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art and according to the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (10)
1. A customer group insight method based on geohash address coding, characterized by comprising the following steps:
step 1, data acquisition and preprocessing;
step 2, for the selected business district, converting its polygonal area into a geohash list;
step 3, querying the users who visited within a specified time period, and then deduplicating them;
step 4, querying the users' source geohashes and profile features;
and step 5, clustering the customer group by density to obtain the customer source locations and the corresponding profile features.
2. The customer group insight method based on geohash address coding according to claim 1, wherein the preprocessing of step 1 consists of two parts: preprocessing of user residence and workplace information, and preprocessing of user visit information.
3. The customer group insight method based on geohash address coding according to claim 2, wherein preprocessing of user residence and workplace information comprises the following steps:
step a, acquiring the reported point data of all users;
step b, for each user id, determining the user's workplace geohash and residence geohash from the places where the reported points dwell, the time periods, and the dwell durations;
and step c, obfuscating the user id by encryption, and storing the obfuscated user id together with the residence, workplace, and profile information.
4. The customer group insight method based on geohash address coding according to claim 3, wherein, in an actual scenario, the geohash precision is chosen according to the expected polygon size: at city level, if coarser polygon results are desired, the geohash precision is 6 characters (about 1.2 km × 609.4 m) or 7 characters (about 152.9 m × 152.4 m); at the level of a single business district, if detailed results for the surrounding area are desired, it is 8 characters (about 38.2 m × 19 m) or 9 characters (about 4.8 m × 4.8 m); the user id can be any field that uniquely identifies the user, such as a mobile phone number, an identity card number, or an application user number, and the original user id is stored only after irreversible encryption.
5. The customer group insight method based on geohash address coding according to claim 4, wherein preprocessing of user visit information comprises the following steps:
step d, acquiring one day of reported point data for all users;
step e, partitioning the data by geohash and deduplicating the reported points of the same user id within the same geohash;
and step f, obfuscating the user id, and storing the obfuscated ids of visiting users per geohash.
6. The customer group insight method based on geohash address coding according to claim 5, wherein step 2 further comprises the following steps:
step 2.1, acquiring the GeoJSON of the business district;
and step 2.2, converting the GeoJSON into the geohash list.
7. The customer group insight method based on geohash address coding according to claim 6, wherein step 2.2 can be computed in two ways: one counts a geohash cell only when it lies completely within the polygon, and the other counts a geohash cell whenever any part of it is covered by the polygon.
8. The customer group insight method based on geohash address coding according to claim 7, wherein step 4 specifically obtains the workplace geohash, the residence geohash, and the profile features corresponding to each user id, according to the user ids deduplicated in step 3 and the preprocessed user information.
9. The customer group insight method based on geohash address coding according to claim 8, wherein step 5 comprises the following steps:
step 5.1, adding a random error matched to the chosen geohash precision to each user's residence and workplace, thereby restoring the geohashes to reported point data;
step 5.2, iteratively running OPTICS clustering while varying the minimum number of users per cluster, and computing the final clustering result at different reachability distances;
step 5.3, stopping when the number of clusters obtained falls within the desired interval, and returning the clustering result;
and step 5.4, computing the profile features of each customer group from the user ids in each customer-source cluster of the clustering result.
10. The customer group insight method based on geohash address coding according to claim 9, wherein the random error is the maximum error multiplied by a random number in (-1, 1), and the restoration of the reported point data is completed by adding the random error to the longitude and latitude of the geohash center point; the minimum number of users and the reachability distance are preset algorithm parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110388677.7A CN112989143A (en) | 2021-04-12 | 2021-04-12 | Guest group insights method based on geohash address coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110388677.7A CN112989143A (en) | 2021-04-12 | 2021-04-12 | Guest group insights method based on geohash address coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112989143A true CN112989143A (en) | 2021-06-18 |
Family ID: 76337864
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110388677.7A Pending CN112989143A (en) | 2021-04-12 | 2021-04-12 | Guest group insights method based on geohash address coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112989143A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103383682A (en) * | 2012-05-01 | 2013-11-06 | 刘龙 | Geographic coding method, and position inquiring system and method |
US20160321351A1 (en) * | 2015-04-30 | 2016-11-03 | Verint Systems Ltd. | System and method for spatial clustering using multiple-resolution grids |
US20170300566A1 (en) * | 2016-04-19 | 2017-10-19 | Strava, Inc. | Determining clusters of similar activities |
CN107547633A (en) * | 2017-07-27 | 2018-01-05 | 腾讯科技(深圳)有限公司 | Processing method, device and the storage medium of a kind of resident point of user |
CN109933583A (en) * | 2019-03-25 | 2019-06-25 | 山东浪潮云信息技术有限公司 | A kind of passenger flow statistics and objective group's portrait analysis method and system |
CN110019568A (en) * | 2019-04-12 | 2019-07-16 | 深圳市和讯华谷信息技术有限公司 | Site selecting method, device, computer equipment and storage medium based on space clustering |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210618 |