WO2021103626A1 - Method and apparatus for generating user geographical portrait, computer device, and storage medium - Google Patents

Info

Publication number
WO2021103626A1
WO2021103626A1 PCT/CN2020/105506 CN2020105506W WO2021103626A1 WO 2021103626 A1 WO2021103626 A1 WO 2021103626A1 CN 2020105506 W CN2020105506 W CN 2020105506W WO 2021103626 A1 WO2021103626 A1 WO 2021103626A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
service data
data
geographic
portrait
Application number
PCT/CN2020/105506
Inventor
曹煬
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Definitions

  • This application relates to the field of computer technology, and in particular to a method, device, computer equipment and storage medium for generating geographic portraits of users.
  • LBS stands for Location-Based Services.
  • In shopping applications, LBS is used to obtain the user's location, which saves the user the tedious step of entering location information manually and also provides a geographic basis for selecting distribution warehouses.
  • In navigation applications, LBS obtains the user's location in real time and returns it to the user, making road condition information more intuitive and simple to obtain and query.
  • LBS also plays an important role in various mobile applications such as social networking, weather, taxi, group buying, and travel.
  • The geographic location information provided by LBS can enrich application functions and greatly facilitate users' lives.
  • a method, device, computer device, and storage medium for generating a geographic portrait of a user are provided.
  • a method for generating geographic portraits of users includes:
  • the location service data is subjected to density clustering processing to obtain the data cluster of the location service data;
  • a device for generating geographic portraits of users includes:
  • the user data acquisition module is used to acquire location service data of business users;
  • the data cluster obtaining module is used to perform density clustering processing on the location service data through a density-based clustering algorithm to obtain a data cluster of the location service data;
  • the reference position cluster determination module is used to determine the reference position cluster to which the reference position of the geographic portrait of the business user belongs from the data cluster; wherein the reference position of the geographic portrait includes the reference position when generating the user's geographic portrait;
  • the cluster center determination module is used to perform clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster
  • the geographic portrait production module is used to generate user geographic portraits of business users based on cluster center and positioning service data.
  • A computer device includes a memory and one or more processors; the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps:
  • the location service data is subjected to density clustering processing to obtain the data cluster of the location service data;
  • One or more computer-readable storage media store computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the location service data is subjected to density clustering processing to obtain the data cluster of the location service data;
  • The above method, device, computer equipment, and storage medium for generating a user geographic portrait perform density clustering processing on the location service data through a density-based clustering algorithm, and then determine, from the obtained data clusters, the reference location cluster to which the geographic portrait reference location of the business user belongs. This makes effective use of the density distribution of the location service data and ensures the accuracy of the reference location cluster.
  • The user geographic portrait of the business user is then generated based on the location service data and the cluster center obtained by clustering the reference location cluster, which improves the accuracy of the user geographic portrait.
  • FIG. 1 is an application scenario diagram of a method for generating a geographic portrait of a user according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a method for generating a geographic portrait of a user according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of data cluster acquisition according to one or more embodiments.
  • FIG. 4 is a block diagram of an apparatus for generating a geographic portrait of a user according to one or more embodiments.
  • FIG. 5 is a block diagram of a computer device according to one or more embodiments.
  • the user geographic portrait generation method provided in this application can be applied to the application environment as shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through the network.
  • The terminal 102 sends the location service data of the business user to the server 104. The server 104 performs density clustering processing on the obtained location service data through a density-based clustering algorithm, determines from the obtained data clusters the reference location cluster to which the geographic portrait reference location of the business user belongs, and generates the user geographic portrait of the business user based on the location service data and the cluster center obtained by clustering the reference location cluster.
  • The location service data of the business user can also be stored in the local cache of the server 104, in which case the server 104 obtains the location service data directly from the local cache for subsequent user geographic portrait generation. Alternatively, the terminal 102 can itself perform the user geographic portrait generation processing on the location service data of the business user.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for generating geographic portraits of users is provided. Taking the method applied to the server or terminal in FIG. 1 as an example, the method includes the following steps:
  • Step S201 Obtain location service data of the business user.
  • Location service data, that is, LBS data, is generated when the user terminal uses location services.
  • When a terminal application needs to perform positioning or navigation, the location information of the mobile terminal is obtained through the radio communication network of a mobile operator, such as a GSM (Global System for Mobile Communications) network, a CDMA (Code Division Multiple Access) network, an LTE (Long Term Evolution) network, or 5G (fifth-generation mobile communication technology), or through an external positioning method such as GPS (Global Positioning System).
  • Step S203 Perform density clustering processing on the location service data through the density-based clustering algorithm to obtain a data cluster of the location service data.
  • A density-based clustering algorithm performs clustering according to the density distribution of the data. Examples include the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm and the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm.
  • A density-based clustering algorithm is used to perform density clustering processing on the location service data, grouping it into clusters to obtain the data clusters of the location service data.
  • A data cluster is a group of data points of the same kind obtained by processing the location service data through the density-based clustering algorithm.
  • Step S205 Determine the reference position cluster to which the reference position of the geographic portrait of the business user belongs from the data cluster; wherein the reference position of the geographic portrait includes the reference position when generating the geographic portrait of the user.
  • The geographic portrait reference location is the location data that is referenced when generating the user's geographic portrait. For example, when determining the working-city label in the user's geographic portrait, the reference location data is the user's work address; for the commuting-distance label, the reference location data includes both the user's work address and home address.
  • The geographic portrait reference location is set according to the actual needs of the user's geographic portrait, for example the business user's home address or work address.
  • The reference location cluster is the data cluster in which the geographic portrait reference location of the business user lies.
  • The reference location cluster to which the geographic portrait reference location belongs can be determined from statistics of the data points in each data cluster. For example, when the geographic portrait reference location includes the home address, the data cluster containing the business user's home address can be determined from the day/night ratio of the data points in each data cluster, and the reference location cluster is thereby determined from the data clusters.
  • Step S207 Perform clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster.
  • Clustering the reference location cluster yields the cluster center of the reference location cluster.
  • The cluster center is the actual positioning coordinate of the business user's geographic portrait reference location; that is, the cluster center of the reference location cluster corresponds to the geographic portrait reference location of the business user.
  • The cluster center can also be updated to the point of interest closest to it, with the updated cluster center replacing the original cluster center.
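The text does not say which clustering procedure produces the cluster center, so the following is only a minimal sketch under stated assumptions: the center is taken as the arithmetic mean of the cluster's coordinates, and the point-of-interest list `pois` is a hypothetical input. The names `centroid`, `snap_to_poi`, and `haversine_m` are illustrative and do not come from the source.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def centroid(points):
    """Assumed cluster center: the mean latitude and mean longitude."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)


def snap_to_poi(center, pois):
    """Replace the center with the nearest point of interest, if any is known.

    `pois` is a hypothetical mapping of POI name -> (lat, lon).
    """
    if not pois:
        return center
    return min(pois.values(), key=lambda coord: haversine_m(center, coord))


cluster = [(22.000, 114.000), (22.000, 114.002), (22.002, 114.000), (22.002, 114.002)]
pois = {"office tower": (22.0011, 114.0011), "mall": (22.0100, 114.0100)}
center = snap_to_poi(centroid(cluster), pois)
```

Averaging degrees directly is adequate for clusters a few hundred meters across; a cluster spanning the antimeridian or a pole would need a different center computation.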
  • Step S209 Generate a user geographic portrait of the business user based on the cluster center and the location service data.
  • The user geographic portrait of the business user is generated based on the cluster center and the location service data.
  • The user geographic portrait reflects the personal characteristics of the business user and can be composed of multiple geographic tags of the business user.
  • The geographic tags may include, but are not limited to, home location, work location, commuting distance, working city, city of residence, whether the user works across regions, and hometown.
  • In the above method, density clustering processing is performed on the location service data through a density-based clustering algorithm, and the reference location cluster to which the geographic portrait reference location of the business user belongs is determined from the obtained data clusters. This makes effective use of the density distribution of the location service data and ensures the accuracy of the reference location cluster.
  • The user geographic portrait of the business user is then generated from the location service data and the cluster center obtained by clustering the reference location cluster, which improves the accuracy of the user geographic portrait.
  • The process of obtaining data clusters, that is, performing density clustering processing on the location service data through a density-based clustering algorithm to obtain the data clusters of the location service data, includes:
  • Step S301 Obtain a preset core point coverage radius and a core point coverage number threshold.
  • the location service data is subjected to density clustering processing through the DBSCAN algorithm to obtain a data cluster of the location service data.
  • The preset core point coverage radius and core point coverage number threshold are acquired; both can be set flexibly according to the actual clustering requirements.
  • The core point coverage radius is the range covered by a core point during the clustering process.
  • The core point coverage number threshold is the minimum number of LBS data points that a core point must cover.
  • A core point is an LBS data point for which the number of other LBS data points at a distance less than the core point coverage radius exceeds the core point coverage number threshold. The larger the core point coverage radius and the smaller the core point coverage number threshold, the greater the number of core points in the location service data.
  • For example, the core point coverage radius is set to 500 meters and the core point coverage number threshold is set to 10. An LBS data point that has more than 10 other LBS data points within 500 meters is then defined as a core point.
  • Step S303 According to the core point coverage radius and the core point coverage number threshold, the positioning service data is clustered and iteratively processed through the DBSCAN algorithm to obtain the core point of the positioning service data.
  • After the core point coverage radius and the core point coverage number threshold are determined, the DBSCAN algorithm is used to perform clustering iterative processing on all of the location service data, and the core points that satisfy the core point coverage radius and the core point coverage number threshold are determined from the location service data.
  • The DBSCAN algorithm is a density-based spatial clustering algorithm. It divides areas of sufficient density into clusters and finds clusters of arbitrary shape in a noisy spatial database. It defines a cluster as the largest set of density-connected points.
  • Step S305 Perform clustering iterative processing on each core point to obtain a data cluster of the positioning service data composed of the core points.
  • After the core points are obtained, clustering iterative processing is further performed on each core point.
  • The DBSCAN algorithm can be used for this clustering iterative processing to obtain the data clusters of the location service data composed of the core points.
  • A data cluster is formed by connecting core points, and the LBS data points covered by a data cluster can be regarded as data of the same kind.
  • Performing clustering iterative processing on each core point to obtain the data clusters of the location service data composed of core points includes: obtaining a preset core point combination distance threshold; and, according to the core point combination distance threshold, performing clustering iterative processing on each core point through the DBSCAN algorithm to obtain the data clusters of the location service data composed of core points.
  • That is, clustering iterative processing of the core points continues through the DBSCAN algorithm until the data clusters composed of the core points are obtained.
  • The preset core point combination distance threshold is obtained; it is set according to the required size of the data clusters.
  • The core point combination distance threshold decides whether core points are connected to form a data cluster: when the distance between two core points is less than the threshold, the two core points are combined into one data cluster. The greater the core point combination distance threshold, the more core points are connected in the resulting data cluster, and the more LBS data points it covers.
  • In a data cluster obtained in this way, each core point has at least one other core point at a distance less than the core point combination distance threshold.
  • For example, the core point combination distance threshold is 500 meters; that is, if a core point has other core points within 500 meters around it, the core point is connected with those core points to form a data cluster.
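The two stages described above (select core points with a coverage radius and a coverage number threshold, then connect core points that lie within the combination distance threshold) can be sketched in plain Python. This is an illustrative reading, not the applicant's implementation: the function names are hypothetical, distances use the haversine formula as an assumption, an isolated core point is kept as its own cluster, and non-core points covered by a cluster are not assigned here.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def find_core_points(points, radius_m=500.0, min_neighbors=10):
    """Indices of points with MORE than min_neighbors other points within radius_m."""
    core = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and haversine_m(p, q) < radius_m
        )
        if neighbors > min_neighbors:
            core.append(i)
    return core


def connect_core_points(points, core, combine_m=500.0):
    """Group core points into clusters by chaining pairs closer than combine_m."""
    unvisited = set(core)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            cur = frontier.pop()
            near = [k for k in unvisited
                    if haversine_m(points[cur], points[k]) < combine_m]
            for k in near:
                unvisited.remove(k)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters


# Two dense groups about 16 km apart plus one isolated point (synthetic data).
home = [(22.54 + i * 0.0001, 114.05) for i in range(12)]
work = [(22.60 + i * 0.0001, 114.20) for i in range(12)]
points = home + work + [(23.5, 115.0)]
core = find_core_points(points)
clusters = connect_core_points(points, core)
```

The pairwise scan is quadratic in the number of points, which is fine for a sketch; a production pipeline would normally use a spatial index or a library DBSCAN implementation.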
  • The reference location of the geographic portrait includes a home address and a work address. Determining, from the data clusters, the reference location cluster to which the geographic portrait reference location of the business user belongs includes: determining the number of location service data points in each data cluster and the time distribution of the location service data; determining the home address cluster to which the home address belongs and the work address cluster to which the work address belongs according to the number and time distribution of the location service data in the data clusters; and obtaining the reference location cluster from the home address cluster and the work address cluster.
  • That is, the user geographic portrait is generated based on the home address and work address of the business user, and the reference location cluster is determined from the data clusters according to the number of LBS data points in each data cluster and their time-period distribution.
  • When determining the reference location cluster to which the geographic portrait reference location of the business user belongs, statistical analysis is performed on the location service data covered by each data cluster to determine the number of location service data points and their time-period distribution.
  • The time-period distribution can be, but is not limited to, day/night or working day/non-working day.
  • The proportion of location service data in a data cluster can be analyzed separately for different time periods, such as day and night.
  • The reference location cluster can be composed of the home address cluster and the work address cluster.
  • When the number of data clusters n ≥ 2, the average avg is calculated as (total number of location service data points) / n. For any data cluster, if the total number of points in the cluster is greater than or equal to avg, the daytime proportion is (daytime points) / (total cluster points) and the night proportion is (night points) / (total cluster points). If the total number of points in the cluster is less than avg, the daytime proportion is (daytime points / avg) * (daytime points / total cluster points) and the night proportion is (night points / avg) * (night points / total cluster points).
  • The cluster with the highest daytime proportion is the cluster where the work address is located, and the cluster with the highest night proportion is the cluster where the home address is located. If the highest daytime proportion and the highest night proportion fall in the same cluster, and the n clusters all have points in only one time period (recorded here as daytime), the cluster with the highest proportion for that time period is selected as the cluster where the home address or the work address is located (only one cluster results in the end, namely the daytime cluster).
  • If the n clusters have only one time period (recorded as daytime) and another cluster has a higher daytime proportion than these n clusters, the cluster with the highest daytime proportion among the n clusters is taken as the daytime cluster and the other cluster becomes the night cluster, thereby determining the home address cluster and the work address cluster.
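The weighted day/night proportions can be illustrated with a short sketch. It assumes that every point in a cluster is counted as either a daytime or a night point, reads the home address cluster as the one with the highest night proportion, and leaves out the tie-breaking cases; the function names are hypothetical.

```python
def day_night_ratios(clusters, total_points):
    """Return (day_ratio, night_ratio) per cluster.

    `clusters` is a list of (day_points, night_points) counts and
    `total_points` is the total number of location service data points.
    Clusters smaller than the average size are down-weighted by count / avg.
    """
    n = len(clusters)
    if n < 2:
        raise ValueError("the weighting rule is stated for n >= 2 clusters")
    avg = total_points / n
    ratios = []
    for day, night in clusters:
        size = day + night  # assumption: each point is either day or night
        if size == 0:
            ratios.append((0.0, 0.0))
        elif size >= avg:
            ratios.append((day / size, night / size))
        else:
            ratios.append(((day / avg) * (day / size),
                           (night / avg) * (night / size)))
    return ratios


def pick_work_and_home(ratios):
    """Work cluster: highest daytime ratio. Home cluster: highest night ratio."""
    work = max(range(len(ratios)), key=lambda i: ratios[i][0])
    home = max(range(len(ratios)), key=lambda i: ratios[i][1])
    return work, home


# Synthetic counts: an office-like cluster, a home-like cluster, a tiny cluster.
counts = [(80, 20), (10, 90), (3, 1)]
ratios = day_night_ratios(counts, total_points=204)
work_idx, home_idx = pick_work_and_home(ratios)
```

The down-weighting keeps a tiny cluster with a lopsided ratio (here 3 daytime points out of 4) from outranking a large cluster that has many more daytime points.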
  • The method further includes: when the number of data clusters is 0, generating the user geographic portrait of the business user based on the location service data.
  • When no data clusters are obtained, that is, when the number of data clusters is 0, statistical analysis is performed directly on the location service data. For example, the city of each LBS data point can be looked up, and the city with the most days and points (days are compared first) is taken as the city where the user is located. The geographic tags of the business user are then determined, such as the city where the business user is located, the city where the user spends the New Year, and the list of cities visited, and the user geographic portrait of the business user is generated from these geographic tags.
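A minimal sketch of the no-cluster fallback, assuming each LBS data point has already been resolved to a city name and a calendar day (the reverse-geocoding step is outside this sketch, and the record layout is hypothetical). Cities are ranked by number of distinct days first and by number of points second.

```python
from collections import defaultdict


def resident_city(records):
    """Pick the city with the most distinct days, breaking ties by point count.

    `records` is an iterable of (city, day) pairs, one per LBS data point.
    Returns None when there are no records.
    """
    days = defaultdict(set)
    points = defaultdict(int)
    for city, day in records:
        days[city].add(day)
        points[city] += 1
    if not points:
        return None
    return max(points, key=lambda c: (len(days[c]), points[c]))


def visited_cities(records):
    """Sorted list of every city that appears in the records."""
    return sorted({city for city, _ in records})


records = [
    ("Shenzhen", "2020-01-01"),
    ("Shenzhen", "2020-01-02"),
    ("Guangzhou", "2020-01-03"),
    ("Guangzhou", "2020-01-03"),
    ("Guangzhou", "2020-01-03"),
]
city = resident_city(records)
```

In this sample Guangzhou has more points, but Shenzhen is seen on more distinct days, so the days-first comparison selects Shenzhen.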
  • The user geographic portrait of the business user includes at least one of: home location, work location, commuting distance, working city, city of residence, whether the user works across regions, hometown, whether the user is a migrant worker, cities frequented on holidays, whether the user stays at home on weekends, whether the user owns a home, and the nature of work.
  • The home location and work location can be determined from the cluster centers of the home address cluster and the work address cluster. The commuting distance can be calculated from the distance between the home location and the work location. The working city can be determined from the work location, and the city of residence from the home location. Whether the user works across regions can be determined by comparing the working city with the city of residence. The hometown can be determined from the distribution of LBS data during the Spring Festival, and whether the user is a migrant worker by comparing the hometown with the working city. Cities frequented on holidays can be determined from the distribution of LBS data on holidays.
  • Whether the user stays at home on weekends can be determined from the distribution of LBS data on weekends: if the LBS data on a weekend day goes beyond a certain distance from the home location, the user is considered to have gone out that day, and otherwise to have stayed at home; if the number of days at home exceeds the number of days out, the user is considered to stay at home on weekends. Whether the user owns a home can be determined from changes in the home location within a certain period, such as three years.
  • The nature of work can include travel, overtime, night shifts, and so on. For example, when the number of working days spent in non-working cities exceeds a certain value, such as 20% of the total working days, the work is considered to involve travel. When the number of LBS data points at the work place exceeds a certain value, such as 30% of the total number of points, the work is considered to involve overtime. When the number of LBS data points at the work place from 12pm to 7am exceeds a certain value, such as 50% of the total number of points, the work is considered to be night shift work.
  • a full user geographic portrait of the corresponding business users can be obtained, so as to ensure the corresponding provision of high-quality business services.
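Some of the tags described above reduce to a distance calculation or a simple comparison. The sketch below is illustrative only: it assumes the home and work cluster centers and the city names are already known, uses the haversine formula for the commuting distance (the source does not name a distance formula), and applies the 20% travel example from the text. All function and key names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def build_geo_tags(home_center, work_center, home_city, work_city, hometown):
    """Derive comparison-based tags from already-determined locations and cities."""
    return {
        "commute_distance_m": haversine_m(home_center, work_center),
        "works_across_regions": home_city != work_city,
        "migrant_worker": hometown != work_city,
    }


def involves_travel(days_in_other_cities, total_working_days, share=0.20):
    """Travel-type work: working days outside the working city exceed `share`."""
    if total_working_days == 0:
        return False
    return days_in_other_cities / total_working_days > share


tags = build_geo_tags(
    home_center=(22.54, 114.05),
    work_center=(22.60, 114.20),
    home_city="Shenzhen",
    work_city="Shenzhen",
    hometown="Changsha",
)
```

Straight-line distance understates the real commute along roads, so the value is best treated as a lower bound for the commuting-distance tag.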
  • After the location service data of the business user is obtained, the method further includes: extracting out-of-area coordinates from the location service data; when an out-of-area coordinate is determined to be an inverted coordinate, swapping its latitude and longitude to obtain a replacement coordinate; and adding the replacement coordinate to the location service data and using the updated location service data as the location service data.
  • Swapping the latitude and longitude of inverted coordinates in the acquired location service data corrects, to a certain extent, data in which latitude and longitude were recorded in the wrong order. This improves the accuracy of the location service data and therefore the accuracy of the user geographic portrait.
  • Out-of-area coordinates are location service data that fall outside the data area of interest, and the data area of interest is determined by the data mining requirements for the LBS data. For example, in an application scenario that only mines LBS data in China, the data area of interest is within China, and LBS data outside China is excluded.
  • The out-of-area coordinates may include latitude and longitude coordinate information.
  • For each out-of-area coordinate, it is judged, for example from its LBS data, whether the coordinate is an inverted coordinate in which latitude and longitude are reversed. If so, the latitude and longitude are swapped to obtain the replacement coordinate. If not, the coordinate is a genuine out-of-area coordinate; it is not data of interest for data mining and is excluded. The replacement coordinates are then added to the location service data to obtain the updated location service data, which corrects the inverted data.
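The inversion check can be sketched as follows. The source does not specify how an inverted coordinate is recognized, so this sketch assumes a coordinate is inverted when it falls outside the area of interest but its swapped form falls inside. The bounding box is a rough, illustrative rectangle and not an official boundary; all names are hypothetical.

```python
# Illustrative area of interest as (min_lat, max_lat, min_lon, max_lon).
# These numbers are an assumption for the example only.
AREA = (3.0, 54.0, 73.0, 136.0)


def in_area(lat, lon, area=AREA):
    """True when (lat, lon) lies inside the rectangular area of interest."""
    min_lat, max_lat, min_lon, max_lon = area
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def fix_inverted(points, area=AREA):
    """Keep in-area points, swap back inverted ones, drop genuine outliers."""
    cleaned = []
    for lat, lon in points:
        if in_area(lat, lon, area):
            cleaned.append((lat, lon))
        elif in_area(lon, lat, area):
            # Assumed inversion: the swapped pair lands inside the area.
            cleaned.append((lon, lat))
        # Otherwise the point is a real out-of-area coordinate and is excluded.
    return cleaned


raw = [(22.54, 114.05), (114.05, 22.54), (48.85, 2.35)]
cleaned = fix_inverted(raw)
```

A rectangular box cannot tell inversion from a real out-of-area point when both orderings fall inside it, so an area whose latitude and longitude ranges overlap heavily would need a stricter test.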
  • A device for generating geographic portraits of users is provided, including: a user data acquisition module 401, a data cluster acquisition module 403, a reference position cluster determination module 405, a cluster center determination module 407, and a geographic portrait production module 409, wherein:
  • the user data obtaining module 401 is used to obtain location service data of business users;
  • the data cluster obtaining module 403 is used to perform density clustering processing on the location service data through a density-based clustering algorithm to obtain a data cluster of the location service data;
  • the reference position cluster determining module 405 is used to determine the reference position cluster to which the reference position of the geographic portrait of the business user belongs from the data cluster; wherein, the reference position of the geographic portrait includes the reference position when generating the geographic portrait of the user;
  • the cluster center determining module 407 is used to perform clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster;
  • the geographic portrait production module 409 is used to generate user geographic portraits of business users based on the cluster center and positioning service data.
  • The data cluster obtaining module 403 includes a core point condition unit, a core point determination unit, and a data cluster determination unit, wherein: the core point condition unit is used to obtain a preset core point coverage radius and core point coverage number threshold; the core point determination unit is used to perform clustering iterative processing on the location service data through the DBSCAN algorithm according to the core point coverage radius and the core point coverage number threshold, to obtain the core points of the location service data; and the data cluster determination unit is used to perform clustering iterative processing on each core point to obtain the data clusters of the location service data composed of core points.
  • The data cluster determination unit includes a combination threshold subunit and a core point combination subunit, wherein: the combination threshold subunit is used to obtain a preset core point combination distance threshold; and the core point combination subunit is used to perform clustering iterative processing on each core point through the DBSCAN algorithm according to the core point combination distance threshold, to obtain the data clusters of the location service data composed of core points.
  • The reference location of the geographic portrait includes a home address and a work address. The reference location cluster determination module 405 includes a data cluster analysis unit, a home and work address cluster subunit, and a reference location cluster subunit, wherein: the data cluster analysis unit is used to determine the number of location service data points in each data cluster and the time distribution of the location service data; the home and work address cluster subunit is used to determine the home address cluster to which the home address belongs and the work address cluster to which the work address belongs according to the number and time distribution of the location service data in the data clusters; and the reference location cluster subunit is used to obtain the reference location cluster from the home address cluster and the work address cluster.
  • a clusterless processing module is further included, which is used to generate a user geographic portrait of the business user based on the positioning service data when the number of data clusters is zero.
  • the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, working city, city of residence, whether the user works across regions, hometown, whether the user works away from home, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and the nature of work.
  • it further includes an out-of-area coordinate module, a replacement processing module, and a data update module, wherein: the out-of-area coordinate module is used to extract out-of-area coordinates from the positioning service data; the replacement processing module is used to, when the out-of-area coordinates are determined to be inverted coordinates, perform latitude and longitude replacement processing on the out-of-area coordinates to obtain the replacement coordinates; and the data update module is used to add the replacement coordinates to the positioning service data and to use the updated positioning service data as the positioning service data.
  • Each module in the above-mentioned user geographic portrait generating device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above-mentioned modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server or a terminal, and its internal structure diagram may be as shown in FIG. 5.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile or volatile storage medium and internal memory.
  • the non-volatile or volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile or volatile storage medium.
  • the database of the computer equipment is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • FIG. 5 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • the one or more processors execute the following steps:
  • the location service data is subjected to density clustering processing to obtain the data cluster of the location service data;
  • the processor further implements the following steps when executing the computer-readable instructions: obtaining the preset core point coverage radius and core point coverage number threshold; performing clustering iterative processing on the positioning service data through the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain the core points of the positioning service data; and performing clustering iterative processing on each core point to obtain a data cluster of the positioning service data composed of the core points.
  • the processor further implements the following steps when executing the computer-readable instructions: obtaining a preset core point combination distance threshold; and performing clustering iterative processing on each core point through the DBSCAN algorithm, according to the core point combination distance threshold, to obtain a data cluster of location service data composed of core points.
  • the reference location of the geographic portrait includes the home address and the work address; the processor also implements the following steps when executing the computer-readable instructions: determining the number of positioning service data points in each data cluster and the time distribution of the positioning service data; determining, according to the number and time distribution of the location service data in the data clusters, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and obtaining the reference location cluster according to the home address cluster and the work address cluster.
  • the processor further implements the following steps when executing the computer-readable instructions: when the number of data clusters is 0, generate a user geographic portrait of the business user based on the location service data.
  • the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, working city, city of residence, whether the user works across regions, hometown, whether the user works away from home, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and the nature of work.
  • the processor further implements the following steps when executing the computer-readable instructions: extracting out-of-area coordinates from the positioning service data; when the out-of-area coordinates are determined to be inverted coordinates, performing latitude and longitude replacement processing on the out-of-area coordinates to obtain the replacement coordinates; and adding the replacement coordinates to the location service data and using the updated location service data as the location service data.
  • One or more computer-readable non-volatile storage media storing computer-readable instructions.
  • the one or more processors execute the following steps:
  • the location service data is subjected to density clustering processing to obtain the data cluster of the location service data;
  • the computer-readable storage medium may be non-volatile or volatile.
  • the following steps are also implemented: obtaining the preset core point coverage radius and core point coverage number threshold; performing clustering iterative processing on the positioning service data through the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain the core points of the positioning service data; and performing clustering iterative processing on each core point to obtain a data cluster of the positioning service data composed of core points.
  • the following steps are also implemented: obtaining a preset core point combination distance threshold; and performing clustering iterative processing on each core point through the DBSCAN algorithm, according to the core point combination distance threshold, to obtain a data cluster of positioning service data composed of core points.
  • the reference location of the geographic portrait includes the home address and the work address; when the computer-readable instructions are executed by the processor, the following steps are also implemented: determining the number of location service data points in each data cluster and the time distribution of the location service data; determining, according to the number and time distribution of the location service data in the data clusters, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and obtaining the reference location cluster according to the home address cluster and the work address cluster.
  • the following steps are further implemented: when the number of data clusters is zero, a user geographic portrait of the business user is generated based on the location service data.
  • the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, working city, city of residence, whether the user works across regions, hometown, whether the user works away from home, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and the nature of work.
  • the following steps are also implemented: extracting out-of-area coordinates from the positioning service data; when the out-of-area coordinates are determined to be inverted coordinates, performing latitude and longitude replacement processing on the out-of-area coordinates to obtain the replacement coordinates; and adding the replacement coordinates to the location service data and using the updated location service data as the location service data.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Remote Sensing (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for generating a user geographical portrait, relating to the technical field of big data analysis, and comprising: acquiring location service data of a service user (S201); performing density clustering processing on the location service data by means of a density-based clustering algorithm, so as to obtain a data cluster of the location service data (S203); determining, from the data cluster, a reference position cluster that geographical portrait reference positions of the service user belong to, wherein the geographical portrait reference positions comprise a reference position used when generating a user geographical portrait (S205); performing clustering processing on the reference position cluster, so as to obtain a cluster center of the reference position cluster (S207); and generating a user geographical portrait of the service user on the basis of the cluster center and the location service data (S209).

Description

Method, apparatus, computer device, and storage medium for generating a user geographic portrait
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2019111734073, filed with the Chinese Patent Office on November 26, 2019 and entitled "User Geographic Portrait Generation Method, Apparatus, Computer Device, and Storage Medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a method, apparatus, computer device, and storage medium for generating a user geographic portrait.
Background
Location-based services (LBS) are currently a focus of mobile terminal services, and LBS is widely used across the mobile application market. In shopping applications, obtaining the user's location through LBS spares the user the tedious process of entering location information manually and provides a geographic basis for selecting a distribution warehouse. In navigation applications, LBS obtains the user's location in real time and returns it to the user, making it simpler and more intuitive to obtain and query road condition information. LBS also plays an important role in social networking, weather, ride-hailing, group buying, travel, and other mobile applications, where the geographic location information it provides enriches application functions and makes users' lives considerably more convenient.
However, the inventor realized that, after data mining is performed on location service data, each business user is generally profiled by means of various tags, and corresponding services are then provided based on the user profile. How to accurately generate the profile of a business user has therefore become an important basis for providing high-quality services.
Summary
According to various embodiments disclosed in this application, a method, apparatus, computer device, and storage medium for generating a user geographic portrait are provided.
A method for generating a user geographic portrait includes:
acquiring location service data of a business user;
performing density clustering processing on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
determining, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
performing clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster; and
generating the user geographic portrait of the business user based on the cluster center and the location service data.
An apparatus for generating a user geographic portrait includes:
a user data acquisition module, used to acquire location service data of a business user;
a data cluster obtaining module, used to perform density clustering processing on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
a reference position cluster determination module, used to determine, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
a cluster center determination module, used to perform clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster; and
a geographic portrait generation module, used to generate the user geographic portrait of the business user based on the cluster center and the location service data.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
acquiring location service data of a business user;
performing density clustering processing on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
determining, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
performing clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster; and
generating the user geographic portrait of the business user based on the cluster center and the location service data.
One or more computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring location service data of a business user;
performing density clustering processing on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
determining, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
performing clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster; and
generating the user geographic portrait of the business user based on the cluster center and the location service data.
The above method, apparatus, computer device, and storage medium for generating a user geographic portrait perform density clustering processing on the location service data by means of a density-based clustering algorithm, and then determine, from the resulting data clusters, the reference position cluster to which the geographic portrait reference position of the business user belongs. This makes effective use of the density distribution of the location service data and ensures the accuracy of the reference position cluster. The user geographic portrait of the business user is then generated from the location service data and the cluster center obtained by clustering the reference position cluster, which improves the accuracy of the user geographic portrait.
The details of one or more embodiments of this application are set forth in the drawings and description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a method for generating a user geographic portrait according to one or more embodiments;
FIG. 2 is a schematic flowchart of a method for generating a user geographic portrait according to one or more embodiments;
FIG. 3 is a schematic flowchart of data cluster acquisition according to one or more embodiments;
FIG. 4 is a block diagram of an apparatus for generating a user geographic portrait according to one or more embodiments;
FIG. 5 is a block diagram of a computer device according to one or more embodiments.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The method for generating a user geographic portrait provided in this application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. The terminal 102 sends the location service data of a business user to the server 104. The server 104 performs density clustering processing on the received location service data by means of a density-based clustering algorithm, determines from the resulting data clusters the reference position cluster to which the geographic portrait reference position of the business user belongs, and then generates the user geographic portrait of the business user from the location service data and the cluster center obtained by clustering the reference position cluster. Alternatively, the location service data of the business user may be stored in the local cache of the server 104, in which case the server 104 obtains the data directly from the local cache for the subsequent portrait generation; the terminal 102 may also perform the portrait generation on the location service data directly. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a method for generating a user geographic portrait is provided. Taking the method as applied to the server or the terminal in FIG. 1 as an example, it includes the following steps:
Step S201: Acquire location service data of a business user.
Location service data, that is, LBS data, is generated by a user terminal when it uses a location service. For example, when a terminal application needs positioning or navigation, the LBS data is the location information of the mobile terminal obtained through the radio communication network of a mobile telecom operator, such as a GSM (Global System for Mobile Communications) network, a CDMA (Code Division Multiple Access) network, an LTE (Long Term Evolution) network, or 5G (fifth-generation mobile communication technology), or through an external positioning method such as GPS (Global Positioning System). Generally, a large amount of LBS data is generated when periodic or real-time positioning is performed according to the needs of the business user's mobile terminal.
Step S203: Perform density clustering processing on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data.
A density-based clustering algorithm clusters data according to its density distribution. Examples include the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm and the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm. In this embodiment, density clustering processing is performed on the location service data by means of a density-based clustering algorithm, which assigns the location service data to clusters and yields the data clusters of the location service data. A data cluster is a group of data points of the same class obtained by clustering the location service data with the density-based clustering algorithm.
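As an illustration of the density clustering in step S203, the following is a minimal DBSCAN sketch in plain Python. It is not the implementation of this application: the function name `dbscan`, the use of planar Euclidean distance, and the standard convention that a point is a core point when at least `min_pts` points (itself included) lie within `eps` are all assumptions made for the example.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1                  # point i is a core point: start a cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)   # j is also a core point: keep expanding
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

With real latitude and longitude data, a great-circle distance would normally replace the planar distance used here.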
Step S205: Determine, from the data clusters, the reference position cluster to which the geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait.
The geographic portrait reference position is the reference position data consulted when generating the user geographic portrait, and may specifically include the reference positions used for that generation. For example, to determine the working city tag of the user geographic portrait, the reference position data required is the user's workplace address; for the commuting distance tag, the reference position data required includes both the user's workplace address and home address. The geographic portrait reference position is set according to the actual requirements of the user geographic portrait, and may be, for example, the business user's home address or workplace address. The reference position cluster is the data cluster to which the geographic portrait reference position belongs after clustering, that is, the data cluster in which the business user's geographic portrait reference position is located. The reference position cluster can be determined from statistics on the data points in each data cluster. For example, when the geographic portrait reference position includes the home address, the data cluster in which the business user's home address is located can be determined from the day/night ratio of the data points in each data cluster, and the reference position cluster is thereby identified among the data clusters.
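The day/night statistic described above could be sketched as follows. The text does not specify the decision rule, so everything concrete here is an assumption: the night window (22:00 to 06:00), the rule that the home cluster is the one with the most night-time points and the work cluster is the other cluster with the most daytime points, and the names `NIGHT_HOURS` and `pick_home_and_work`.

```python
from collections import Counter

# Assumed night window: 22:00-05:59. The source gives no concrete hours.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}


def pick_home_and_work(records):
    """records: iterable of (cluster_id, hour_of_day); noise label -1 is ignored.

    Returns (home_cluster, work_cluster); either may be None if undecidable.
    """
    night, day = Counter(), Counter()
    for cluster_id, hour in records:
        if cluster_id == -1:
            continue
        if hour in NIGHT_HOURS:
            night[cluster_id] += 1
        else:
            day[cluster_id] += 1
    # Assumed rule: most night-time fixes -> home cluster.
    home = max(night, key=night.get) if night else None
    # Assumed rule: most daytime fixes among the remaining clusters -> work cluster.
    candidates = {c: n for c, n in day.items() if c != home}
    work = max(candidates, key=candidates.get) if candidates else None
    return home, work


if __name__ == "__main__":
    sample = [(0, 23), (0, 2), (0, 3), (0, 10),
              (1, 10), (1, 11), (1, 14), (1, 23), (-1, 12)]
    print(pick_home_and_work(sample))
```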
Step S207: Perform clustering processing on the reference position cluster to obtain the cluster center of the reference position cluster.
After the reference position cluster in which the geographic portrait reference position is located has been obtained, clustering processing is performed on it. For example, a mean-based clustering algorithm such as K-means, with K set to 1, can be applied to the reference position cluster to obtain its cluster center. The cluster center is the actual positioning coordinate data of the business user's geographic portrait reference position; in other words, the cluster center of the reference position cluster corresponds to the business user's geographic portrait reference position. In a specific implementation, the cluster center can also be matched against preset points of interest (POIs), such as residential community POIs and company POIs, and its coordinates corrected with the preset POIs, which can further improve the accuracy of the cluster center position. Specifically, the cluster center can be updated with the POI nearest to it, and the updated cluster center then replaces the original one.
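K-means with K set to 1 converges to the arithmetic mean of the points, so the cluster center can be sketched as a simple average, followed by the nearest-POI correction. This is a minimal sketch under stated assumptions: averaging latitude and longitude directly is only reasonable for a compact cluster, and the `max_dist_m` cut-off, the function names, and the POI list format are hypothetical additions not taken from the text.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def cluster_center(points):
    """Mean of the (lat, lon) points, i.e. the K-means result for K = 1."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def snap_to_poi(center, pois, max_dist_m=200):
    """Replace the center with the nearest POI if it lies within max_dist_m.

    max_dist_m is an assumed safeguard; the text only says the nearest POI is used.
    """
    if not pois:
        return center
    nearest = min(pois, key=lambda p: haversine_m(center, p))
    return nearest if haversine_m(center, nearest) <= max_dist_m else center


if __name__ == "__main__":
    fixes = [(22.5400, 114.0500), (22.5402, 114.0502), (22.5404, 114.0504)]
    center = cluster_center(fixes)
    print(center, snap_to_poi(center, [(22.5403, 114.0503), (23.0, 115.0)]))
```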
Step S209: Generate the user geographic portrait of the business user based on the cluster center and the location service data.
Once the cluster center, that is, the actual positioning coordinate data of the business user's geographic portrait reference position, has been determined, the user geographic portrait of the business user is generated based on the cluster center and the location service data. The user geographic portrait reflects the personal characteristics of the business user and can be composed of multiple geographic tags of the business user. The geographic tags may include, but are not limited to, home location, workplace location, commuting distance, working city, city of residence, whether the user works across regions, and hometown.
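A few of the tags listed above can be derived directly from the home and work cluster centers, as in the sketch below. The dictionary keys, the function `build_portrait`, and the assumption that city names are already known (for example from a reverse-geocoding step not shown here) are illustrative choices, not part of the described method.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def build_portrait(home, work, home_city, work_city):
    """Assemble a small tag dictionary from two cluster centers.

    home_city and work_city are assumed to be supplied by the caller.
    """
    return {
        "home_location": home,
        "work_location": work,
        "commute_km": round(haversine_km(home, work), 1),
        "residence_city": home_city,
        "work_city": work_city,
        "works_across_cities": home_city != work_city,
    }


if __name__ == "__main__":
    print(build_portrait((22.54, 114.05), (22.60, 114.10), "Shenzhen", "Shenzhen"))
```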
In the above method for generating a user geographic portrait, which is based on big data processing technology, a density-based clustering algorithm performs density clustering on massive location service data, and the reference location cluster to which the business user's geographic portrait reference location belongs is then determined from the resulting data clusters. This makes effective use of the density distribution of the location service data and ensures the accuracy of the reference location cluster. The user geographic portrait of the business user is then generated from the location service data and the cluster center obtained by clustering the reference location cluster, which improves the accuracy of the user geographic portrait.
In one embodiment, as shown in FIG. 3, obtaining the data clusters, that is, performing density clustering on the location service data through a density-based clustering algorithm to obtain the data clusters of the location service data, includes:
Step S301: Obtain a preset core point coverage radius and a preset core point coverage number threshold.
In this embodiment, density clustering is performed on the location service data through the DBSCAN algorithm to obtain the data clusters of the location service data. Specifically, a preset core point coverage radius and a preset core point coverage number threshold are obtained; both are set flexibly according to the actual clustering requirements. The core point coverage radius is the range covered by a core point during clustering, and the core point coverage number threshold is the minimum number of LBS data points that a core point must cover. A core point is an LBS data point for which the number of other LBS data points lying closer than the core point coverage radius exceeds the core point coverage number threshold. The larger the core point coverage radius and the smaller the core point coverage number threshold, the more core points there are in the location service data.
In one specific application, the core point coverage radius is set to 500 meters and the core point coverage number threshold is set to 10. That is, an LBS data point in the location service data that has more than 10 other LBS data points within 500 meters of it can be defined as a core point.
Step S303: According to the core point coverage radius and the core point coverage number threshold, perform iterative clustering on the location service data through the DBSCAN algorithm to obtain the core points of the location service data.
After the core point coverage radius and the core point coverage number threshold are determined, iterative clustering is performed on all the location service data through the DBSCAN algorithm, and the core points that satisfy the core point coverage radius and the core point coverage number threshold are determined from the location service data. DBSCAN is a density-based spatial clustering algorithm. It divides regions of sufficient density into clusters, can find clusters of arbitrary shape in a spatial database containing noise, and defines a cluster as a maximal set of density-connected points.
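The core point test of steps S301 and S303 can be illustrated with a short sketch. It assumes points are (latitude, longitude) tuples in degrees and uses the example values from the text (500 meters, threshold 10) as defaults. The names `find_core_points`, `radius_m` and `min_neighbors` are hypothetical, and the brute-force pairwise scan is chosen only for clarity; a production system handling massive data would more likely use a spatial index.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def find_core_points(points, radius_m=500.0, min_neighbors=10):
    """Return the indices of points that have more than `min_neighbors`
    other points closer than `radius_m` (the core point definition above)."""
    core = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and haversine_m(p, q) < radius_m
        )
        if neighbors > min_neighbors:
            core.append(i)
    return core
```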
Step S305: Perform iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
After the core points of the location service data are determined, iterative clustering is further performed on the core points. In a specific implementation, the DBSCAN algorithm can continue to be used for this, yielding data clusters of the location service data composed of the core points. A data cluster is formed by connecting core points, and the LBS data points covered by a data cluster can be regarded as data of the same class.
In one embodiment, performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points includes: obtaining a preset core point combination distance threshold; and, according to the core point combination distance threshold, performing iterative clustering on the core points through the DBSCAN algorithm to obtain data clusters of the location service data composed of the core points.
In this embodiment, iterative clustering of the core points continues through the DBSCAN algorithm to obtain data clusters of the location service data composed of the core points. Specifically, after the core points of the location service data are obtained, a preset core point combination distance threshold is obtained. This threshold is set according to the required size of the data clusters and is the condition that decides whether core points are connected into a data cluster: if the distance between two core points is less than the core point combination distance threshold, the two core points are combined into a data cluster. The larger the core point combination distance threshold, the more core points a data cluster connects and the more LBS data points it covers. After the threshold is obtained, iterative clustering is performed on the core points through the DBSCAN algorithm according to the threshold to obtain the data clusters. For any core point in a data cluster, at least one other core point in that cluster lies closer to it than the core point combination distance threshold.
In one specific application, the core point combination distance threshold is 500 meters. That is, if a core point has other core points within 500 meters of it, that core point is connected with those core points to form a data cluster.
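The linking rule of step S305 amounts to finding connected components among the core points, where two core points are connected when they are closer than the combination distance threshold. The sketch below shows one way to do this with a breadth-first search. The name `link_core_points` is hypothetical, and returning an isolated core point as a single-member cluster is an assumption of this example, since the text does not say how such a point is handled.

```python
import math
from collections import deque

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def link_core_points(core_points, link_threshold_m=500.0):
    """Group core points into clusters (lists of indices).
    Two core points are linked when closer than `link_threshold_m`;
    a cluster is a connected component of that link relation."""
    unvisited = set(range(len(core_points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if haversine_m(core_points[i], core_points[j]) < link_threshold_m]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return sorted(clusters)
```

Because linking is transitive, two core points farther apart than the threshold can still end up in the same cluster when a chain of intermediate core points joins them.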
In one embodiment, the geographic portrait reference location includes a home address and a work address. Determining, from the data clusters, the reference location cluster to which the business user's geographic portrait reference location belongs includes: determining the number of location service data points in each data cluster and their time-period distribution; determining, according to that number and time-period distribution, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and obtaining the reference location cluster from the home address cluster and the work address cluster.
In this embodiment, the geographic portrait reference location includes a home address and a work address; that is, the user geographic portrait is analyzed and generated on the basis of the business user's home address and work address. The reference location cluster is determined from the data clusters according to the number of LBS data points in each data cluster and their time-period distribution.
Specifically, when the reference location cluster to which the business user's geographic portrait reference location belongs is determined, statistical analysis is performed on the location service data covered by each data cluster to determine the number of location service data points and their time-period distribution. The time-period distribution can be, but is not limited to, day/night or working day/non-working day. Based on the number and time-period distribution of the location service data in the data clusters, the home address cluster and the work address cluster are determined. For example, the shares of the location service data falling in different time periods, such as day versus night, can be analyzed to decide which cluster contains the home address and which contains the work address. The reference location cluster can consist of the home address cluster and the work address cluster.
In one specific application, the number of data clusters obtained from the location service data is n. If n = 1, the numbers of daytime and nighttime points in the data cluster are counted. If daytime points outnumber nighttime points, the data cluster is the reference location cluster containing the work address; if daytime points are fewer than nighttime points, the data cluster is the reference location cluster containing the home address.
If n ≥ 2, the average avg is calculated as the total number of points in the location service data divided by n. For any data cluster whose total number of points is ≥ avg, the daytime share is daytime points / cluster total points and the nighttime share is nighttime points / cluster total points. For a data cluster whose total number of points is < avg, the daytime share is (daytime points / avg) * (daytime points / cluster total points) and the nighttime share is (nighttime points / avg) * (nighttime points / cluster total points). After the daytime and nighttime shares of every data cluster are calculated, the shares of the n clusters are compared.
If the highest daytime share and the highest nighttime share fall in different clusters, the cluster with the highest daytime share is the cluster containing the work address, and the cluster with the highest nighttime share is the cluster containing the home address.
If the highest daytime share or the highest nighttime share falls in the same cluster, and all n clusters have only one time period (all recorded as daytime), the cluster with the highest share is selected as the cluster containing the home address or the work address (in the end only one cluster is formed, namely the daytime cluster).
If the highest daytime share or the highest nighttime share falls in the same cluster, and the n clusters have only one time period (recorded as daytime) while another cluster has a daytime share greater than that of these n clusters, the cluster with the highest daytime share among these n clusters is selected as the daytime cluster and the other cluster becomes the nighttime cluster, which determines the home address cluster and the work address cluster.
In all other cases, for a given data cluster: if the daytime share > the nighttime share, the data cluster is determined to be the home address cluster; if the daytime share < the nighttime share, it is determined to be the work address cluster; if the two shares are equal, the home address cluster and the work address cluster are assigned at random. If one data cluster has already been assigned a type, recorded as daytime (that is, the work address cluster), the cluster with the highest nighttime share among the other n clusters is selected as the nighttime cluster, that is, the home address cluster.
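The share calculation and the simplest assignment rule described above can be sketched as follows. The example covers only the n = 1 case and the n ≥ 2 case in which the highest daytime share and the highest nighttime share fall in different clusters; the tie-breaking branches are deliberately left out and reported as `None`. The function names, the `(day_points, night_points)` input layout, and the default of taking the total as the sum over clusters are assumptions made for illustration.

```python
def shares(day, night, avg):
    """Daytime and nighttime shares of one cluster (assumes day + night > 0).
    Clusters smaller than `avg` are down-weighted by (points / avg)."""
    total = day + night
    if total >= avg:
        return day / total, night / total
    return (day / avg) * (day / total), (night / avg) * (night / total)

def assign_home_work(clusters, total_points=None):
    """clusters: list of (day_points, night_points) per data cluster.
    Returns {'work': index or None, 'home': index or None}, or None when the
    case falls into a tie-breaking branch that this sketch does not cover."""
    n = len(clusters)
    if n == 0:
        raise ValueError("no data clusters; the n = 0 case is handled separately")
    if n == 1:
        day, night = clusters[0]
        if day > night:
            return {"work": 0, "home": None}
        if day < night:
            return {"work": None, "home": 0}
        return {"work": None, "home": None}
    if total_points is None:
        # Assumption: the total is the sum of points over all clusters.
        total_points = sum(d + nt for d, nt in clusters)
    avg = total_points / n
    sh = [shares(d, nt, avg) for d, nt in clusters]
    work = max(range(n), key=lambda i: sh[i][0])
    home = max(range(n), key=lambda i: sh[i][1])
    if work != home:
        return {"work": work, "home": home}
    return None
```

The down-weighting factor keeps a very small cluster with a lopsided day/night split from outranking a large, well-populated cluster.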
In one embodiment, the method further includes: when the number of data clusters is 0, generating the user geographic portrait of the business user based on the location service data.
In this embodiment, when density clustering of the location service data through the density-based clustering algorithm yields no data cluster, that is, when the number of data clusters is 0, statistical analysis is performed directly on the location service data. For example, the city of each LBS data point can be found, and the city with the most days and points (days are compared first) is taken as the user's city. The geographic tags of the business user are then determined, such as the user's city, the city where the user spends the Spring Festival, and the list of cities visited, and the user geographic portrait of the business user is generated from these geographic tags.
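The fallback for the zero-cluster case can be illustrated with a short sketch that ranks cities by number of distinct days first and by number of points second. It assumes each LBS data point has already been mapped to a city name and a calendar date; that mapping step, the function name `resident_city` and the `(city, date)` record layout are assumptions of the example.

```python
from collections import defaultdict

def resident_city(records):
    """records: iterable of (city, date_string), one entry per LBS data point.
    Returns the city with the most distinct days, ties broken by point count,
    or None when there are no records."""
    days = defaultdict(set)
    points = defaultdict(int)
    for city, day in records:
        days[city].add(day)
        points[city] += 1
    if not points:
        return None
    return max(points, key=lambda c: (len(days[c]), points[c]))
```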
In one embodiment, the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, work city, city of residence, whether the user works in a different city from the one they live in, hometown, whether the user is a migrant worker, cities frequently visited on holidays, whether the user stays home on weekends, whether the user owns a home, and nature of work.
The home location and the workplace location can be determined from the cluster centers of the home address cluster and the work address cluster. The commuting distance can be calculated as the distance between the home location and the workplace location. The work city can be determined from the workplace location, and the city of residence from the home location. Whether the user works in a different city from the one they live in can be determined from the correspondence between the work city and the city of residence. The hometown can be determined from the distribution of LBS data during the Spring Festival, and whether the user is a migrant worker from the correspondence between the hometown and the work city. Cities frequently visited on holidays can be determined from the distribution of LBS data on holidays.
Whether the user stays home on weekends can be determined from the distribution of LBS data on weekends, for example by checking whether the weekend LBS data goes beyond a certain range around the home location: a day on which the data exceeds a certain distance is counted as a day out, otherwise as a day at home, and if the days at home outnumber the days out, the user is considered to stay home on weekends. Whether the user owns a home can be determined from how the home location changes over a certain period, such as three years.
The nature of work can include business travel, overtime, night shift and so on. If the number of working days on which the user appears in a city other than the work city exceeds a certain value, for example when days worked away exceed 20% of total working days, the nature of work is considered business travel. If the number of LBS data points at the workplace between 7 p.m. and midnight exceeds a certain value, such as 30% of the total number of points, the nature of work is considered overtime. If the number of LBS data points at the workplace between midnight and 7 a.m. exceeds a certain value, such as 50% of the total number of points, the nature of work is considered night shift. Rich geographic tags yield a full user geographic portrait of the business user, which helps ensure that high-quality business services can be provided accordingly.
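The work-nature rules above can be expressed as simple threshold checks. The sketch below uses the example percentages from the text (20%, 30%, 50%). The text does not say which total the 30% and 50% shares are measured against, so taking them as shares of the points recorded at the workplace is an assumption of this example, as are the function name `work_nature`, its parameters and the tag strings.

```python
def work_nature(work_point_hours, workdays_total, workdays_away):
    """work_point_hours: hour of day (0-23) of each LBS point at the workplace.
    workdays_total / workdays_away: counts of working days overall and of
    working days spent outside the work city. Returns a list of tags."""
    tags = []
    if workdays_total and workdays_away / workdays_total > 0.20:
        tags.append("travel")
    total = len(work_point_hours)
    if total:
        evening = sum(1 for h in work_point_hours if 19 <= h < 24)  # 7 p.m. to midnight
        night = sum(1 for h in work_point_hours if 0 <= h < 7)      # midnight to 7 a.m.
        if evening / total > 0.30:
            tags.append("overtime")
        if night / total > 0.50:
            tags.append("night_shift")
    return tags
```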
In one embodiment, after the location service data of the business user is obtained, the method further includes: extracting out-of-area coordinates from the location service data; when an out-of-area coordinate is determined to be a reversed coordinate, swapping its latitude and longitude to obtain a swapped coordinate; and adding the swapped coordinate to the location service data and using the updated location service data as the location service data.
In this embodiment, reversed coordinates in the obtained location service data, that is, coordinates whose latitude and longitude have been swapped, undergo latitude-longitude swapping to obtain swapped coordinates. This corrects, to a certain extent, data affected by latitude-longitude reversal errors, ensures the accuracy of the location service data, and thereby improves the accuracy of the user geographic portrait.
Specifically, out-of-area coordinates are location service data lying outside the data region of interest, which is determined according to the data mining requirements for the LBS data. For example, for data mining that applies only to a specific area, such as an application scenario that mines only LBS data within China, the data region of interest is the territory of China, and LBS data outside China is excluded. When out-of-area coordinates are extracted from the location service data, the data region of interest can be determined first, and the out-of-area coordinates not within that region are then identified according to the position of each location service data point. Out-of-area coordinates may include latitude and longitude coordinate information.
Whether an out-of-area coordinate is a reversed coordinate, one whose latitude and longitude have been swapped, can be judged, for example, from the LBS data immediately before and after it. If it is, its latitude and longitude are swapped to obtain a swapped coordinate. If it is judged not to be a reversed coordinate, it is a genuine out-of-area coordinate, is not data of interest for the data mining, and is excluded. After the location service data with reversed latitude and longitude is swapped, the resulting swapped coordinates are added to the location service data to obtain the updated location service data, which corrects the reversed erroneous data.
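The repair flow above can be sketched with a simplified reversal test. Note the substitution: the text judges reversal from the neighboring LBS data, whereas this sketch uses a plainer heuristic and treats an out-of-area point as reversed when swapping its latitude and longitude brings it inside the region of interest. The bounding box `CHINA_BBOX` is a rough illustrative rectangle, not an authoritative boundary, and all names are hypothetical.

```python
# Rough illustrative bounding box: (lat_min, lat_max, lon_min, lon_max).
CHINA_BBOX = (3.0, 54.0, 73.0, 136.0)

def in_region(lat, lon, bbox=CHINA_BBOX):
    """True when (lat, lon) lies inside the rectangular region of interest."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def repair_swapped(points, bbox=CHINA_BBOX):
    """Keep in-region points, swap back points that only fit the region
    after exchanging latitude and longitude, and drop the rest."""
    kept = []
    for lat, lon in points:
        if in_region(lat, lon, bbox):
            kept.append((lat, lon))
        elif in_region(lon, lat, bbox):
            kept.append((lon, lat))  # treated as a reversed coordinate
        # otherwise: a genuine out-of-area coordinate, excluded
    return kept
```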
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and can be executed at different times; nor are they necessarily executed one after another, and they can be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, an apparatus for generating a user geographic portrait is provided, including a user data acquisition module 401, a data cluster obtaining module 403, a reference location cluster determination module 405, a cluster center determination module 407 and a geographic portrait production module 409, wherein:
the user data acquisition module 401 is configured to obtain location service data of a business user;
the data cluster obtaining module 403 is configured to perform density clustering on the location service data through a density-based clustering algorithm to obtain data clusters of the location service data;
the reference location cluster determination module 405 is configured to determine, from the data clusters, the reference location cluster to which the business user's geographic portrait reference location belongs, wherein the geographic portrait reference location includes the reference location used when generating the user geographic portrait;
the cluster center determination module 407 is configured to perform clustering on the reference location cluster to obtain the cluster center of the reference location cluster; and
the geographic portrait production module 409 is configured to generate the user geographic portrait of the business user based on the cluster center and the location service data.
In one embodiment, the data cluster obtaining module 403 includes a core point condition unit, a core point determination unit and a data cluster determination unit, wherein: the core point condition unit is configured to obtain a preset core point coverage radius and a preset core point coverage number threshold; the core point determination unit is configured to perform iterative clustering on the location service data through the DBSCAN algorithm according to the core point coverage radius and the core point coverage number threshold to obtain the core points of the location service data; and the data cluster determination unit is configured to perform iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
In one embodiment, the data cluster determination unit includes a combination threshold subunit and a core point combination subunit, wherein: the combination threshold subunit is configured to obtain a preset core point combination distance threshold; and the core point combination subunit is configured to perform iterative clustering on the core points through the DBSCAN algorithm according to the core point combination distance threshold to obtain data clusters of the location service data composed of the core points.
In one embodiment, the geographic portrait reference location includes a home address and a work address, and the reference location cluster determination module 405 includes a data cluster analysis unit, a home and work address cluster subunit and a reference location cluster subunit, wherein: the data cluster analysis unit is configured to determine the number of location service data points in each data cluster and their time-period distribution; the home and work address cluster subunit is configured to determine, according to that number and time-period distribution, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and the reference location cluster subunit is configured to obtain the reference location cluster from the home address cluster and the work address cluster.
In one embodiment, the apparatus further includes a cluster-free processing module configured to generate the user geographic portrait of the business user based on the location service data when the number of data clusters is 0.
In one embodiment, the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, work city, city of residence, whether the user works in a different city from the one they live in, hometown, whether the user is a migrant worker, cities frequently visited on holidays, whether the user stays home on weekends, whether the user owns a home, and nature of work.
In one embodiment, the apparatus further includes an out-of-area coordinate module, a swap processing module and a data update module, wherein: the out-of-area coordinate module is configured to extract out-of-area coordinates from the location service data; the swap processing module is configured to, when an out-of-area coordinate is determined to be a reversed coordinate, swap its latitude and longitude to obtain a swapped coordinate; and the data update module is configured to add the swapped coordinate to the location service data and use the updated location service data as the location service data.
For the specific limitations of the apparatus for generating a user geographic portrait, refer to the limitations of the method for generating a user geographic portrait above; they are not repeated here. Each module in the above apparatus can be implemented wholly or partly in software, hardware, or a combination of the two. The modules can be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server or a terminal, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile or volatile storage medium and an internal memory. The non-volatile or volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile or volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a method for generating a user geographic portrait.
Those skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining location service data of a business user;
performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
determining, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
clustering the reference position cluster to obtain the cluster center of the reference position cluster; and
generating a user geographic portrait of the business user based on the cluster center and the location service data.
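The last two steps above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cluster center is taken as the arithmetic mean of the cluster's (latitude, longitude) points and that commuting distance is the great-circle distance between the home and work centers. The function names `haversine_km`, `cluster_center`, and `build_portrait` are hypothetical and do not come from the application.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_center(points):
    """Arithmetic mean of (lat, lon) points (assumed center definition)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def build_portrait(home_cluster, work_cluster):
    """Derive a few portrait fields from the two reference position clusters."""
    home = cluster_center(home_cluster)
    work = cluster_center(work_cluster)
    return {
        "home_location": home,
        "work_location": work,
        "commute_km": haversine_km(home, work),
    }
```

The mean is a reasonable center only for clusters that span a small area; other center definitions (for example a medoid) would fit the same step.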
In one embodiment, the processor further implements the following steps when executing the computer-readable instructions: obtaining a preset core point coverage radius and a preset core point coverage number threshold; performing iterative clustering on the location service data by means of the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain the core points of the location service data; and performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
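As a hedged illustration of the core point step, the sketch below marks a point as a core point when at least `min_pts` points (the point itself included) lie within `eps_m` meters of it, which is the usual DBSCAN criterion. The parameter names `eps_m` and `min_pts` stand in for the coverage radius and the coverage number threshold; they, the brute-force neighbor search, and the use of great-circle distance are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def find_core_points(points, eps_m, min_pts):
    """Return the points that have at least min_pts points within eps_m meters.

    The point itself counts toward min_pts. The search is O(n^2), which is
    acceptable for a sketch but not for large data sets.
    """
    core = []
    for p in points:
        neighbours = sum(1 for q in points if haversine_m(p, q) <= eps_m)
        if neighbours >= min_pts:
            core.append(p)
    return core
```

For example, with `eps_m=50` and `min_pts=3`, three fixes recorded within a few meters of one another are all core points, while an isolated fix far away is not.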
In one embodiment, the processor further implements the following steps when executing the computer-readable instructions: obtaining a preset core point combination distance threshold; and performing iterative clustering on the core points by means of the DBSCAN algorithm, according to the core point combination distance threshold, to obtain data clusters of the location service data composed of the core points.
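One way to read the combination step is as grouping core points into connected components: two core points belong to the same data cluster when a chain of core points links them and each hop is no longer than the combination distance threshold. The sketch below implements that reading with a breadth-first search; `merge_dist_m` is a hypothetical name for the threshold, and the interpretation itself is an assumption rather than the application's stated procedure.

```python
import math
from collections import deque

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def group_core_points(core_points, merge_dist_m):
    """Group core points into clusters of mutually reachable points."""
    clusters = []
    unvisited = set(range(len(core_points)))
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # Collect still-unassigned core points within one hop of point i.
            near = [j for j in unvisited
                    if haversine_m(core_points[i], core_points[j]) <= merge_dist_m]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append([core_points[k] for k in sorted(members)])
    return clusters
```

The order in which clusters are returned is not specified; callers that need a stable order should sort the result.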
In one embodiment, the geographic portrait reference position includes a home address and a work address, and the processor further implements the following steps when executing the computer-readable instructions: determining the number of location service data items in each data cluster and the time period distribution of the location service data; determining, according to the number and the time period distribution of the location service data in the data clusters, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and obtaining the reference position clusters from the home address cluster and the work address cluster.
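A minimal sketch of this selection follows. It assumes that the home address cluster is the cluster with the most night-time fixes and that the work address cluster is the remaining cluster with the most daytime fixes. The night window (21:00 to 07:00), the scoring rule, and the names `night_share` and `pick_home_and_work` are illustrative assumptions and are not taken from the application.

```python
from datetime import datetime


def night_share(timestamps, night_start=21, night_end=7):
    """Fraction of fixes recorded at night (assumed window: 21:00 to 07:00)."""
    night = sum(1 for t in timestamps
                if t.hour >= night_start or t.hour < night_end)
    return night / len(timestamps)


def pick_home_and_work(clusters):
    """Choose home and work clusters from per-cluster fix timestamps.

    clusters maps a cluster id to the list of datetime fixes in that cluster.
    Returns (home_id, work_id); either may be None when no candidate exists.
    """
    # Night-time fix count per non-empty cluster (count times night share).
    night = {cid: len(ts) * night_share(ts) for cid, ts in clusters.items() if ts}
    if not night:
        return None, None
    home = max(night, key=night.get)
    # Daytime fix count for every cluster other than the chosen home cluster.
    day = {cid: len(clusters[cid]) - night[cid] for cid in night if cid != home}
    work = max(day, key=day.get) if day else None
    return home, work
```

Combining the count with the time share keeps a cluster with a handful of late-night fixes from outranking a cluster that is visited every night.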
In one embodiment, the processor further implements the following step when executing the computer-readable instructions: when the number of data clusters is 0, generating the user geographic portrait of the business user based on the location service data.
In one embodiment, the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, city of work, city of residence, whether the user works in a different region from where the user lives, place of origin, whether the user is a migrant worker, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and nature of work.
In one embodiment, the processor further implements the following steps when executing the computer-readable instructions: extracting out-of-area coordinates from the location service data; when the out-of-area coordinates are determined to be inverted coordinates, swapping the longitude and latitude of the out-of-area coordinates to obtain replacement coordinates; and adding the replacement coordinates to the location service data, and using the updated location service data as the location service data.
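The coordinate correction can be sketched as follows. The sketch assumes a rectangular region whose latitude and longitude ranges do not overlap (the bounds below roughly enclose mainland China and are illustrative only), so a point that falls inside the region only after its two values are exchanged is treated as inverted. Out-of-area points that are not inverted are dropped here, which is an assumption; the embodiment only states that replacement coordinates are added back. The names `in_region` and `fix_swapped` are hypothetical.

```python
# Illustrative bounds only (assumption): latitude and longitude ranges that
# do not overlap, so an inverted pair can be recognised unambiguously.
LAT_RANGE = (3.0, 54.0)
LON_RANGE = (73.0, 136.0)


def in_region(lat, lon):
    """True when (lat, lon) lies inside the assumed rectangular region."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def fix_swapped(points):
    """Return in-region points, restoring those whose lat/lon were exchanged."""
    fixed = []
    for lat, lon in points:
        if in_region(lat, lon):
            fixed.append((lat, lon))
        elif in_region(lon, lat):
            # Out of area as recorded, in area once exchanged: treat as inverted.
            fixed.append((lon, lat))
        # Any other out-of-area point is discarded (assumption of this sketch).
    return fixed
```

A range check alone cannot detect an inverted pair when the region's latitude and longitude ranges overlap; in that case additional evidence, such as neighboring fixes, would be needed.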
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining location service data of a business user;
performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
determining, from the data clusters, the reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
clustering the reference position cluster to obtain the cluster center of the reference position cluster; and
generating a user geographic portrait of the business user based on the cluster center and the location service data.
The computer-readable storage medium may be non-volatile or volatile.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: obtaining a preset core point coverage radius and a preset core point coverage number threshold; performing iterative clustering on the location service data by means of the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain the core points of the location service data; and performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: obtaining a preset core point combination distance threshold; and performing iterative clustering on the core points by means of the DBSCAN algorithm, according to the core point combination distance threshold, to obtain data clusters of the location service data composed of the core points.
In one embodiment, the geographic portrait reference position includes a home address and a work address, and the computer-readable instructions, when executed by the processor, further implement the following steps: determining the number of location service data items in each data cluster and the time period distribution of the location service data; determining, according to the number and the time period distribution of the location service data in the data clusters, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs; and obtaining the reference position clusters from the home address cluster and the work address cluster.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following step: when the number of data clusters is 0, generating the user geographic portrait of the business user based on the location service data.
In one embodiment, the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, city of work, city of residence, whether the user works in a different region from where the user lives, place of origin, whether the user is a migrant worker, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and nature of work.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps: extracting out-of-area coordinates from the location service data; when the out-of-area coordinates are determined to be inverted coordinates, swapping the longitude and latitude of the out-of-area coordinates to obtain replacement coordinates; and adding the replacement coordinates to the location service data, and using the updated location service data as the location service data.
A person of ordinary skill in the art will understand that all or part of the processes of the method embodiments above may be carried out by computer-readable instructions that instruct the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims (20)

  1. A method for generating a user geographic portrait, comprising:
    obtaining location service data of a business user;
    performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
    determining, from the data clusters, a reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
    clustering the reference position cluster to obtain a cluster center of the reference position cluster; and
    generating a user geographic portrait of the business user based on the cluster center and the location service data.
  2. The method according to claim 1, wherein performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data comprises:
    obtaining a preset core point coverage radius and a preset core point coverage number threshold;
    performing iterative clustering on the location service data by means of the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain core points of the location service data; and
    performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
  3. The method according to claim 2, wherein performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points comprises:
    obtaining a preset core point combination distance threshold; and
    performing iterative clustering on the core points by means of the DBSCAN algorithm, according to the core point combination distance threshold, to obtain data clusters of the location service data composed of the core points.
  4. The method according to claim 1, wherein the geographic portrait reference position includes a home address and a work address, and determining, from the data clusters, the reference position cluster to which the geographic portrait reference position of the business user belongs comprises:
    determining the number of location service data items in the data clusters and the time period distribution of the location service data;
    determining, according to the number of location service data items in the data clusters and the time period distribution, a home address cluster to which the home address belongs and a work address cluster to which the work address belongs; and
    obtaining the reference position cluster from the home address cluster and the work address cluster.
  5. The method according to claim 4, wherein determining, according to the number of location service data items in the data clusters and the time period distribution, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs comprises:
    determining, according to the number of location service data items in the data clusters and the time period distribution, the proportions of location service data items falling in a daytime period and in a night-time period respectively; and
    determining, according to the proportions, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs.
  6. The method according to claim 1, further comprising:
    when the number of data clusters is 0, generating the user geographic portrait of the business user based on the location service data.
  7. The method according to any one of claims 1 to 6, wherein the user geographic portrait of the business user includes at least one of: home location, workplace location, commuting distance, city of work, city of residence, whether the user works in a different region from where the user lives, place of origin, whether the user is a migrant worker, cities frequently visited on holidays, whether the user stays at home on weekends, whether the user owns a home, and nature of work.
  8. The method according to any one of claims 1 to 6, wherein, after obtaining the location service data of the business user, the method further comprises:
    extracting out-of-area coordinates from the location service data;
    when the out-of-area coordinates are determined to be inverted coordinates, swapping the longitude and latitude of the out-of-area coordinates to obtain replacement coordinates; and
    adding the replacement coordinates to the location service data, and using the updated location service data as the location service data.
  9. The method according to claim 8, wherein, before swapping the longitude and latitude of the out-of-area coordinates to obtain the replacement coordinates when the out-of-area coordinates are determined to be inverted coordinates, the method further comprises:
    determining whether the out-of-area coordinates are inverted coordinates based on the location service data preceding and following the out-of-area coordinates.
  10. An apparatus for generating a user geographic portrait, comprising:
    a user data acquisition module, configured to obtain location service data of a business user;
    a data cluster obtaining module, configured to perform density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
    a reference position cluster determining module, configured to determine, from the data clusters, a reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
    a cluster center determining module, configured to cluster the reference position cluster to obtain a cluster center of the reference position cluster; and
    a geographic portrait production module, configured to generate a user geographic portrait of the business user based on the cluster center and the location service data.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining location service data of a business user;
    performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
    determining, from the data clusters, a reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
    clustering the reference position cluster to obtain a cluster center of the reference position cluster; and
    generating a user geographic portrait of the business user based on the cluster center and the location service data.
  12. The computer device according to claim 11, wherein the processor further performs the following steps when executing the computer-readable instructions:
    obtaining a preset core point coverage radius and a preset core point coverage number threshold;
    performing iterative clustering on the location service data by means of the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain core points of the location service data; and
    performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
  13. The computer device according to claim 12, wherein the processor further performs the following steps when executing the computer-readable instructions:
    obtaining a preset core point combination distance threshold; and
    performing iterative clustering on the core points by means of the DBSCAN algorithm, according to the core point combination distance threshold, to obtain data clusters of the location service data composed of the core points.
  14. The computer device according to claim 11, wherein the geographic portrait reference position includes a home address and a work address, and the processor further performs the following steps when executing the computer-readable instructions:
    determining the number of location service data items in the data clusters and the time period distribution of the location service data;
    determining, according to the number of location service data items in the data clusters and the time period distribution, a home address cluster to which the home address belongs and a work address cluster to which the work address belongs; and
    obtaining the reference position cluster from the home address cluster and the work address cluster.
  15. The computer device according to claim 14, wherein the processor further performs the following steps when executing the computer-readable instructions:
    determining, according to the number of location service data items in the data clusters and the time period distribution, the proportions of location service data items falling in a daytime period and in a night-time period respectively; and
    determining, according to the proportions, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs.
  16. One or more computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining location service data of a business user;
    performing density clustering on the location service data by means of a density-based clustering algorithm to obtain data clusters of the location service data;
    determining, from the data clusters, a reference position cluster to which a geographic portrait reference position of the business user belongs, wherein the geographic portrait reference position includes a reference position used when generating the user geographic portrait;
    clustering the reference position cluster to obtain a cluster center of the reference position cluster; and
    generating a user geographic portrait of the business user based on the cluster center and the location service data.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    obtaining a preset core point coverage radius and a preset core point coverage number threshold;
    performing iterative clustering on the location service data by means of the DBSCAN algorithm, according to the core point coverage radius and the core point coverage number threshold, to obtain core points of the location service data; and
    performing iterative clustering on the core points to obtain data clusters of the location service data composed of the core points.
  18. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    obtaining a preset core point combination distance threshold; and
    performing iterative clustering on the core points by means of the DBSCAN algorithm, according to the core point combination distance threshold, to obtain data clusters of the location service data composed of the core points.
  19. The storage medium according to claim 16, wherein the geographic portrait reference position includes a home address and a work address, and the computer-readable instructions, when executed by the processor, further perform the following steps:
    determining the number of location service data items in the data clusters and the time period distribution of the location service data;
    determining, according to the number of location service data items in the data clusters and the time period distribution, a home address cluster to which the home address belongs and a work address cluster to which the work address belongs; and
    obtaining the reference position cluster from the home address cluster and the work address cluster.
  20. The storage medium according to claim 19, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    determining, according to the number of location service data items in the data clusters and the time period distribution, the proportions of location service data items falling in a daytime period and in a night-time period respectively; and
    determining, according to the proportions, the home address cluster to which the home address belongs and the work address cluster to which the work address belongs.
PCT/CN2020/105506 2019-11-26 2020-07-29 Method and apparatus for generating user geographical portrait, computer device, and storage medium WO2021103626A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911173407.3A CN111178932A (en) 2019-11-26 2019-11-26 User geographic portrait generation method and device, computer equipment and storage medium
CN201911173407.3 2019-11-26

Publications (1)

Publication Number Publication Date
WO2021103626A1 true WO2021103626A1 (en) 2021-06-03

Family

ID=70655376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105506 WO2021103626A1 (en) 2019-11-26 2020-07-29 Method and apparatus for generating user geographical portrait, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111178932A (en)
WO (1) WO2021103626A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178932A (en) * 2019-11-26 2020-05-19 深圳壹账通智能科技有限公司 User geographic portrait generation method and device, computer equipment and storage medium
CN111698332A (en) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Method, device and equipment for distributing business objects and storage medium
CN111813831B (en) * 2020-07-10 2022-10-14 中国铁塔股份有限公司厦门市分公司 Method, device and readable medium for presuming substation to which communication base station belongs
CN112215580B (en) * 2020-10-23 2024-02-06 岭东核电有限公司 Nuclear power operation area setting method and device, computer equipment and storage medium
CN112231392A (en) * 2020-10-29 2021-01-15 广东机场白云信息科技有限公司 Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium
CN112417273B (en) * 2020-11-17 2022-04-19 平安科技(深圳)有限公司 Region portrait image generation method, region portrait image generation device, computer equipment and storage medium
TWI776379B (en) * 2021-01-28 2022-09-01 中華電信股份有限公司 Device, method and computer readable medium for feature mining

Citations (9)

Publication number Priority date Publication date Assignee Title
US20110157220A1 (en) * 2009-12-24 2011-06-30 Geum river water system conservancy System and method for drawing stream and road centerline for GIS-based linear map production
CN106651603A (en) * 2016-12-29 2017-05-10 平安科技(深圳)有限公司 Risk evaluation method and apparatus based on position service
CN107818506A (en) * 2017-09-30 2018-03-20 上海壹账通金融科技有限公司 Electronic installation, credit risk control method and storage medium
CN107818116A (en) * 2016-09-14 2018-03-20 上海掌门科技有限公司 For determining the method and apparatus of user behavior zone position information
CN108376155A (en) * 2018-02-07 2018-08-07 链家网(北京)科技有限公司 A kind of geographical location information determines method and device
CN109918581A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 A kind of more results of the more points of interest of user based on space-time data know method for distinguishing
CN109918582A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 A kind of user's list point of interest knowledge method for distinguishing based on space-time data
CN109919225A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 A kind of user interest point knowledge method for distinguishing based on space-time data
CN111178932A (en) * 2019-11-26 2020-05-19 深圳壹账通智能科技有限公司 User geographic portrait generation method and device, computer equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN105718465B (en) * 2014-12-02 2019-04-09 阿里巴巴集团控股有限公司 Geography fence generation method and device
CN109829020B (en) * 2018-12-20 2023-04-07 平安科技(深圳)有限公司 Method and device for pushing place resource data, computer equipment and storage medium
CN109635070B (en) * 2019-01-18 2020-11-17 上海迹寻科技有限公司 Method for constructing user interest portrait based on action track and data updating method thereof
CN110263825B (en) * 2019-05-30 2022-05-10 湖南大学 Data clustering method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111178932A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
WO2021103626A1 (en) Method and apparatus for generating user geographical portrait, computer device, and storage medium
US20230403530A1 (en) Determining a significant user location for providing location-based services
AU2014275203B2 (en) Modeling significant locations
US10242116B2 (en) Grid-based geofence data indexing
CN108271120B (en) Method, device and equipment for determining target area and target user
US20150031397A1 (en) Address Point Data Mining
TWI709353B (en) Method for determining positioning interval of mobile terminal, mobile terminal and server
WO2021043064A1 (en) Community detection method and apparatus, and computer device and storage medium
US10630799B2 (en) Method and apparatus for pushing information
CN110569321B (en) Grid division processing method and device based on urban map and computer equipment
CN109561390B (en) Method and device for determining public praise scene coverage cell
EP3425530A1 (en) Target location search method and apparatus
CN111061766A (en) Business data processing method and device, computer equipment and storage medium
CN111311193B (en) Method and device for configuring public service resources
CN113468226A (en) Service processing method, device, electronic equipment and storage medium
CN110532437B (en) Electronic certificate prompting method, electronic certificate prompting device, computer equipment and storage medium
CN112465197B (en) Regional population quantity prediction method, regional population quantity prediction device, computer equipment and storage medium
CN112699196B (en) Track generation method, track generation device, terminal equipment and storage medium
US9949069B2 (en) Population estimation apparatus, program and population estimation method
CN111143639B (en) User intimacy calculation method, device, equipment and medium
CN111382165A (en) Mobile homeland management system
CN111611337B (en) Terminal data processing system
CN112445880A (en) Method and device for automatically gridding enterprise data in geographic space and related equipment
CN111797181A (en) Method and device for positioning user position, control equipment and storage medium
EP2487936A1 (en) Methods and apparatus for location categorisation through continuous location

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20891681; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 20891681; Country of ref document: EP; Kind code of ref document: A1
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.09.2022)
122 Ep: PCT application non-entry in European phase. Ref document number: 20891681; Country of ref document: EP; Kind code of ref document: A1