CN113961780A - Resident cell acquisition method and device, electronic equipment and storage medium - Google Patents

Publication number
CN113961780A
Authority
CN
China
Prior art keywords
cell
position information
clustering
cells
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111279958.5A
Other languages
Chinese (zh)
Inventor
朱泽亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111279958.5A
Publication of CN113961780A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a resident cell acquisition method, a resident cell acquisition device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a plurality of geographical position information of a user in a preset time period, wherein the geographical position information comprises longitude and latitude coordinates and a timestamp; determining the cells in which the plurality of geographical position information is distributed; clustering the plurality of geographical position information by using a density-based clustering algorithm model to obtain a clustering result, wherein the neighborhood parameter value of the density-based clustering algorithm model is in direct proportion to the area of the cells in which the plurality of geographical position information is distributed; and determining the resident cell of the user according to the clustering result and the cells in which the plurality of geographical position information is distributed. The technical scheme of the embodiment of the application can improve the accuracy of the acquired user resident cell.

Description

Resident cell acquisition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for acquiring a resident cell.
Background
With the continuous development of financial technology (Fintech), especially internet technology and finance, more and more technologies (such as artificial intelligence, big data and cloud storage) are applied in the financial field. The financial field in turn places higher requirements on these technologies, such as accurate analysis of a user's resident address. A user's resident point (also called a frequent location) is a place where the frequency of the user's appearance exceeds a certain threshold within a period of time. For example, a server may obtain a terminal identifier of the user's mobile terminal, such as an International Mobile Equipment Identity (IMEI), determine the longitude and latitude at which the mobile terminal is currently located according to the terminal identifier, cluster the longitude and latitude data, and determine the user's resident point based on the clustering result. However, the accuracy of the resident point obtained by existing methods is low.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a method, an apparatus, an electronic device, and a computer-readable storage medium for acquiring a resident cell, which can improve the accuracy of acquiring a user resident cell.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of the embodiments of the present application, there is provided a method for acquiring a resident cell, including: acquiring a plurality of geographical position information of a user in a preset time period, wherein the geographical position information comprises longitude and latitude coordinates and a timestamp; determining the cells in which the plurality of geographical position information is distributed; clustering the plurality of geographical position information by using a density-based clustering algorithm model to obtain a clustering result, wherein the neighborhood parameter value of the density-based clustering algorithm model is in direct proportion to the area of the cells in which the plurality of geographical position information is distributed; and determining the resident cell of the user according to the clustering result and the cells in which the plurality of geographical position information is distributed.
According to an aspect of the embodiments of the present application, there is provided an apparatus for acquiring a resident cell, including: an acquisition module, used for acquiring a plurality of geographical position information of a user in a preset time period, wherein the geographical position information comprises longitude and latitude coordinates and a timestamp; a determining module, used for determining the cells in which the plurality of geographical position information is distributed; a clustering module, used for clustering the plurality of geographical position information by using a density-based clustering algorithm model to obtain a clustering result, wherein the neighborhood parameter value of the density-based clustering algorithm model is in direct proportion to the area of the cells in which the plurality of geographical position information is distributed; and a result acquisition module, used for determining the resident cell of the user according to the clustering result and the cells in which the plurality of geographical position information is distributed.
According to an aspect of the embodiments of the present application, there is provided an electronic device, including a processor and a memory, where computer readable instructions are stored on the memory, and when executed by the processor, the method for acquiring a resident cell as above is implemented.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to execute the method for acquiring a resident cell as provided above.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the resident cell acquiring method provided in the above-mentioned various optional embodiments.
In the technical scheme provided by the embodiments of the application, the neighborhood parameter of the density-based clustering algorithm model is controlled by the areas of the cells in which the user's geographical position information is distributed. On the one hand, the neighborhood parameter of the clustering process can be given automatically, which improves the degree of clustering automation. On the other hand, setting the neighborhood parameter value of the density-based clustering algorithm model in direct proportion to the area of the cell avoids the model judging the user's normal geographical position information to be noise points because the neighborhood parameter value is too small, and also avoids the large errors caused when the model treats noise points as normal geographical position information during clustering because the neighborhood parameter value is too large. The accuracy of the acquired resident cell can therefore be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a flowchart illustrating a method for acquiring a resident cell according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart of an exemplary embodiment of step S200 in the embodiment shown in FIG. 1;
FIG. 3 is a flowchart of an exemplary embodiment of step S300 in the embodiment shown in FIG. 1;
FIG. 4 is a flowchart of an exemplary embodiment of step S310 in the embodiment shown in FIG. 3;
FIG. 5 is a flowchart of an exemplary embodiment of step S400 in the embodiment shown in FIG. 1;
FIG. 6 is a flowchart of an exemplary embodiment of step S410 in the embodiment shown in FIG. 5;
FIG. 7 is a block diagram illustrating an apparatus for acquiring a resident cell according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should also be noted that: reference to "a plurality" in this application means two or more. "and/or" describe the association relationship of the associated objects, meaning that there may be three relationships, e.g., A and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The resident address of the user refers to a place where the user frequently stays or lives, or a place where the frequency of the user exceeds a certain threshold value within a period of time, which is called a frequent residence for short. The resident cell of the user refers to a cell to which the resident address of the user belongs.
In real life, many application scenarios require the resident address of the user. For example, when the user arrives at a certain area, the system automatically pushes nearby food recommendations to the user; when the user returns home, the air conditioner is turned on automatically. Such functions often depend on the user's resident address. For an operator, mining the user's frequent locations, and further the user's workplace or place of residence, makes it possible to carry out targeted marketing and to perform fixed-point marketing or network testing for the user, which is very beneficial to broadband services, mobile phone network access and the like.
In the prior art, a density-based clustering algorithm is generally used to cluster the longitude and latitude information reported by a user, and the resident cell of the user is then obtained from the clustering result. A density-based clustering algorithm model clusters according to the density distribution of the samples: it examines the connectivity among samples from the perspective of sample density and continuously expands clusters based on connectable samples to obtain the final clustering result. Compared with other clustering methods, such a model can find clusters of various shapes and sizes in noisy data. The best-known algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
The DBSCAN algorithm model has two parameters, the neighborhood parameter eps and the density threshold MinPts. The specific steps are as follows:
1. With each data point $x_i$ as the center, draw a circle of radius eps; this circle is called the eps neighborhood of $x_i$.
2. Count the points contained in the eps neighborhood of $x_i$. If the count exceeds the density threshold MinPts, $x_i$ (the center of that neighborhood) is marked as a core point, also called a core object. If the number of points in the eps neighborhood of a point is smaller than MinPts but the point falls within the eps neighborhood of a core point, the point is called a boundary point. Points that are neither core points nor boundary points are noise points.
3. All points in the eps neighborhood of a core point $x_i$ are directly density-reachable from $x_i$. If $x_j$ is directly density-reachable from $x_i$, $x_k$ is directly density-reachable from $x_j$, and so on along the chain up to $x_n$, then $x_n$ is density-reachable from $x_i$. This property shows that direct density-reachability can be chained, from which density-reachability is derived.
4. If there exists a point $x_k$ such that both $x_i$ and $x_j$ are density-reachable from $x_k$, then $x_i$ and $x_j$ are said to be density-connected. Joining density-connected points together forms a cluster.
Described more generally: if the total number of points in the eps neighborhood of a point is less than the density threshold MinPts, the point is a low-density point; if it is greater than MinPts, the point is a high-density point. If a high-density point lies in the eps neighborhood of another high-density point, the two are directly connected; these are core points. If a low-density point lies in the eps neighborhood of a high-density point, it is connected to the nearest high-density point; this is a boundary point. Low-density points that are not within the eps neighborhood of any high-density point are outliers.
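The DBSCAN procedure described here can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the application: the function name `dbscan` is hypothetical, the neighbor count includes the point itself, and a point is treated as a core point when the count is at least MinPts (a common convention; the text says "exceeds").

```python
import math

def dbscan(points, eps, min_pts):
    """Return (labels, is_core): a cluster id per point (-1 = noise) and core flags."""
    n = len(points)
    # eps neighborhood of every point (brute force, includes the point itself).
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neigh]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points extend the cluster; boundary points stop here.
                    if is_core[q]:
                        frontier.append(q)
        cluster += 1
    return labels, is_core
```

Points that are never reached from a core point keep the label -1, which corresponds to the noise points (outliers) in the description.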
Advantages of the DBSCAN algorithm model:
1. It is not sensitive to noise. This is because the algorithm model can judge outliers well, and even if outliers are misjudged, the final clustering result is not affected.
2. Clusters of arbitrary shape can be found. The DBSCAN algorithm model finds clusters by continuously connecting high-density points in a neighborhood, and only neighborhood parameters and density threshold values need to be defined, so that clusters with different shapes and sizes can be found.
To obtain suitable cluster centers, the neighborhood parameter value and the density threshold used for density clustering of longitude and latitude are usually set manually from empirical values. Specifically, MinPts is set to 2 × dim, where dim is the dimension of the user's geographical location information. eps is usually obtained from a k-distance curve (k-distance graph): the distances from each sample to all other samples are calculated, the k-th nearest distance of each sample is selected, these distances are sorted from large to small to obtain the k-distance curve, and the distance corresponding to the inflection point of the curve is taken as eps. If the clustering result obtained with these empirical eps and MinPts values is poor, the values can be adjusted, and the most appropriate parameter values are selected by comparing multiple iterations.
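The empirical parameter selection described above can be sketched as follows. The function names are hypothetical, and the inflection-point rule used here (take the value just after the largest drop in the sorted curve) is only one crude heuristic assumed for illustration; the text does not specify how the inflection point is located.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted from large to small."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result, reverse=True)

def knee_eps(sorted_kdist):
    """Assumed heuristic: the value right after the largest drop in the curve."""
    drops = [sorted_kdist[i] - sorted_kdist[i + 1]
             for i in range(len(sorted_kdist) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return sorted_kdist[i + 1]

def empirical_min_pts(points):
    """MinPts = 2 * dim, where dim is the dimension of one sample."""
    return 2 * len(points[0])
```

For two-dimensional longitude and latitude samples, `empirical_min_pts` returns 4.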
Clearly, when the geographical location information is clustered with eps and MinPts values obtained from experience, the accuracy of the user resident cell derived from the clustering result is not high.
In order to solve the above problems, embodiments of the present application respectively provide a method, an apparatus, an electronic device and a computer-readable storage medium for acquiring a resident cell, and the embodiments will be described in detail below.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for acquiring a resident cell according to an exemplary embodiment of the present application, where the method includes the following steps:
Step S100: acquiring a plurality of geographical location information of a user in a preset time period, wherein the geographical location information comprises longitude and latitude coordinates and a timestamp.
The plurality of geographical location information of the user may be GPS information obtained by positioning the terminal by the user terminal through a Global Positioning System (GPS), but is not limited to this, and may also be positioning information obtained through other positioning manners, for example, geographical location information of the user may be collected based on acquired cell tower data, WiFi data of the user equipment, IP address information of the user equipment, user report information, and other manners.
In the prior art, most internet companies acquire the longitude and latitude of a user of an application program each time the user uses the application program through an SDK (Software development kit) in the application program to acquire the geographical location information of each user.
A timestamp is a complete, verifiable piece of data that can show that a piece of data already existed at a particular point in time. Timestamps were proposed to give users electronic proof of the generation time of their data. In practical applications, they can be used in many areas including e-commerce and financial activities, and in particular to support the "non-repudiation" services of a public key infrastructure. The essence of a timestamp service is to bind the user's data to the current accurate time and sign the result with the digital certificate of the timestamp system. Relying on the legally authorized status of the timestamp system, this produces a timestamp that can serve as legal evidence of the generation time of the user's data, achieving "non-repudiation" or "anti-repudiation". A timestamp system mainly consists of three parts: a trusted time source, a signature system, and a timestamp database.
In this embodiment, the geographical location information of the user includes a timestamp indicating the exact time when the user was located at the latitude and longitude coordinates corresponding to the geographical location information.
The geographical location information of the user includes longitude information, latitude information, and a corresponding timestamp. For example, at 10:00 on January 1, 2021, the position of user A has a longitude of 45 degrees east and a latitude of 80 degrees north.
Step S200: a plurality of cells in which geographical location information is distributed are determined.
For example, if the longitude and latitude of a piece of geographical location information fall within the range of a cell, that cell is determined to be one of the cells in which the user's geographical location information is distributed. Illustratively, a map interface is requested to determine the cell range to which the longitude and latitude of the user's geographical location information belong. The map interface is a map application programming interface (Map API), that is, an application program interface for embedding a map into a web page through JavaScript (or other languages). It provides a number of utilities for processing maps and adding content to them through various services, enabling users to create fully functional map applications on websites. Specifically, a map API can present the data required by a user on a map; it can visualize the data, help explore and discover new value in the data, and connect all the information of the map. This embodiment may determine the cell range to which the longitude and latitude of the user's geographical location information belong by using a Gaode (Amap) map interface, a Google Maps interface, a Baidu map interface, or the like.
The number of the cells in which the geographic location information is distributed may be one or more according to an actual distribution situation, and it can be understood that, if there is only one cell in which the geographic location information is distributed, the cell is a resident cell of the user, and it is not necessary to cluster the geographic location information.
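The embodiment determines cell membership by querying a map interface. As a purely local stand-in, and assuming each cell outline is already available as a list of (longitude, latitude) boundary points, the membership test can be sketched with a standard ray-casting check. The names `point_in_polygon`, `assign_cell` and the `cells` dictionary are hypothetical and are not part of any map API.

```python
def point_in_polygon(lon, lat, boundary):
    """Ray-casting test; boundary is a list of (lon, lat) outline points."""
    inside = False
    n = len(boundary)
    for i in range(n):
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_cell(lon, lat, cells):
    """Return the name of the first cell whose outline contains the point, else None."""
    for name, boundary in cells.items():
        if point_in_polygon(lon, lat, boundary):
            return name
    return None
```

Treating longitude and latitude as planar coordinates is an approximation that is reasonable only for outlines as small as a residential cell.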
Step S300: clustering the plurality of geographical location information by using a density-based clustering algorithm model to obtain a clustering result.
Illustratively, the plurality of geographical location information may be clustered using the DBSCAN algorithm model. Of course, this embodiment does not limit the specific type of density-based clustering algorithm model; any density-based clustering algorithm model that contains a neighborhood parameter is applicable to the resident cell acquisition method of this embodiment, for example, the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm model.
This embodiment sets the neighborhood parameter value of the density-based clustering model in direct proportion to the area of the cells in which the plurality of geographical location information is distributed. There may be several such cells, and if their areas differ, there may be several neighborhood parameter values, each positively correlated with the area of the corresponding cell.
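A minimal sketch of the proportionality between the neighborhood parameter and the cell area follows. The proportionality constant `k` and the function name are assumptions made for illustration; the text does not give a value or a way to choose it.

```python
def eps_for_cells(cell_areas, k):
    """Map each cell name to eps = k * area.

    cell_areas: dict of cell name -> area (any consistent unit).
    k: hypothetical proportionality constant, not specified in the source.
    """
    return {name: k * area for name, area in cell_areas.items()}
```

With this mapping, a cell four times as large as another receives a neighborhood parameter four times as large, so each cell can be clustered with its own eps.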
Step S400: determining the resident cell of the user according to the clustering result and the cells in which the plurality of geographical location information is distributed.
In this embodiment, one of the cells in which the geographical location information is distributed is selected as the resident cell of the user by using the clustering results corresponding to all the cells.
Exemplarily, the step S400 may include the steps of:
for each cell, acquiring, for each cluster of the corresponding clustering result, the number of geographical location information belonging to the range of that cell;
selecting the cluster with the largest number of geographical location information belonging to the range of that cell, and marking it as the optimal cluster;
taking the number of geographical location information of the optimal cluster within the cell range as the score of the corresponding cell;
and selecting the cell with the highest score from the plurality of cells as the resident cell of the user.
In this embodiment, the score of a cell is in direct proportion to the number of geographical location information in its optimal cluster that belong to the range of that cell. The optimal cluster of the finally obtained resident cell therefore contains the largest number of geographical location information belonging to its own cell.
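The scoring steps of step S400 can be sketched as below. The data layout is an assumption made for illustration: `results` maps each cell to the list of cluster labels obtained with that cell's neighborhood parameter (-1 meaning noise), and `point_cells` gives the cell each position falls in. Both names are hypothetical.

```python
from collections import Counter

def cell_score(cell, labels, point_cells):
    """Size, counted inside `cell`, of the cluster with the most points inside `cell`."""
    counts = Counter(
        label for label, c in zip(labels, point_cells)
        if label != -1 and c == cell  # skip noise and points outside the cell
    )
    return max(counts.values(), default=0)

def resident_cell(results, point_cells):
    """Return (best_cell, scores), where best_cell has the highest score."""
    scores = {cell: cell_score(cell, labels, point_cells)
              for cell, labels in results.items()}
    return max(scores, key=scores.get), scores
```

If two cells tie, `max` returns the first one encountered; the text does not say how ties should be broken.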
It should be noted that, the manner of acquiring the user's resident cell in this embodiment is only an example, and the resident cell of the user may also be acquired by other manners, which is not described herein too much.
In this embodiment, the neighborhood parameter value of the density-based clustering algorithm model is controlled by the areas of the cells in which the user's geographical location information is distributed. On the one hand, the neighborhood parameter value of the clustering process can be given automatically, which improves the degree of clustering automation. On the other hand, setting the neighborhood parameter of the density-based clustering algorithm model in direct proportion to the area of the cell avoids the model judging the user's normal geographical location information to be noise points because the neighborhood parameter value is too small, and also avoids noise points taking part in clustering as normal geographical location information because the neighborhood parameter value is too large. The resident cell acquisition method provided by the application can therefore improve the accuracy of the acquired resident cell.
Referring to fig. 2, fig. 2 is a flowchart of an exemplary embodiment of step S200 in the embodiment shown in fig. 1, and step S200 includes the following steps:
Step S210: acquiring the cell corresponding to each piece of geographical location information to obtain a plurality of candidate cells.
In this embodiment, the cell corresponding to a piece of geographical location information is the cell to which its longitude and latitude coordinates belong. For example, a map interface is requested to determine the cell to which the longitude and latitude coordinates belong; the map interface may be that of Gaode (Amap) map, Baidu map, or the like, which is not limited herein.
After determining the cell to which the longitude and latitude coordinates of each piece of geographic location information belong, all the obtained cells may be resident cells of the user, and therefore, the embodiment uses the cells to which the longitude and latitude coordinates of the geographic location information of all the users belong as candidate cells.
Step S220: counting the number of geographical location information contained in each candidate cell.
Step S230: if the number of geographical location information contained in a candidate cell is greater than a preset threshold, determining that the candidate cell is a cell in which the plurality of geographical location information is distributed.
In this embodiment, the preset threshold may be set according to an actual application scenario or an actual requirement, and is not specifically limited herein. For example, the preset threshold is set according to the amount of the geographical location information of the user, for example, if the obtained geographical location information of the user is relatively more, the preset threshold is set to be slightly larger, whereas if the obtained geographical location information of the user is relatively less, the preset threshold is set to be slightly smaller.
In this embodiment, cells that contain relatively few pieces of the user's geographical location information are screened out directly. For example, a user may occasionally pass through other cells while moving between resident cells, in which case the number of geographical location information belonging to such a cell is typically small, e.g., 1 or 2. Since a resident cell is characterized by a large number of the user's geographical location information belonging to it, a cell with only a few can be screened out without further analysis. This reduces the workload and improves the efficiency of acquiring the user's resident cell.
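Steps S220 and S230 amount to counting positions per candidate cell and keeping the cells whose count is greater than the preset threshold. A minimal sketch, with a hypothetical function name and with `None` standing for a position that matched no cell:

```python
from collections import Counter

def select_cells(point_cells, threshold):
    """Keep candidate cells that contain more than `threshold` positions.

    point_cells: the candidate cell of each position (None if no cell matched).
    Returns a dict of cell -> count for the retained cells.
    """
    counts = Counter(c for c in point_cells if c is not None)
    return {cell: n for cell, n in counts.items() if n > threshold}
```

The comparison is strict, following the wording "greater than a preset threshold" in step S230.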
Referring to fig. 3, fig. 3 is a flowchart illustrating an exemplary embodiment of step S300 in the embodiment shown in fig. 1, wherein step S300 includes the following steps S310 to S340, which are described in detail as follows:
Step S310: determining the areas of the cells in which the plurality of geographical location information is distributed.
In this embodiment, the areas of the cells in which the plurality of geographical location information are distributed may be obtained in various manners, for example, by way of field measurement, or from a history database, and the like, which are not limited herein.
Referring to fig. 4, fig. 4 is a flowchart illustrating an exemplary embodiment of step S310 in the embodiment shown in fig. 3, wherein step S310 includes the following steps:
Step S311: requesting a map interface to obtain the outline boundaries of the cells in which the plurality of pieces of geographical location information are distributed.
The map interface may be the Gaode (AMap) map interface, the Baidu map interface, or the like, and is not limited herein. The Baidu map interface is a set of application programming interfaces, built on the Baidu map service, that is provided to developers; likewise, the Gaode map interface is a set of application programming interfaces built on the Gaode map service and provided to developers.
In the prior art, the map software is usually marked with an outline boundary of each cell, and the outline boundary of each cell is composed of a plurality of outline boundary points, and each outline boundary point comprises corresponding longitude and latitude coordinates. Therefore, the present embodiment can acquire the outline boundaries of a cell in which a plurality of geographical location information of the user are distributed by requesting the map interface.
Step S312: determining, according to the outline boundaries, the areas of the cells in which the plurality of pieces of geographical location information are distributed.
Illustratively, curve fitting is performed on a plurality of longitude and latitude coordinates corresponding to the outline boundary of each cell to obtain an outline boundary curve of the corresponding cell, and then integration operation is performed based on the outline boundary curve of the corresponding cell to obtain the cell area of the corresponding cell.
Curve fitting is a data processing method that uses a continuous curve to approximately describe the functional relationship between coordinates represented by a set of discrete points on a plane. In scientific experiments or social activities, a set of data pairs $(x_i, y_i)$, $i = 1, 2, \ldots, m$, of the quantities x and y is obtained by experiment or observation, where the $x_i$ are distinct from one another. The dependency between x and y is expressed by a class of analytical expressions suited to the underlying law of the data, $y = f(x, c)$, chosen to approximate or fit the known data "best" in some sense. $f(x, c)$ is usually called the fitting model, where c denotes the undetermined parameters; when c appears linearly in f, the model is called a linear model, otherwise a nonlinear model. There are many criteria for goodness of fit. The most common is to choose the parameter c so that the weighted sum of squares of the residuals (or deviations) $e_k = y_k - f(x_k, c)$ between the fitting model and the observed values at each point is minimized; the resulting curve is called the fitted curve of the data in the weighted least-squares sense. There are many established methods for solving for a fitted curve. For a linear model, the parameters are generally determined by setting up and solving a system of equations. For a nonlinear model, the required parameters are obtained by solving a nonlinear system of equations or by an optimization method, which is sometimes referred to as nonlinear least-squares fitting.
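As a point of comparison with the curve-fitting-and-integration approach described above, the sketch below computes a cell area directly from its boundary points with the shoelace (polygon area) formula. This is a substituted technique, not the method of the application. It assumes the boundary is a simple polygon given as (longitude, latitude) pairs in degrees and small enough that a local equirectangular projection is acceptable; the function name and the Earth radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def polygon_area_m2(boundary):
    """Approximate area in square metres of a small polygon.

    boundary: list of (lon, lat) pairs in degrees, in order around the outline.
    The points are projected onto a local plane around the mean latitude,
    then the shoelace formula is applied.
    """
    if len(boundary) < 3:
        return 0.0
    lat0 = math.radians(sum(lat for _, lat in boundary) / len(boundary))
    pts = [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_M * math.radians(lat))
        for lon, lat in boundary
    ]
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to close the polygon.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical outline: a square of 0.001 degrees per side near the equator.
square = [(0.0, 0.0), (0.001, 0.0), (0.001, 0.001), (0.0, 0.001)]
print(polygon_area_m2(square))
```

For outlines with many boundary points the polygon formula avoids choosing a fitting model, at the cost of treating the outline as piecewise linear.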
Step S320: acquiring the ratio of the area of the cells in which the plurality of pieces of geographical location information are distributed to the total number of pieces of geographical location information.
The present embodiment obtains the ratio between the area of each cell and the total number of pieces of geographical location information.
Step S330: calculating the corresponding circle radius by taking the ratio as the circle area, and taking the circle radius as the neighborhood parameter value contained in the density-based clustering algorithm model.
Denoting the circle area by S and the circle radius corresponding to that area by R, the radius is given by formula (1):
$R = \sqrt{S / \pi}$ (1)
The neighborhood parameter value of the density-based clustering algorithm model can be determined accurately in this manner. Because the neighborhood parameter value determined in this embodiment is in direct proportion to the area of the cells in which the plurality of pieces of geographical location information of the user are distributed, the model neither misjudges normal geographical location information of the user as noise points because the neighborhood parameter value is too small, nor lets noise points take part in clustering as normal geographical location information and introduce larger errors. The resident cell acquisition method provided by the present application can therefore improve the accuracy of the obtained resident cell.
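Steps S320 and S330 can be sketched as a single helper, assuming that formula (1) is the usual relation between a circle's area S and its radius R, i.e. S = πR². The function name `neighborhood_radius` and the sample figures are hypothetical.

```python
import math


def neighborhood_radius(cell_area, total_points):
    """Neighborhood parameter value derived from a cell's area.

    cell_area:    area of the cell (any area unit, e.g. square metres)
    total_points: total number of pieces of geographical location information
    """
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    s = cell_area / total_points   # step S320: ratio, treated as circle area S
    return math.sqrt(s / math.pi)  # step S330: radius R from S = pi * R**2


# Hypothetical cell of 40,000 square metres with 200 location records.
print(neighborhood_radius(40_000.0, 200))
```

The resulting radius has the length unit matching the area unit, so the location points must be expressed in that same unit before clustering.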
Step S340: clustering the plurality of pieces of geographical location information by using the density-based clustering algorithm model containing the neighborhood parameter value to obtain a clustering result.
In this embodiment, each cell corresponds to one neighborhood parameter value, all neighborhood parameter values are traversed, and a density-based clustering algorithm model is used to cluster a plurality of geographic location information to obtain a clustering result corresponding to each cell.
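The traversal described above can be illustrated with a small, pure-Python DBSCAN, which is one common density-based clustering algorithm. This is a sketch under stated assumptions: the application does not name a specific algorithm or a minimum-points parameter here, so `min_pts`, the function names, and the sample coordinates are all hypothetical, and the points are assumed to be planar coordinates in the same length unit as the neighborhood parameter value.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels


def cluster_per_cell(points, eps_by_cell, min_pts=3):
    """Run the clustering once per cell, using that cell's neighborhood value."""
    return {cell: dbscan(points, eps, min_pts) for cell, eps in eps_by_cell.items()}


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
result = cluster_per_cell(pts, {"A": 1.5, "B": 0.5})
print(result["A"])  # [0, 0, 0, 0, 1, 1, 1, -1]
```

With the larger neighborhood value of cell "A" two clusters emerge and one point is noise, whereas the smaller value of cell "B" marks every point as noise, which shows why the choice of neighborhood value matters.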
Optionally, referring to fig. 5, fig. 5 is a flowchart of an exemplary embodiment of step S400 in the embodiment shown in fig. 1. In this embodiment, the plurality of pieces of geographical location information are distributed over a plurality of cells, and the clustering result includes at least one geographical location information cluster corresponding to each cell.
As shown in fig. 5, step S400 includes the steps of:
Step S410: acquiring a first center of each cell and a second center of each geographical location information cluster of the corresponding cell.
In this embodiment, the first center of a cell is the geometric center, or centroid, of that cell, and the second center of a geographical location information cluster is the geometric center, or centroid, of that cluster. The geometric center of an object X in n-dimensional space is the intersection of all hyperplanes that divide X into two parts of equal moment. Informally, it is the average of all points in X. A finite set of points always has a geometric center, which is obtained by taking the arithmetic mean of each coordinate component of the points.
Optionally, referring to fig. 6, fig. 6 is a flowchart of an exemplary embodiment of step S410 in the embodiment shown in fig. 5, and as shown in fig. 6, the step S410 of acquiring the first center of each cell includes the following steps:
Step S411: acquiring the contour boundary of each cell.
Similarly, in this embodiment the contour boundary of each cell may be obtained through a map interface. The contour boundary of each cell includes the coordinates of a plurality of boundary points, each given as the longitude and latitude of the boundary point.
Step S412: determining the first center of the corresponding cell according to the acquired contour boundary.
In this embodiment, an arithmetic mean of longitude coordinates of all boundary points of each cell is calculated to obtain a longitude coordinate of the first center; and calculating the arithmetic mean of the latitude coordinates of all boundary points of each cell to obtain the latitude coordinate of the first center, and further obtain the first center of the corresponding cell.
After clustering processing is carried out on a plurality of pieces of geographical position information of a user by using a density-based clustering algorithm model, at least one geographical position information cluster of a corresponding cell is obtained, and each geographical position information cluster comprises part or all of the plurality of pieces of geographical position information. In this embodiment, an arithmetic mean of longitude coordinates of all pieces of geographic location information included in each geographic location information cluster is calculated and used as a longitude coordinate of the second center of the geographic location information cluster, and an arithmetic mean of latitude coordinates of all pieces of geographic location information included in each geographic location information cluster is calculated and used as a latitude coordinate of the second center of the geographic location information cluster, so that the second center of each geographic location information cluster is obtained.
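Both centers are plain arithmetic means, so they can share one helper. The sketch below assumes points are (longitude, latitude) pairs and that cluster membership is given as one integer label per point with -1 marking noise; the label convention and the function names are illustrative assumptions.

```python
def centroid(points):
    """Arithmetic mean of the longitude and latitude components."""
    if not points:
        raise ValueError("centroid of an empty point set is undefined")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cluster_centroids(points, labels, noise=-1):
    """Second centers: one centroid per cluster label, skipping noise points."""
    groups = {}
    for p, label in zip(points, labels):
        if label != noise:
            groups.setdefault(label, []).append(p)
    return {label: centroid(members) for label, members in groups.items()}


# First center: mean of a cell's boundary points (hypothetical outline).
boundary = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(centroid(boundary))  # (1.0, 1.0)

# Second centers: means of the points in each cluster.
pts = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0), (99.0, 99.0)]
print(cluster_centroids(pts, [0, 0, 1, -1]))  # {0: (1.0, 1.0), 1: (10.0, 10.0)}
```

Note that the mean of boundary points depends on how densely each stretch of the outline is sampled, so it coincides with the area centroid only for evenly sampled or symmetric outlines.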
Step S420: calculating the Euclidean distance between the first center of the corresponding cell and each of its second centers, and determining the minimum Euclidean distance.
The calculation formula of the Euclidean distance is formula (2):
$L_{ij} = \sqrt{(X_{i1} - X_{ij})^2 + (Y_{i1} - Y_{ij})^2}$ (2)
$i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$;
where n represents the total number of cells in which the plurality of pieces of geographical location information are distributed, m represents the total number of geographical location information clusters of the corresponding cell, $L_{ij}$ represents the Euclidean distance between the first center of cell i and the second center of the j-th geographical location information cluster corresponding to cell i, $X_{i1}$ is the longitude coordinate of the first center of cell i, $Y_{i1}$ is the latitude coordinate of the first center of cell i, $X_{ij}$ represents the longitude coordinate of the second center of the j-th geographical location information cluster corresponding to cell i, and $Y_{ij}$ represents the latitude coordinate of the second center of the j-th geographical location information cluster corresponding to cell i.
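Step S420 reduces to taking the smallest planar distance between one first center and the second centers of the same cell. The sketch below treats longitude and latitude as plain planar coordinates, as the description does; returning infinity when a cell has no cluster is an added assumption, and the function name is hypothetical.

```python
import math


def min_center_distance(first_center, second_centers):
    """Minimum Euclidean distance between a cell's first center and its second centers.

    first_center:   (longitude, latitude) of the cell's first center
    second_centers: iterable of (longitude, latitude) cluster centers of that cell
    """
    # default=math.inf covers a cell for which clustering produced no cluster.
    return min(
        (math.dist(first_center, c) for c in second_centers),
        default=math.inf,
    )


print(min_center_distance((0.0, 0.0), [(3.0, 4.0), (6.0, 8.0)]))  # 5.0
```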
Step S430: scoring the corresponding cell according to the minimum Euclidean distance to obtain a corresponding score value, wherein the score value represents the possibility that the corresponding cell is the resident cell of the user.
In this embodiment, the score value of the corresponding cell is inversely proportional to the corresponding minimum euclidean distance, that is, the larger the minimum euclidean distance corresponding to each cell is, the lower the corresponding score value is, otherwise, the smaller the minimum euclidean distance corresponding to the cell is, the higher the corresponding score value is.
Illustratively, the score value of the corresponding cell is obtained according to formula (3):
$C_i = k / L_{i\min}$ (3)
where $C_i$ denotes the score value of cell i, k is a scaling parameter, and $L_{i\min}$ denotes the minimum Euclidean distance for cell i.
Step S440: selecting the cell with the highest score value as the resident cell of the user.
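Steps S430 and S440 can be sketched together: each cell is scored as k divided by its minimum Euclidean distance, and the highest-scoring cell is selected. Treating a zero distance as an infinite score is an assumption added here to avoid division by zero; the function names, the default k, and the sample distances are hypothetical.

```python
def score_cells(min_distances, k=1.0):
    """Score each cell as k / L_min, following formula (3).

    min_distances: mapping of cell identifier -> minimum Euclidean distance
    k:             scaling parameter
    """
    scores = {}
    for cell, distance in min_distances.items():
        # A cluster center that coincides with the cell center gets the top score.
        scores[cell] = float("inf") if distance == 0 else k / distance
    return scores


def pick_resident_cell(min_distances, k=1.0):
    """Return the cell with the highest score value."""
    scores = score_cells(min_distances, k)
    return max(scores, key=scores.get)


# Hypothetical minimum distances for three candidate cells.
distances = {"A": 120.0, "B": 15.0, "C": 300.0}
print(pick_resident_cell(distances))  # B
```

Because the score is a decreasing function of the distance, selecting the highest score is equivalent to selecting the smallest minimum distance; the scaling parameter k changes the magnitude of the scores but not the ranking, provided k is positive.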
In summary, this embodiment uses the area of the cells in which the plurality of pieces of geographical location information of the user are distributed to control the neighborhood parameter value of the density-based clustering algorithm model. On the one hand, the neighborhood parameter value can be given automatically during clustering, which raises the degree of automation of the clustering. On the other hand, because the neighborhood parameter value is set in direct proportion to the area of the cell, the model neither misjudges normal geographical location information of the user as noise points because the value is too small, nor lets noise points take part in clustering as normal geographical location information and introduce large errors. The resident cell acquisition method can therefore improve the accuracy of the acquired resident cell.
The above resident cell acquisition method may be executed by a computer device (or a text processing device). Computer devices herein may include, but are not limited to, terminal devices such as smart phones, tablets, laptops, and desktops, or service devices such as data processing servers, Web servers, and application servers. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the server may also be a node server on a blockchain. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device and the service device may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
Referring to fig. 7, fig. 7 is a block diagram of a resident cell acquisition apparatus according to an exemplary embodiment of the present application. As shown in fig. 7, the resident cell acquisition apparatus 1000 includes an acquisition module 1100, a first determining module 1200, a clustering module 1300, and a second determining module 1400.
The acquisition module 1100 is configured to acquire a plurality of geographic location information of a user within a preset time period, where the geographic location information includes longitude and latitude coordinates and a timestamp; the first determining module 1200 is configured to determine a plurality of cells in which geographic location information is distributed; the clustering module 1300 is configured to perform clustering processing on the multiple geographic location information by using a density-based clustering algorithm model to obtain a clustering result, wherein neighborhood parameter values contained in the density-based clustering model are in direct proportion to areas of cells in which the multiple geographic location information are distributed; the second determining module 1400 is configured to determine a resident cell of the user according to the clustering result and the cells distributed with the multiple geographic location information.
In another exemplary embodiment, the first determining module 1200 includes a first obtaining unit, a counting unit, and a first determining unit, where the first obtaining unit is configured to obtain the cell corresponding to each piece of geographical location information, so as to obtain a plurality of candidate cells; the counting unit is configured to count the number of pieces of geographical location information contained in each candidate cell; and the first determining unit is configured to determine that a candidate cell is a cell in which the plurality of pieces of geographical location information are distributed if the number of pieces of geographical location information contained in that candidate cell is greater than a preset threshold.
In another exemplary embodiment, the clustering module 1300 includes a second determining unit, a second obtaining unit, a first calculating unit, and a clustering unit, wherein the second determining unit is configured to determine areas of a plurality of cells in which geographical location information is distributed; the second acquisition unit is used for acquiring the ratio of the area of a plurality of cells distributed with the geographic position information to the total number of the geographic position information; the first calculating unit is used for calculating the corresponding circle radius by taking the ratio as the circle area, and taking the circle radius as a neighborhood parameter value contained in the density-based clustering algorithm model; the clustering unit is used for clustering the plurality of geographic position information by using a clustering algorithm model containing neighborhood parameter values to obtain a clustering result.
In another exemplary embodiment, the second determining unit includes a requesting subunit and a second determining subunit, wherein the requesting subunit is configured to request a map interface to obtain contour boundaries of a plurality of cells in which geographic location information is distributed; the second determining subunit is used for determining the areas of the cells distributed by the plurality of geographic position information according to the outline boundary.
In another exemplary embodiment, the second determination module 1400 includes a third obtaining unit, a second calculating unit, a scoring unit, and a selecting unit.
The third acquisition unit is used for acquiring a first center of each cell and acquiring a second center of each geographical position information cluster of the corresponding cell; the second calculating unit is used for calculating Euclidean distances between the first center and each second center of the corresponding cell respectively and determining the minimum Euclidean distance; the scoring unit is used for scoring the corresponding cell according to the minimum Euclidean distance to obtain a corresponding score value, and the score is used for representing the possibility that the resident cell of the user is the corresponding cell; the selection unit is used for selecting the cell with the highest score value as the resident cell of the user.
In another exemplary embodiment, the third obtaining unit includes a third obtaining subunit and a third determining subunit, where the third obtaining subunit is configured to obtain the contour boundary of each cell; and the third determining subunit is used for determining the first center of the corresponding cell according to the acquired outline boundary.
It should be noted that the apparatus provided in the foregoing embodiment and the method provided in the foregoing embodiment belong to the same concept, and the specific manner in which each module and unit execute operations has been described in detail in the method embodiment, and is not described again here.
In another exemplary embodiment, the present application provides an electronic device comprising a processor and a memory, wherein the memory has stored thereon computer readable instructions which, when executed by the processor, implement the above resident cell acquisition method.
Another aspect of the present application also provides a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor implement the resident cell acquisition method as in the previous embodiment.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method for acquiring the resident cell provided in the above embodiments.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for acquiring a resident cell, comprising:
acquiring a plurality of geographical position information of a user in a preset time period, wherein the geographical position information comprises longitude and latitude coordinates and timestamps;
determining a cell in which the plurality of geographical location information are distributed;
clustering the plurality of geographic position information by using a density-based clustering algorithm model to obtain a clustering result, wherein neighborhood parameter values contained in the density-based clustering algorithm model are in direct proportion to the areas of cells distributed by the plurality of geographic position information;
and determining resident cells of the users according to the clustering results and the cells distributed by the plurality of pieces of geographical position information.
2. The method of claim 1, wherein the determining the plurality of cells in which the geographic location information is distributed comprises:
obtaining a cell corresponding to each geographic position information to obtain a plurality of candidate cells;
counting the number of the geographical position information contained in each candidate cell;
and if the number of the geographic position information contained in the candidate cell is greater than a preset threshold value, determining that the corresponding candidate cell is the cell distributed by the geographic position information.
3. The method of claim 1, wherein the clustering the plurality of geographical location information using a density-based clustering algorithm model to obtain a clustering result comprises:
determining an area of a cell in which the plurality of geographical location information are distributed;
obtaining the ratio of the area of the cells distributed by the plurality of geographic position information to the total number of the geographic position information;
calculating a corresponding circle radius by taking the ratio as a circle area, and taking the circle radius as a neighborhood parameter value contained in the density-based clustering algorithm model;
and clustering the plurality of geographic position information by using a clustering algorithm model containing the neighborhood parameters to obtain a clustering result.
4. The method of claim 3, wherein the determining the area of the cell in which the plurality of geographic location information are distributed comprises:
requesting a map interface to obtain outline boundaries of cells in which the plurality of geographical location information are distributed;
and determining the areas of the cells distributed by the plurality of geographic position information according to the outline boundary.
5. The method of claim 1, wherein the plurality of cells in which the geographical location information is distributed comprise a plurality of cells, and wherein the clustering result comprises at least one geographical location information cluster corresponding to each cell;
the determining the resident cell of the user according to the clustering result and the cells distributed with the plurality of geographical location information comprises:
acquiring a first center of each cell and a second center of each geographical position information cluster of the corresponding cell;
calculating Euclidean distances between the first center and each second center of the corresponding cell respectively, and determining a minimum Euclidean distance;
scoring the corresponding cell according to the minimum Euclidean distance to obtain a corresponding score value, wherein the score is used for representing the possibility that the resident cell of the user is the corresponding cell;
and selecting the cell with the highest score value as a resident cell of the user.
6. The method of claim 5, wherein obtaining the first center of each cell comprises:
acquiring the outline boundary of each cell;
and determining a first center of the corresponding cell according to the acquired outline boundary.
7. The method of claim 5, wherein the score value of the corresponding cell is inversely proportional to the corresponding minimum Euclidean distance.
8. An apparatus for acquiring a resident cell, comprising:
an acquisition module, configured to acquire a plurality of geographical position information of a user in a preset time period, wherein the geographical position information comprises longitude and latitude coordinates and timestamps;
a determining module, configured to determine cells in which the plurality of geographic location information are distributed;
a clustering module, configured to cluster the plurality of geographic position information by using a density-based clustering algorithm model to obtain a clustering result, wherein neighborhood parameter values contained in the density-based clustering algorithm model are in direct proportion to the areas of the cells in which the plurality of geographic position information are distributed;
and a result acquisition module, configured to determine the resident cell of the user according to the clustering result and the cells in which the plurality of geographic position information are distributed.
9. An electronic device, comprising:
a memory storing computer readable instructions;
a processor to read computer readable instructions stored by the memory to perform the method of any of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-7.
CN202111279958.5A 2021-10-29 2021-10-29 Resident cell acquisition method and device, electronic equipment and storage medium Pending CN113961780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279958.5A CN113961780A (en) 2021-10-29 2021-10-29 Resident cell acquisition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111279958.5A CN113961780A (en) 2021-10-29 2021-10-29 Resident cell acquisition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113961780A true CN113961780A (en) 2022-01-21

Family

ID=79468665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279958.5A Pending CN113961780A (en) 2021-10-29 2021-10-29 Resident cell acquisition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113961780A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115526221A (en) * 2022-04-19 2022-12-27 荣耀终端有限公司 Positioning abnormity detection and processing method and related equipment
CN115550843A (en) * 2022-04-19 2022-12-30 荣耀终端有限公司 Positioning method and related equipment
CN115550843B (en) * 2022-04-19 2023-10-20 荣耀终端有限公司 Positioning method and related equipment
CN115526221B (en) * 2022-04-19 2023-10-24 荣耀终端有限公司 Positioning abnormality detection and processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination