CN110888866B - Data expansion method and device, data processing equipment and storage medium - Google Patents

Info

Publication number
CN110888866B
Authority
CN
China
Prior art keywords
coordinate information
information
coordinate
target
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911236586.0A
Other languages
Chinese (zh)
Other versions
CN110888866A (en)
Inventor
潘峰
海梓晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN201911236586.0A
Publication of CN110888866A
Application granted
Publication of CN110888866B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data expansion method, a data expansion device, data processing equipment and a storage medium. The method comprises: acquiring position information generated under the same IP address when a target person performs a preset behavior using an electronic device, wherein the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a center point coordinate and an error radius based on the center point coordinate; clustering the coordinate information in the first position information through the DBSCAN algorithm to obtain at least one candidate cluster; selecting a target cluster from the candidate clusters, and calculating target coordinate information from the coordinate information contained in the target cluster; and taking the target coordinate information as the coordinate information of the second position information. Because the DBSCAN algorithm yields a more reasonable clustering result, the accuracy of the coordinate information calculated for the second position information is improved.

Description

Data expansion method, device, data processing equipment and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data expansion method, apparatus, data processing device, and storage medium.
Background
LBS (Location Based Services) is a service that uses various positioning technologies to obtain the current location of a positioning device and provides information resources and basic services to that device through the mobile internet. An LBS first determines the device's spatial location using a positioning technology, and can then acquire location-related resources and information through the mobile internet. LBS integrates several information technologies, such as mobile communication, the internet, spatial positioning and big data, and uses a mobile internet service platform to update and exchange data, so that a user can obtain corresponding services through spatial positioning.
When analyzing the location information recorded when a user performs a preset behavior using an electronic device (e.g., the location information recorded when an advertisement-related behavior occurs), a problem arises: only a very small share of the location information carries LBS data. The location information that does not carry LBS data therefore needs to be expanded, and improving the accuracy of this expansion is a difficult problem.
Disclosure of Invention
In order to overcome at least one of the deficiencies in the prior art, an object of the embodiments of the present application is to provide a data expansion method applied to a data processing device, the method including:
acquiring position information under the same IP address when a target person uses electronic equipment to generate a preset behavior, wherein the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a central point coordinate and an error radius based on the central point coordinate;
for the first position information, clustering coordinate information in the first position information through a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to obtain at least one candidate cluster;
selecting a target cluster from the candidate clusters, and calculating to obtain target coordinate information according to coordinate information contained in the target cluster;
and taking the target coordinate information as coordinate information in the second position information.
Optionally, the step of selecting a target cluster from the candidate clusters includes:
and selecting the candidate cluster containing the most coordinate information from the candidate clusters as the target cluster.
Optionally, the step of selecting a candidate cluster containing the most coordinate information from the candidate clusters as the target cluster includes:
selecting the candidate cluster containing the most coordinate information from the candidate clusters, and acquiring the number of the coordinate information in the candidate cluster;
comparing the number of the coordinate information with a preset target threshold;
and if the number of the coordinate information is larger than the preset target threshold, taking the candidate cluster containing the most coordinate information as the target cluster.
Optionally, the step of obtaining the target coordinate information by calculating according to the coordinate information included in the target cluster includes:
calculating the average coordinate of the center point coordinate of each coordinate information in the target cluster according to the coordinate information in the target cluster, and taking the average coordinate as the center point coordinate of the target cluster;
and calculating the average error radius of the error radius of each coordinate information of the target cluster, and taking the average error radius as the error radius of the center point coordinate of the target cluster.
Optionally, the step of clustering the coordinate information in the first position information through the DBSCAN algorithm to obtain at least one candidate cluster includes:
for each piece of coordinate information in the first position information, searching for core points in the first position information according to a preset radius and a preset number threshold, wherein a core point is a center point coordinate for which the number of other center point coordinates falling within the preset radius around it is larger than the preset number threshold, and each core point corresponds to a cluster with the preset radius;
and according to a preset distance threshold, merging the clusters whose closest edge-to-edge distance is not greater than the preset distance threshold to obtain at least one candidate cluster.
Optionally, the center point coordinate is a longitude and latitude coordinate.
A second objective of the embodiments of the present application is to provide a data expansion device, which is applied to an electronic device, and the data expansion device includes a position acquisition module, a cluster analysis module, a position calculation module, and a data expansion module;
the position acquisition module is used for acquiring position information under the same IP address when a target person uses the electronic equipment to generate a preset behavior, the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a central point coordinate and an error radius based on the central point coordinate;
the cluster analysis module is used for clustering the coordinate information in the first position information through the DBSCAN density clustering algorithm to obtain at least one candidate cluster;
the position calculation module is used for selecting a target cluster from the candidate clusters and calculating to obtain target coordinate information according to coordinate information contained in the target cluster;
the data expansion module is used for taking the target coordinate information as coordinate information in the second position information.
Optionally, the position calculation module selects a target cluster in the following manner:
selecting the candidate cluster containing the most coordinate information from the candidate clusters as the target cluster.
It is a third objective of the embodiments of the present application to provide a data processing apparatus, where the data processing apparatus includes a processor and a memory, where the memory stores machine-executable instructions, and the processor executes the machine-executable instructions to implement the data expansion method.
It is a fourth object of the embodiments of the present application to provide a storage medium storing a computer program that realizes the data expansion method when the computer program is executed.
Compared with the prior art, the method has the following beneficial effects:
The embodiments of the present application provide a data expansion method, a data expansion device, data processing equipment and a storage medium. The existing coordinate information of the target person is clustered through the DBSCAN algorithm to obtain at least one candidate cluster. A target cluster is then selected from the candidate clusters, and the coordinate information of the second position information is calculated from the coordinate information in the target cluster. Because the DBSCAN algorithm yields a more reasonable clustering result, the accuracy of the calculated coordinate information is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a hardware configuration diagram of a data processing device according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a data expansion method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a clustering scene provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an edge closest distance provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data expansion device according to an embodiment of the present application.
Reference numerals: 100 - data processing device; 110 - data expansion device; 120 - memory; 130 - processor; 501 - core point; 502 - edge point; 503 - noise point; 600 - preset radius; 701 - candidate cluster M; 702 - candidate cluster N; 1101 - position acquisition module; 1102 - cluster analysis module; 1103 - position calculation module; 1104 - data expansion module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is noted that the terms "first", "second", "third", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.
As described in the Background, when analyzing the location information recorded when a user performs a preset behavior using an electronic device (e.g., the location information recorded when an advertisement-related behavior occurs), only a very small share of the location information carries LBS data. The location information that does not carry LBS data therefore needs to be expanded, and improving the accuracy of this expansion is a difficult problem.
In view of this, the present application provides a data expansion method applied to a data processing device. Referring to fig. 1, a hardware structure diagram of the data processing apparatus 100 is shown. The data processing apparatus 100 includes a data expansion device 110, a memory 120, and a processor 130.
The elements of the memory 120 and the processor 130 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The data expansion device 110 includes at least one software functional module which can be stored in the memory 120 in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of the data processing apparatus 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the data expansion device 110.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor 130 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor 130 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
The data processing Device 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.
Referring to fig. 2, a flowchart of a data expansion method applied to the data processing apparatus 100 shown in fig. 1 is shown, and the method including various steps will be described in detail below.
Step S100, acquiring position information under the same IP address when a target person uses the electronic equipment to generate a preset behavior, wherein the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a central point coordinate and an error radius based on the central point coordinate.
For example, in one possible example, a user browsing a video or a webpage through a mobile phone app comes across an advertisement of interest and clicks it to obtain the specific advertisement content. At that moment, the server obtains the current IP address of the mobile phone and the current coordinate information of the mobile phone, namely the LBS information. However, owing to limitations of the hardware device that provides the network service to the user, the current coordinate information of the mobile phone is not obtained every time the user clicks an advertisement.
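The split described in step S100 can be illustrated with a minimal Python sketch. The record type, its field names, the sample IP addresses and the sample values are all hypothetical; the source only states that records share an IP address and that some carry a center point coordinate with an error radius while others do not.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PositionRecord:
    """One preset behavior (for example, an ad click). Field names are hypothetical."""
    ip: str
    timestamp: str
    center: Optional[Tuple[float, float]] = None  # center point coordinate, if captured
    error_radius_m: Optional[float] = None        # error radius around the center point


def split_by_ip(records):
    """Group records by IP address and split each group into
    (first, second): records with and without coordinate information."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        first, second = groups[rec.ip]
        (first if rec.center is not None else second).append(rec)
    return dict(groups)


# Made-up sample data, for illustration only.
records = [
    PositionRecord("10.0.0.1", "2018-09-03 09:00", (3.0, 4.0), 10.0),
    PositionRecord("10.0.0.1", "2018-09-03 09:30"),
    PositionRecord("10.0.0.2", "2018-09-03 10:00", (5.0, 2.0), 5.0),
]
first, second = split_by_ip(records)["10.0.0.1"]
```

Here `first` plays the role of the first position information and `second` the records whose coordinates are to be filled in by the later steps.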
Step S200, for the first position information, clustering the coordinate information in the first position information through the DBSCAN density clustering algorithm to obtain at least one candidate cluster.
Step S300, selecting a target cluster from the candidate clusters, and calculating to obtain target coordinate information according to coordinate information contained in the target cluster.
Step S400, the target coordinate information is used as the coordinate information in the second position information.
In this way, the data processing apparatus 100 clusters the existing coordinate information of the target person through the DBSCAN algorithm to obtain at least one candidate cluster. It then selects a target cluster from the candidate clusters and calculates the coordinate information of the second position information from the coordinate information in the target cluster. Because the DBSCAN algorithm yields a more reasonable clustering result, the accuracy of the calculated coordinate information is improved.
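Steps S200 to S400 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it uses the textbook form of DBSCAN (where `min_pts` counts the point itself, whereas the text counts other points), plain Euclidean distance, and the "largest cluster" selection rule. The function names, `eps`, `min_pts` and the sample data are assumptions.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point, 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def target_coordinate_info(first, eps, min_pts):
    """first: list of ((x, y), error_radius). Returns (mean center, mean radius)
    of the largest candidate cluster, or None if every point is noise."""
    labels = dbscan([center for center, _ in first], eps, min_pts)
    clusters = {}
    for item, label in zip(first, labels):
        if label != -1:
            clusters.setdefault(label, []).append(item)
    if not clusters:
        return None
    target = max(clusters.values(), key=len)  # S300: cluster with the most points
    n = len(target)
    center = (sum(c[0] for c, _ in target) / n, sum(c[1] for c, _ in target) / n)
    radius = sum(r for _, r in target) / n
    return center, radius


# Made-up first position information: two dense groups and one outlier.
first = [((0, 0), 4), ((0, 1), 6), ((1, 0), 8), ((1, 1), 10),
         ((10, 10), 5), ((10, 11), 5), ((11, 10), 5), ((50, 50), 9)]
info = target_coordinate_info(first, eps=1.5, min_pts=3)
# S400: every record lacking coordinates receives the same target coordinate info.
second_filled = [info for _ in range(3)]
print(info)  # ((0.5, 0.5), 7.0)
```

With this sample data, the four points near the origin form the largest cluster, so its mean center and mean error radius are assigned to the records without coordinates.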
When the target cluster is selected, the candidate cluster containing the most coordinate information is selected from the candidate clusters to serve as the target cluster.
As one possible embodiment, the data processing apparatus 100 selects the candidate cluster containing the most coordinate information from the candidate clusters and acquires the number of pieces of coordinate information in it; compares that number with a preset target threshold; and, if the number is larger than the preset target threshold, takes that candidate cluster as the target cluster.
For example, in one possible example, the first location information includes 25 pieces of coordinate information, and the data processing apparatus 100 obtains 3 candidate clusters after clustering them by the DBSCAN algorithm: a candidate cluster A including 17 pieces of coordinate information, a candidate cluster B including 3 pieces, and a candidate cluster C including 5 pieces. If the preset target threshold is half of the number of pieces of coordinate information in the first position information, rounded up to 13, the data processing apparatus 100 takes the candidate cluster A as the target cluster, since 17 is larger than 13.
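The threshold comparison in this example can be sketched in a few lines of Python. The function name and the `ratio` parameter are hypothetical, and rounding the threshold up is an assumption inferred from the example (half of 25 giving 13).

```python
import math


def select_target_cluster(candidate_sizes, total, ratio=0.5):
    """Return the name of the largest candidate cluster if its size exceeds
    the threshold, otherwise None. candidate_sizes maps cluster name -> size."""
    threshold = math.ceil(total * ratio)  # assumption: half of the total, rounded up
    name, size = max(candidate_sizes.items(), key=lambda kv: kv[1])
    return name if size > threshold else None


print(select_target_cluster({"A": 17, "B": 3, "C": 5}, total=25))   # A
print(select_target_cluster({"A": 8, "B": 10, "C": 7}, total=25))   # None
```

The second call shows the dispersed case: the largest cluster holds only 10 of 25 points, which does not exceed 13, so no cluster is chosen by size.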
As another embodiment, if the number of pieces of coordinate information is not greater than the preset target threshold, the coordinate information in the first position information is relatively dispersed, so no single candidate cluster can be selected as the target cluster on the basis of its size alone.
The position information also carries time information indicating when the target person performed the preset behavior using the electronic device. For the second position information not carrying coordinate information, the data processing device 100 selects, one at a time, a piece of position information to be expanded; acquires the time information in the position information to be expanded; compares it with the time information corresponding to each piece of coordinate information in the first position information; and takes the coordinate information closest in time as the reference coordinate information. The data processing apparatus 100 then takes the candidate cluster to which the reference coordinate information belongs as the target cluster.
It should be appreciated that, in general, a user has a relatively fixed range of activities during a certain period of time (e.g., a month or a year). Therefore, the location of the user does not change much between adjacent times or dates. For example, if on the same date the location information of the user at location A at 12 o'clock does not carry coordinate information, while the location information of the user at location B at 13 o'clock does, location A is likely to be close to location B.
For example, in another possible example, the time of the position information to be expanded in the second position information is "2018-09-03 09-34". After clustering the coordinate information in the first position information by the DBSCAN algorithm, the data processing device 100 obtains 3 candidate clusters: a candidate cluster A including 8 pieces of coordinate information, a candidate cluster B including 10 pieces, and a candidate cluster C including 7 pieces. If the reference coordinate information closest to the time "2018-09-03 09-34" belongs to the candidate cluster A, the candidate cluster A is taken as the target cluster.
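The time-based selection can be sketched as a nearest-timestamp lookup. The function name, the pairing of each timestamp with a cluster label, and the sample timestamps are hypothetical; the source does not say how ties are broken or what happens when the closest record is a noise point.

```python
from datetime import datetime


def select_cluster_by_time(target_time, first_records):
    """first_records: list of (timestamp, cluster_label) for records that carry
    coordinates. Returns the label of the record closest in time to target_time."""
    closest = min(first_records, key=lambda rec: abs(rec[0] - target_time))
    return closest[1]


# Made-up timestamps, for illustration only.
first_records = [
    (datetime(2018, 9, 3, 9, 10), "A"),
    (datetime(2018, 9, 3, 13, 0), "B"),
    (datetime(2018, 9, 2, 20, 0), "C"),
]
target = select_cluster_by_time(datetime(2018, 9, 3, 9, 34), first_records)
print(target)  # A
```

This rule lets each record to be expanded pick its own target cluster, which is useful when no cluster is large enough to dominate.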
Optionally, the target coordinate information is calculated from the coordinate information contained in the target cluster. As a possible implementation, the data processing apparatus 100 calculates the average of the center point coordinates of all pieces of coordinate information in the target cluster and takes this average coordinate as the center point coordinate of the target cluster; it likewise calculates the average of the error radii of all pieces of coordinate information in the target cluster and takes this average error radius as the error radius of the center point coordinate of the target cluster.
For example, the target cluster includes 5 pieces of coordinate information, namely, a coordinate A (3, 4) with an error radius of 10 m, a coordinate B (5, 2) with an error radius of 5 m, a coordinate C (6, 7) with an error radius of 7 m, a coordinate D (5, 9) with an error radius of 8 m, and a coordinate E (6, 2) with an error radius of 4 m. The average coordinate X (X1, X2) of the center point coordinates can be expressed as:
$$X_1 = \frac{3 + 5 + 6 + 5 + 6}{5} = 5$$
$$X_2 = \frac{4 + 2 + 7 + 9 + 2}{5} = 4.8$$
Accordingly, the average error radius R can be expressed as:
$$R = \frac{10 + 5 + 7 + 8 + 4}{5} = 6.8\ \text{m}$$
It should be understood that the calculation of the target coordinate information is not limited to the above example; other methods may be used, and the embodiments of the present application impose no particular limitation.
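The averaging in the example above can be checked with a short Python sketch. The function name and the tuple layout are hypothetical; the five coordinates and error radii are the ones given in the example.

```python
def average_coordinate_info(cluster):
    """cluster: list of ((x1, x2), error_radius_m).
    Returns the mean center point coordinate and the mean error radius."""
    n = len(cluster)
    x1 = sum(center[0] for center, _ in cluster) / n
    x2 = sum(center[1] for center, _ in cluster) / n
    radius = sum(r for _, r in cluster) / n
    return (x1, x2), radius


# Coordinates A to E and their error radii (in meters) from the example.
cluster = [((3, 4), 10), ((5, 2), 5), ((6, 7), 7), ((5, 9), 8), ((6, 2), 4)]
center, radius = average_coordinate_info(cluster)
print(center, radius)  # (5.0, 4.8) 6.8
```

A plain arithmetic mean treats every point equally; a weighted mean (for example, weighting by the inverse of the error radius) would be one of the "other methods" the text leaves open, though the source does not describe it.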
Optionally, the coordinate information in the first location information is clustered by the DBSCAN algorithm to obtain at least one candidate cluster as follows. For each piece of coordinate information in the first location information, the data processing apparatus 100 searches for core points in the first location information according to a preset radius and a preset number threshold.
A core point is a center point coordinate for which the number of other center point coordinates falling within the preset radius around it is larger than the preset number threshold. Each core point corresponds to one cluster with the preset radius.
Based on a preset distance threshold, the data processing device 100 then merges the clusters whose closest edge-to-edge distance is not greater than the preset distance threshold to obtain at least one candidate cluster.
Referring to fig. 3, as a possible example, if the preset number threshold is 3 and the radius of each circle is the preset radius 600, fig. 3 includes 6 core points 501, 2 edge points 502 and 1 noise point 503. An edge point 502 is a center point coordinate for which the number of other center point coordinates falling within its preset radius 600 is not greater than the preset number threshold, but which itself falls within the preset radius 600 of a core point 501. A noise point 503 is a center point coordinate for which the number of other center point coordinates falling within its preset radius 600 is not greater than the preset number threshold and which does not fall within the preset radius 600 of any core point 501.
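The three point types can be illustrated with a small classifier that follows the definitions in the text: a core point has strictly more than the threshold of other points within the preset radius. Euclidean distance, treating "within" as inclusive of the boundary, the function name and the sample points are all assumptions.

```python
import math


def classify_points(points, radius, count_threshold):
    """Label each point 'core', 'edge' or 'noise'."""
    def others_within(i):
        return [j for j, q in enumerate(points)
                if j != i and math.dist(points[i], q) <= radius]

    core = {i for i in range(len(points)) if len(others_within(i)) > count_threshold}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in others_within(i)):
            labels.append("edge")   # not dense itself, but inside a core point's radius
        else:
            labels.append("noise")  # neither dense nor near any core point
    return labels


# Made-up points: a center with four neighbors, plus one far outlier.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (10, 10)]
labels = classify_points(points, radius=1.5, count_threshold=3)
print(labels)  # ['core', 'edge', 'edge', 'edge', 'edge', 'noise']
```

Only the center has four other points within the radius, so it is the single core point; its four neighbors each see three other points and are edge points.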
Optionally, at least one candidate cluster is obtained by merging the clusters whose closest edge-to-edge distance is not greater than the preset distance threshold. Fig. 4 shows a candidate cluster M 701 and a candidate cluster N 702, where the closest distance between the edges of the candidate cluster M 701 and the candidate cluster N 702 is a distance d.
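The merging step can be sketched as follows. The source does not define the edge closest distance precisely, so this sketch assumes that each initial cluster is a circle of the preset radius around a core point and that the edge distance between two circles is the distance between their centers minus twice the radius (zero if they touch or overlap). The function names, the union-find merging and the sample values are assumptions.

```python
import math


def edge_distance(c1, c2, radius):
    """Gap between the edges of two equal-radius circles (0 if they touch or overlap)."""
    return max(0.0, math.dist(c1, c2) - 2 * radius)


def merge_clusters(core_points, radius, distance_threshold):
    """Merge per-core-point circles whose edge distance is within the threshold.
    Returns a list of candidate clusters, each a list of core points."""
    parent = list(range(len(core_points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(core_points)):
        for j in range(i + 1, len(core_points)):
            if edge_distance(core_points[i], core_points[j], radius) <= distance_threshold:
                parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i, point in enumerate(core_points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())


# Made-up core points: the first two circles are 1 unit apart edge to edge.
candidates = merge_clusters([(0, 0), (3, 0), (10, 0)], radius=1, distance_threshold=1.5)
print(candidates)  # [[(0, 0), (3, 0)], [(10, 0)]]
```

Because merging is transitive, a chain of nearby circles ends up in one candidate cluster even when its two ends are far apart.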
Optionally, for the center point coordinate, as a possible implementation, the center point coordinate may be a longitude and latitude coordinate. Of course, the center point coordinate may also be based on other coordinate information of the same reference, and the embodiment of the present application is not particularly limited.
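When the center point coordinates are longitude and latitude values, the preset radius and the distance threshold are most naturally expressed in meters, which requires a great-circle distance rather than a planar one. The sketch below uses the haversine formula; the source does not name a distance formula, so this choice, the function name and the mean Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(p, q):
    """Great-circle distance in meters between two (latitude, longitude)
    points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
d = haversine_m((0.0, 0.0), (0.0, 1.0))
```

A function like this could stand in for the planar distance wherever points are compared against the preset radius.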
Referring to fig. 5, the present embodiment further provides a data expansion apparatus 110, wherein the data expansion apparatus 110 includes at least one functional module capable of being stored in the memory 120 in a software form. Functionally divided, the data expansion apparatus 110 may include a location acquisition module 1101, a cluster analysis module 1102, a location calculation module 1103, and a data expansion module 1104.
The position acquiring module 1101 is configured to acquire position information of a target person at the same IP address when the target person uses the electronic device to generate a preset behavior, where the position information includes first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information includes a center point coordinate and an error radius based on the center point coordinate.
In the embodiment of the present application, the position acquiring module 1101 is configured to execute step S100 in fig. 2, and reference may be made to step S100 for a detailed description of the position acquiring module 1101.
The cluster analysis module 1102 is configured to perform clustering processing on the coordinate information in the first location information through the DBSCAN density clustering algorithm to obtain at least one candidate cluster.
In this embodiment of the application, the cluster analysis module 1102 is configured to execute step S200 in fig. 2, and reference may be made to step S200 for detailed description of the cluster analysis module 1102.
The position calculating module 1103 is configured to select a target cluster from the candidate clusters, and calculate to obtain target coordinate information according to coordinate information included in the target cluster.
In the embodiment of the present application, the position calculating module 1103 is configured to perform step S300 in fig. 2, and for a detailed description of the position calculating module 1103, reference may be made to the detailed description of step S300.
The data expansion module 1104 is configured to use the target coordinate information as the coordinate information in the second location information.
In the embodiment of the present application, the data expansion module 1104 is configured to perform step S400 in fig. 2, and the detailed description about the data expansion module 1104 can refer to the detailed description about step S400.
Optionally, the position calculating module 1103 selects the target cluster in the following manner:
selecting the candidate cluster containing the most coordinate information from the candidate clusters as the target cluster.
The embodiment of the present application further provides a data processing apparatus 100, where the data processing apparatus 100 includes a processor 130 and a memory 120, the memory 120 stores machine executable instructions, and when the processor 130 executes the machine executable instructions, the data expansion method is implemented.
The embodiment of the application also provides a storage medium storing a computer program that implements the data expansion method when executed by a processor.
In summary, the embodiments of the present application provide a data expansion method, an apparatus, a data processing device, and a storage medium. The existing coordinate information of the target person is clustered through the DBSCAN algorithm to obtain at least one candidate cluster. A target cluster is then selected from the candidate clusters, and the coordinate information of the second position information is calculated from the coordinate information in the target cluster. Because the DBSCAN algorithm yields a more reasonable clustering result, the accuracy of the calculated coordinate information is improved.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the portion of it that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A data expansion method applied to a data processing device, the method comprising:
acquiring position information under the same IP address when a target person uses electronic equipment to generate a preset behavior, wherein the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a central point coordinate and an error radius based on the central point coordinate;
for the first position information, clustering the coordinate information in the first position information through the density-based clustering algorithm DBSCAN to obtain at least one candidate cluster;
selecting the candidate cluster containing the most coordinate information from the candidate clusters, and acquiring the number of the coordinate information in the candidate cluster;
comparing the number of the coordinate information with a preset target threshold;
if the number of the coordinate information is larger than the preset target threshold, taking the candidate cluster containing the most coordinate information as a target cluster;
calculating to obtain target coordinate information according to the coordinate information contained in the target cluster;
taking the target coordinate information as coordinate information in the second position information;
if the number of the coordinate information is less than or equal to the preset target threshold, selecting position information to be expanded from the second position information;
comparing the time information of the position information to be expanded with the time information corresponding to each coordinate information in the first position information, and selecting the coordinate information with the closest time as reference coordinate information;
taking the candidate cluster to which the reference coordinate information belongs as a target cluster of the position information to be expanded;
calculating to obtain target coordinate information according to the coordinate information contained in the target cluster;
and taking the target coordinate information as the coordinate information of the position information to be expanded.
2. The data expansion method of claim 1, wherein the target coordinate information includes a center point coordinate of the target cluster and an error radius based on the center point coordinate of the target cluster, and the step of obtaining the target coordinate information by calculating according to the coordinate information included in the target cluster includes:
calculating, according to the coordinate information in the target cluster, the average of the center point coordinates of all coordinate information in the target cluster, and taking the average coordinate as the center point coordinate of the target cluster;
and calculating the average of the error radii of all coordinate information in the target cluster, and taking the average error radius as the error radius of the center point coordinate of the target cluster.
3. The data expansion method according to claim 2, wherein the step of clustering the coordinate information in the first position information through the density-based clustering algorithm DBSCAN to obtain at least one candidate cluster comprises:
for each piece of coordinate information in the first position information, searching for core points in the first position information according to a preset radius and a preset number threshold, wherein a core point is a center point coordinate for which the number of other center point coordinates falling within the preset radius of it is greater than the preset number threshold, and each core point corresponds to a cluster with the preset radius;
and, according to a preset distance threshold, merging the clusters whose closest edge-to-edge distance is not greater than the preset distance threshold to obtain at least one candidate cluster.
4. The data expansion method of claim 3, wherein the center point coordinates are latitude and longitude coordinates.
5. A data expansion device, applied to electronic equipment, comprising a position acquisition module, a cluster analysis module, a position calculation module and a data expansion module;
the position acquisition module is used for acquiring position information under the same IP address when a target person uses the electronic equipment to generate a preset behavior, wherein the position information comprises first position information carrying coordinate information and second position information not carrying coordinate information, and the coordinate information comprises a central point coordinate and an error radius based on the central point coordinate;
the cluster analysis module is used for clustering the coordinate information in the first position information through the density-based clustering algorithm DBSCAN to obtain at least one candidate cluster;
selecting the candidate cluster containing the most coordinate information from the candidate clusters, and acquiring the number of the coordinate information in the candidate cluster;
comparing the number of the coordinate information with a preset target threshold;
if the number of the coordinate information is larger than the preset target threshold, taking the candidate cluster containing the most coordinate information as a target cluster;
the position calculation module is used for selecting a target cluster from the candidate clusters and calculating to obtain target coordinate information according to coordinate information contained in the target cluster;
the data expansion module is used for taking the target coordinate information as coordinate information in the second position information;
if the number of the coordinate information is less than or equal to the preset target threshold, selecting position information to be expanded from the second position information;
comparing the time information of the position information to be expanded with the time information corresponding to each coordinate information in the first position information, and selecting the coordinate information with the closest time as reference coordinate information;
taking the candidate cluster to which the reference coordinate information belongs as a target cluster of the position information to be expanded;
calculating to obtain target coordinate information according to the coordinate information contained in the target cluster;
and taking the target coordinate information as the coordinate information of the position information to be expanded.
6. A data processing apparatus comprising a processor and a memory, said memory storing machine executable instructions which, when executed by said processor, implement a data expansion method as claimed in any one of claims 1 to 4.
7. A storage medium storing a computer program which, when executed, implements a data expansion method according to any one of claims 1 to 4.
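As an informal reading aid for claims 1 and 2, the branch on the preset target threshold and the averaging step might be sketched as follows. The sketch is illustrative only and does not define or limit the claims: `Fix`, `target_coordinate`, and `choose_target_cluster` are hypothetical names, timestamps are assumed to be comparable numbers, and the fallback is assumed to search only coordinates that belong to some candidate cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    """One entry of first position information: a timestamped coordinate."""
    timestamp: float
    lat: float
    lon: float
    error_radius: float


def target_coordinate(cluster):
    """Mean centre point and mean error radius of the target cluster.

    A plain arithmetic mean of latitude/longitude is assumed to be adequate
    because the points of one cluster lie close together.
    """
    n = len(cluster)
    return (sum(f.lat for f in cluster) / n,
            sum(f.lon for f in cluster) / n,
            sum(f.error_radius for f in cluster) / n)


def choose_target_cluster(candidates, threshold, record_time):
    """Pick the target cluster for one record that carries no coordinates."""
    largest = max(candidates, key=len)
    if len(largest) > threshold:
        # Enough coordinates: the largest candidate cluster is the target.
        return largest
    # Otherwise use the cluster containing the coordinate closest in time
    # to the record being expanded (the reference coordinate).
    # Assumption: coordinates outside every cluster are not considered.
    return min(candidates,
               key=lambda c: min(abs(f.timestamp - record_time) for f in c))
```

With a threshold below the size of the largest cluster, every record without coordinates receives the same target coordinate; otherwise each such record is resolved individually by its own time information.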
CN201911236586.0A 2019-12-05 2019-12-05 Data expansion method and device, data processing equipment and storage medium Active CN110888866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911236586.0A CN110888866B (en) 2019-12-05 2019-12-05 Data expansion method and device, data processing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110888866A CN110888866A (en) 2020-03-17
CN110888866B true CN110888866B (en) 2023-01-17

Family

ID=69750714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911236586.0A Active CN110888866B (en) 2019-12-05 2019-12-05 Data expansion method and device, data processing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110888866B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898648A (en) * 2020-07-07 2020-11-06 北京三快在线科技有限公司 Road network graph generation method and device, electronic equipment and computer readable storage medium
CN112229411B (en) * 2020-10-15 2021-12-07 广州小鹏自动驾驶科技有限公司 Data processing method and device
CN112949811A (en) * 2021-03-16 2021-06-11 广州虎牙科技有限公司 Data processing method, device, equipment and medium
CN113610008B (en) * 2021-08-10 2022-09-16 北京百度网讯科技有限公司 Method, device, equipment and storage medium for acquiring state of slag car
CN113836250A (en) * 2021-09-09 2021-12-24 摩拜(北京)信息技术有限公司 Vehicle positioning method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103227821A (en) * 2013-04-03 2013-07-31 腾讯科技(深圳)有限公司 Method and device for processing position data of target user
CN109522374A (en) * 2018-09-14 2019-03-26 北京三快在线科技有限公司 Positioning method, device, electronic equipment and readable storage medium
CN109698999A (en) * 2017-10-23 2019-04-30 深圳市优必选科技有限公司 Localization method and terminal device
CN109714699A (en) * 2017-10-23 2019-05-03 深圳市优必选科技有限公司 Localization method and terminal device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106455056B (en) * 2016-11-14 2020-02-04 百度在线网络技术(北京)有限公司 Positioning method and device
CN109302706B (en) * 2017-07-24 2021-11-19 中国移动通信有限公司研究院 Pseudo base station positioning method and device and computer readable storage medium
CN108307308B (en) * 2018-01-25 2021-09-14 北京小米移动软件有限公司 Positioning method, device and storage medium for wireless local area network equipment

Also Published As

Publication number Publication date
CN110888866A (en) 2020-03-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant