CN112819593B - Data analysis method, device, equipment and medium based on position information - Google Patents

Data analysis method, device, equipment and medium based on position information

Info

Publication number
CN112819593B
Authority
CN
China
Prior art keywords
user
target
code
users
geographic position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110416584.0A
Other languages
Chinese (zh)
Other versions
CN112819593A (en
Inventor
张莉
任杰
吴志成
乔延柯
袁雅云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110416584.0A priority Critical patent/CN112819593B/en
Publication of CN112819593A publication Critical patent/CN112819593A/en
Application granted granted Critical
Publication of CN112819593B publication Critical patent/CN112819593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9537: Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of big data and provides a data analysis method, device, equipment and medium based on position information. An initial relation graph is constructed from the geographic position service code of each user, so that different users are associated based on graph theory. The initial relation graph is cleaned to obtain a target relation graph, which further screens out associated users. The geographic position service behavior similarity between every two users in the target relation graph is calculated, a user similar to the user to be predicted is detected from the tag users as the target user, and the occupation of the target user is taken as the predicted occupation of the user to be predicted. Occupation is thus predicted automatically by combining geographic position service data with graph theory, which addresses the low efficiency of manual collection and improves prediction efficiency. The invention also relates to blockchain technology: the target relation graph can be stored in a blockchain node.

Description

Data analysis method, device, equipment and medium based on position information
Technical Field
The invention relates to the technical field of big data, in particular to a data analysis method, a device, equipment and a medium based on position information.
Background
At present, many fields, such as agent recruitment at insurance companies, require knowledge of a user's occupation, so that suitable recruitment scripts can be applied to the user, other targeted marketing approaches can be adopted, more accurate products can be recommended, or better services can be provided, thereby improving user experience.
However, with conventional means, a user's occupation information usually has to be collected manually, which cannot be done in large batches and is inefficient.
Automatic prediction schemes based on machine learning are limited by the comprehensiveness of their training data and by the performance of the system they run on, so their processing speed and accuracy still need improvement.
Disclosure of Invention
In view of the above, there is a need to provide a data analysis method, device, equipment and medium based on location information that can automatically predict a user's occupation by combining geographic location service data with graph theory, which not only solves the problem of inefficient manual collection but also improves prediction efficiency.
A method for data analysis based on location information, the method comprising:
starting a preset acquisition device, and acquiring geographic position service data of at least one user within a preset time range by using the preset acquisition device;
extracting target geographical location service data of each user from the collected geographical location service data;
transcoding the target geographical location service data of each user to obtain a geographical location service code of each user;
constructing an initial relation graph according to the geographical position service code of each user;
cleaning the initial relation graph to obtain a target relation graph;
calculating the geographic position service behavior similarity between every two users in the target relation graph;
acquiring a user to be predicted, and identifying a user with known occupation in the target relation graph as a tag user;
and detecting a target user from the tag users according to the geographic position service behavior similarity between every two users based on an improved label propagation algorithm, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted.
According to a preferred embodiment of the present invention, the extracting the target geo-location service data of each user from the collected geo-location service data includes:
screening the geographical location service data of each user from the collected geographical location service data;
calculating the frequency of each geographic position service data in the geographic position service data of the corresponding user;
extracting the geographical location service data with the frequency greater than or equal to the preset frequency from the geographical location service data of each user;
and determining the extracted geographic position service data as the target geographic position service data of each user.
According to a preferred embodiment of the present invention, the constructing the initial relationship graph according to the geo-location service code of each user comprises:
acquiring the number of common character strings contained in the geographic position service codes of every two users;
and when the number of common character strings contained in the geographic position service codes of two users is detected to be greater than or equal to a configured number, connecting the two users to obtain the initial relationship graph.
According to a preferred embodiment of the present invention, the cleaning the initial relationship diagram to obtain the target relationship diagram includes:
acquiring the characters in the first preset leading positions of the geographic position service code of each user in the initial relationship graph to construct a sub-code of each user;
acquiring the total number of users in the initial relationship graph and the number of users to which each sub-code belongs;
calculating the coverage rate of each sub-code according to the number of users to which the sub-code belongs and the total number of users;
acquiring the sub-codes whose coverage rate is greater than or equal to the configured coverage rate as alternative sub-codes;
calculating the IDF value of each alternative sub-code according to the total number of users and the number of users to which the sub-code belongs;
sorting the alternative sub-codes by IDF value from high to low, and acquiring the alternative sub-codes ranked within the second preset leading positions as target sub-codes;
and retaining the users having a target sub-code in the initial relationship graph and deleting the other users to obtain the target relationship graph.
According to the preferred embodiment of the present invention, the calculating the similarity of the geographic location service behavior between every two users in the target relationship graph includes:
acquiring a geographical position service code of each user in the target relation graph;
counting the occurrence frequency of the geographic position service code of each user;
sorting the occurrence frequency of the geographical location service codes of each user from high to low;
acquiring the geographic position service codes ranked within the third preset leading positions as the first codes of each user;
calculating the number of times the first codes of each user occur in the target sub-codes;
constructing a sequence for each user according to the number of times the first codes of the user occur in the target sub-codes;
and calculating the cosine distance between every two users according to the sequence of each user as the geographic position service behavior similarity between every two users.
According to a preferred embodiment of the present invention, the detecting a target user from the tag users according to the geographic position service behavior similarity between every two users based on the improved label propagation algorithm includes:
calculating the weight of each edge in the target relational graph according to the geographic position service behavior similarity between every two users;
constructing a probability transfer matrix according to the weight of each edge;
calculating the propagation probability from each label user to the user to be predicted according to the probability transition matrix;
executing propagation according to the propagation probability and updating the probability distribution of each label user;
and when the probability distribution of each label user is converged, stopping propagation, identifying the highest probability from the probability distribution of each label user obtained after propagation, and determining the label user corresponding to the highest probability as the target user.
According to the preferred embodiment of the present invention, the following formula is adopted to calculate the weight of each edge in the target relationship graph according to the geographic location service behavior similarity between every two users:
[Formula image DEST_PATH_IMAGE001 not reproduced]
where the symbol in image DEST_PATH_IMAGE002 represents the weight of the edge between user i and user j, the symbol in image DEST_PATH_IMAGE003 represents a configuration parameter, and the symbol in image DEST_PATH_IMAGE004 represents the geographic position service behavior similarity between user i and user j.
A location information based data analysis device, the location information based data analysis device comprising:
the collecting unit is used for starting a preset acquisition device and collecting geographic position service data of at least one user within a preset time range by using the preset acquisition device;
the extraction unit is used for extracting target geographic position service data of each user from the collected geographic position service data;
the transcoding unit is used for transcoding the target geographical location service data of each user to obtain a geographical location service code of each user;
the building unit is used for building an initial relation graph according to the geographic position service code of each user;
the cleaning unit is used for cleaning the initial relation graph to obtain a target relation graph;
the computing unit is used for computing the geographic position service behavior similarity between every two users in the target relation graph;
the identification unit is used for acquiring a user to be predicted and identifying a user with known occupation in the target relation graph as a label user;
and the prediction unit is used for detecting a target user from the tag users according to the geographic position service behavior similarity between every two users based on an improved label propagation algorithm, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted.
An electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the location information based data analysis method.
A computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executable by a processor in an electronic device to implement the method for location information based data analysis.
According to the above technical scheme, the invention can start a preset acquisition device, use it to acquire the geographic position service data of at least one user within a preset time range, extract the target geographic position service data of each user from the acquired data, and transcode that data to obtain the geographic position service code of each user. An initial relation graph is constructed from the geographic position service code of each user so as to associate different users based on graph theory, and the initial relation graph is cleaned to obtain a target relation graph, which further screens out the associated users. The geographic position service behavior similarity between every two users in the target relation graph is calculated, the user to be predicted is acquired, and the users with known occupation in the target relation graph are identified as tag users. A user similar to the user to be predicted is detected from the tag users as the target user, and the occupation of the target user is taken as the predicted occupation of the user to be predicted. Occupation is thus predicted automatically by combining geographic position service data with graph theory, which solves the problem of inefficient manual collection and improves prediction efficiency.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data analysis method based on location information according to the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the data analysis device based on position information according to the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a data analysis method based on location information according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a preferred embodiment of the data analysis method based on location information according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The data analysis method based on the location information is applied to one or more electronic devices, which are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and whose hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The network where the electronic device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
S10, starting a preset acquisition device, and acquiring the geographic position service data of at least one user within a preset time range by using the preset acquisition device.
In this embodiment, the preset acquisition device includes, but is not limited to, a user's smart terminal and a monitoring device in a designated area.
In this embodiment, the at least one user may include users with the same or different occupations; the occupation of some of these users is known and the occupation of the others is unknown.
In this embodiment, the preset time range may be custom-configured, for example: the working hours (9:00-17:00) of all working days within a 3-month period.
In this embodiment, the geographic position service data refers to data produced when the current location of a positioning device is obtained through various types of positioning technologies and information resources and basic services are provided to that device through the mobile internet.
S11, extracting the target geographical location service data of each user from the collected geographical location service data.
In at least one embodiment of the present invention, the extracting the target geo-location service data of each user from the collected geo-location service data includes:
screening the geographical location service data of each user from the collected geographical location service data;
calculating the frequency of each geographic position service data in the geographic position service data of the corresponding user;
extracting the geographical location service data with the frequency greater than or equal to the preset frequency from the geographical location service data of each user;
and determining the extracted geographic position service data as the target geographic position service data of each user.
The preset frequency may be custom-configured, and the invention is not limited in this respect.
It should be noted that each user may have one or more items of target geographic position service data.
Through the embodiment, the high-frequency geographic position service data corresponding to each user is extracted from the collected geographic position service data and serves as the target geographic position service data of each user, so that the high-frequency information is used for carrying out professional prediction in the following process, and the interference of redundant data on the prediction result is reduced.
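The extraction in S11 can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes that "frequency" means a raw occurrence count per user, and the function name `extract_target_locations`, the parameter `min_count` and the sample records are hypothetical.

```python
from collections import Counter


def extract_target_locations(records, min_count):
    """Keep, per user, the locations seen at least `min_count` times.

    records: iterable of (user_id, location) pairs (hypothetical input shape).
    Assumption: "frequency" is interpreted as an occurrence count.
    """
    per_user = {}
    for user_id, location in records:
        # Group the collected data by user and count each location.
        per_user.setdefault(user_id, Counter())[location] += 1
    return {
        user_id: [loc for loc, n in counts.items() if n >= min_count]
        for user_id, counts in per_user.items()
    }


records = [
    ("u1", (39.9232, 116.3907)),
    ("u1", (39.9232, 116.3907)),
    ("u1", (39.9000, 116.4000)),
    ("u2", (31.2304, 121.4737)),
    ("u2", (31.2304, 121.4737)),
]
targets = extract_target_locations(records, min_count=2)
print(targets)
```

Locations that a user visits only once are dropped here, which matches the stated goal of keeping high-frequency data and reducing interference from redundant data.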
S12, transcoding the target geographical location service data of each user to obtain the geographical location service code of each user.
In at least one embodiment of the present invention, the transcoding the target geo-location service data of each user to obtain the geo-location service code of each user includes:
configuring the splitting times, the angle range corresponding to each splitting and the value corresponding to each angle range;
acquiring longitude and latitude of target geographic position service data of each user;
performing the segmentation of the segmentation times on the target geographic position service data of each user based on a Geohash algorithm and the longitude and latitude of the target geographic position service data of each user;
after each segmentation, calculating transcoding of longitude of target geographic position service data of each user according to an angle range corresponding to the current segmentation and a value corresponding to the current angle range, and calculating transcoding of latitude of the target geographic position service data of each user;
inserting the transcoding of the longitude of the target geographic position service data of each user into the even-numbered bits, and inserting the transcoding of the latitude into the odd-numbered bits, to obtain the binary code of the geographic position service data of each user;
and converting every 5 bits of the binary code of the geographic position service data of each user into one decimal value and mapping each value to a Base32 character, to obtain the geographic position service code of each user.
For example, for the geographic position service data (39.923201, 116.390705), given as (latitude, longitude), at the first segmentation the latitude range [-90°, 0°) is represented by binary 0 and (0°, 90°] by binary 1, and the longitude range [-180°, 0°) is represented by binary 0 and (0°, 180°] by binary 1, and so on for the later segmentations. The transcoding of the longitude 116.390705 is 11010010110001000100, and the transcoding of the latitude 39.923201 is 10111000110001111001. Merging the two, with the longitude bits in the even positions and the latitude bits in the odd positions (counting from 0), gives the binary code 1110011101001000111100000011010101100001. Under Base32 coding, every 5 binary bits are converted into one decimal value, which is mapped to one character, giving the geographic position service code wx4g0ec1.
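The transcoding steps can be sketched in Python as a standard Geohash encoder. This is an illustrative sketch rather than the patent's code: it assumes the conventional Geohash Base32 alphabet and repeated halving of the angle range, and the names `bisect_bits` and `geohash_encode` are hypothetical.

```python
# Conventional Geohash Base32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def bisect_bits(value, low, high, n_bits):
    """Halve [low, high] n_bits times; emit 1 for the upper half, 0 for the lower."""
    bits = []
    for _ in range(n_bits):
        mid = (low + high) / 2
        if value >= mid:
            bits.append(1)
            low = mid
        else:
            bits.append(0)
            high = mid
    return bits


def geohash_encode(lat, lon, precision=8):
    """Encode (lat, lon) as a Geohash string of `precision` characters."""
    total_bits = precision * 5
    lon_bits = bisect_bits(lon, -180.0, 180.0, (total_bits + 1) // 2)
    lat_bits = bisect_bits(lat, -90.0, 90.0, total_bits // 2)
    # Longitude fills even positions and latitude odd positions, counting from 0.
    merged = [
        lon_bits[i // 2] if i % 2 == 0 else lat_bits[i // 2]
        for i in range(total_bits)
    ]
    chars = []
    for start in range(0, total_bits, 5):
        value = 0
        for bit in merged[start:start + 5]:
            value = value * 2 + bit  # 5 bits -> one value in 0..31
        chars.append(BASE32[value])
    return "".join(chars)


code = geohash_encode(39.923201, 116.390705)
print(code)
```

With 8 characters the code uses 20 longitude bits and 20 latitude bits, the same split as the worked example for (39.923201, 116.390705).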
S13, constructing an initial relationship graph according to the geographic position service code of each user.
In at least one embodiment of the present invention, the constructing the initial relationship graph according to the geo-location service code of each user includes:
acquiring the number of common character strings contained in the geographic position service codes of every two users;
and when the number of common character strings contained in the geographic position service codes of two users is detected to be greater than or equal to a configured number, connecting the two users to obtain the initial relationship graph.
It can be understood that the more common character strings two users' geographic position service codes share, the closer the connection between the two users and the higher their relevance. Users with higher relevance are connected, and the initial relationship graph is thereby constructed based on graph theory to associate different users.
The initial relationship graph comprises users with known professions and users with unknown professions, and therefore the user professions can be predicted conveniently according to the users with known professions.
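As a rough illustration of S13, the sketch below connects two users when they share enough codes. The patent does not define "common character strings" precisely, so this sketch assumes it means identical geographic position service codes held by both users; `build_initial_graph`, `min_common` and the sample codes are hypothetical.

```python
from itertools import combinations


def build_initial_graph(user_codes, min_common):
    """Return the set of undirected edges (a, b) between sufficiently related users.

    user_codes: {user_id: iterable of geohash codes} (hypothetical input shape).
    Assumption: a "common character string" is a code that both users have.
    """
    edges = set()
    for a, b in combinations(sorted(user_codes), 2):
        common = set(user_codes[a]) & set(user_codes[b])
        if len(common) >= min_common:
            edges.add((a, b))
    return edges


user_codes = {
    "u1": ["wx4g0ec1", "wx4g0ec2", "wx4g0s8q"],
    "u2": ["wx4g0ec1", "wx4g0ec2"],
    "u3": ["wtw3sjq6"],
}
edges = build_initial_graph(user_codes, min_common=2)
print(edges)
```

The pairwise comparison is quadratic in the number of users; for large populations an inverted index from code to users would be a more practical choice.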
S14, cleaning the initial relation graph to obtain a target relation graph.
In at least one embodiment of the present invention, the cleaning the initial relationship diagram to obtain the target relationship diagram includes:
acquiring the characters in the first preset leading positions of the geographic position service code of each user in the initial relationship graph to construct a sub-code of each user;
acquiring the total number of users in the initial relationship graph and the number of users to which each sub-code belongs;
calculating the coverage rate of each sub-code according to the number of users to which the sub-code belongs and the total number of users;
acquiring the sub-codes whose coverage rate is greater than or equal to the configured coverage rate as alternative sub-codes;
calculating the IDF (Inverse Document Frequency) value of each alternative sub-code according to the total number of users and the number of users to which the sub-code belongs;
sorting the alternative sub-codes by IDF value from high to low, and acquiring the alternative sub-codes ranked within the second preset leading positions as target sub-codes;
and retaining the users having a target sub-code in the initial relationship graph and deleting the other users to obtain the target relationship graph.
Here, IDF = log(total number of users / number of users to which the sub-code belongs).
The first preset leading positions and the second preset leading positions may be custom-configured, for example: the first may be configured as the first 6 characters of the code, and the second as the top 50 alternative sub-codes.
The configured coverage rate may also be custom-configured, for example: 5%.
In this embodiment, the geographic position service code of each user is first truncated uniformly, that is, the characters in the first preset leading positions of the code of each user in the initial relationship graph are taken to construct the sub-code of each user, which reduces the amount of calculation.
In this embodiment, filtering by coverage rate avoids meaningless calculation of IDF values. For example, a location such as a train station has heavy traffic and no regular pattern of occupations. Because the IDF reflects attributes shared by users at the same location, calculating the IDF for such a location would be meaningless.
Through the embodiment, the associated users can be further screened out.
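The cleaning step can be sketched as follows, using the IDF definition given above with the natural logarithm. The sketch assumes each user may hold several codes and therefore several sub-codes; `clean_graph` and its parameter names are hypothetical, and ties in IDF are broken alphabetically only to make the result deterministic.

```python
import math


def clean_graph(user_codes, prefix_len=6, min_coverage=0.05, top_k=50):
    """Select target sub-codes and the users that keep at least one of them."""
    # Truncate every code to its leading characters to form sub-codes.
    user_subcodes = {
        user: {code[:prefix_len] for code in codes}
        for user, codes in user_codes.items()
    }
    total_users = len(user_subcodes)
    counts = {}
    for subcodes in user_subcodes.values():
        for sub in subcodes:
            counts[sub] = counts.get(sub, 0) + 1
    # Coverage = users having the sub-code / total users.
    candidates = [s for s, n in counts.items() if n / total_users >= min_coverage]
    idf = {s: math.log(total_users / counts[s]) for s in candidates}
    # Highest IDF first; alphabetical tie-break is an assumption of this sketch.
    ranked = sorted(candidates, key=lambda s: (-idf[s], s))
    targets = ranked[:top_k]
    kept_users = {u for u, subs in user_subcodes.items() if subs & set(targets)}
    return targets, kept_users


user_codes = {
    "u1": ["wx4g0ec1", "wx4g0s8q"],
    "u2": ["wx4g0ec2", "wx4g0s8r"],
    "u3": ["wx4g0s8q"],
    "u4": ["wtw3sjq6"],
}
targets, kept_users = clean_graph(user_codes, prefix_len=6, min_coverage=0.5, top_k=1)
print(targets, kept_users)
```

In this toy data the sub-code `wtw3sj` fails the coverage filter, and of the two remaining candidates the rarer one has the higher IDF and is kept.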
S15, calculating the similarity of geographic position service behaviors between every two users in the target relationship graph.
In at least one embodiment of the present invention, the calculating the similarity of the geographic location service behavior between every two users in the target relationship graph includes:
acquiring a geographical position service code of each user in the target relation graph;
counting the occurrence frequency of the geographic position service code of each user;
sorting the occurrence frequency of the geographical location service codes of each user from high to low;
acquiring the geographic position service codes ranked within the third preset leading positions as the first codes of each user;
calculating the number of times the first codes of each user occur in the target sub-codes;
constructing a sequence for each user according to the number of times the first codes of the user occur in the target sub-codes;
and calculating the cosine distance between every two users according to the sequence of each user as the geographic position service behavior similarity between every two users.
The third preset leading positions may be custom-configured, for example, as the top 20.
For example, for each user, the number of times the user's 20 first codes for each month occur in each of the 50 target sub-codes is calculated, forming a sequence [1, 2, 0, …, 1] of length 50, in which the first element indicates 1 occurrence, the second element indicates 2 occurrences, the third element indicates no occurrence, and so on.
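The sequence construction and the pairwise comparison can be sketched as below. Two assumptions are made: a first code "occurs in" a target sub-code when it starts with that sub-code, and the similarity is computed as cosine similarity (the text says "cosine distance"). The names `behavior_sequence` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def behavior_sequence(codes, target_subcodes, top_n=20):
    """Count, per target sub-code, how many of the user's top-N codes fall under it."""
    top_codes = [code for code, _ in Counter(codes).most_common(top_n)]
    return [
        sum(1 for code in top_codes if code.startswith(sub))
        for sub in target_subcodes
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no hits is treated as dissimilar to everyone
    return dot / (norm_a * norm_b)


target_subcodes = ["wx4g0e", "wx4g0s", "wtw3sj"]
seq_u1 = behavior_sequence(
    ["wx4g0ec1", "wx4g0ec1", "wx4g0ec2", "wx4g0s8q"], target_subcodes
)
seq_u2 = behavior_sequence(["wx4g0ec1", "wx4g0s8q", "wx4g0s8r"], target_subcodes)
similarity = cosine_similarity(seq_u1, seq_u2)
print(seq_u1, seq_u2, similarity)
```

With three target sub-codes instead of 50, the two users get the sequences [2, 1, 0] and [1, 2, 0], whose cosine similarity is 0.8.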
S16, acquiring the user to be predicted, and identifying the user with known occupation in the target relation graph as a label user.
It should be noted that the user to be predicted is a user in the target relationship graph.
Further, the user to be predicted is a user whose occupation is unknown in the target relationship diagram, and the present embodiment aims to predict the occupation of the user to be predicted by the user whose occupation is known in the target relationship diagram.
S17, based on an improved label propagation algorithm, detecting a target user from the label users according to the similarity of geographic position service behaviors between every two users, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted.
In at least one embodiment of the invention, the improved label propagation algorithm predicts the label information of unlabeled nodes from the label information of labeled nodes, and establishes a complete graph model by using the relations between samples.
Specifically, the detecting a target user from the tag users according to the similarity of the geographic location service behavior between every two users based on the improved tag propagation algorithm includes:
calculating the weight of each edge in the target relational graph according to the geographic position service behavior similarity between every two users;
constructing a probability transfer matrix according to the weight of each edge;
calculating the propagation probability from each label user to the user to be predicted according to the probability transition matrix;
executing propagation according to the propagation probability and updating the probability distribution of each label user;
and when the probability distribution of each label user is converged, stopping propagation, identifying the highest probability from the probability distribution of each label user obtained after propagation, and determining the label user corresponding to the highest probability as the target user.
It should be noted that the conventional label propagation algorithm is computed from the Euclidean distance, which is only weakly related to occupation and has weak interpretability, so it cannot predict occupation accurately. This embodiment optimizes the conventional algorithm by replacing the Euclidean distance with the geographic position service behavior similarity between users, which improves the interpretability of the algorithm. Geographic position service data also carries strong work-related information without involving much of the user's private information. Processing this data in combination with graph theory and the label propagation algorithm improves calculation efficiency, because label propagation takes place within the constructed graph and is therefore faster, and improves prediction accuracy, because continuous information from the working time period is extracted.
Specifically, the weight of each edge in the target relationship graph is calculated from the geographic location service behavior similarity between every two users by the following formula:
[formula image 001]
where [symbol image 002] represents the weight of the edge between user i and user j, [symbol image 003] represents a configuration parameter, and [symbol image 004] represents the geographic location service behavior similarity between user i and user j.
In this embodiment, the configuration parameter [symbol image 003] may be custom configured, and the invention is not limited in this respect.
Specifically, the propagation probability is calculated as follows:
[formula image 005]
where [symbol image 006] represents the probability of propagation from user j to user i, [symbol image 007] represents the probability of propagation from user j to user k, k denotes a user connected to j, [symbol image 008] represents the tag users whose occupation is known in the target relationship graph, and [symbol image 009] represents the users whose occupation is unknown in the target relationship graph.
In the propagation process, the node formed by each user weights the values propagated by the surrounding nodes according to the propagation probability so as to update the probability distribution of the node.
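The propagation steps can be illustrated with a minimal Python sketch. It is not the patented formula: the weight and propagation formulas are only available as images, so the sketch assumes an exponential kernel on (1 - similarity) and row-normalized transition probabilities, and it follows the common clamped variant in which the user to be predicted receives the occupation with the highest propagated probability. The function name `propagate_labels` and its parameters are hypothetical.

```python
import math

def propagate_labels(similarity, labels, alpha=1.0, tol=1e-9, max_iter=1000):
    """Label propagation over a similarity graph (illustrative sketch).

    similarity: dict mapping (i, j) user pairs to a similarity in [0, 1]
    labels:     dict mapping tag users (known occupation) to their occupation
    Returns a dict mapping every user to {occupation: probability}.
    """
    users = sorted({u for pair in similarity for u in pair})
    classes = sorted(set(labels.values()))

    # ASSUMPTION: exponential kernel on (1 - similarity); the patent's own
    # weight formula is not available in the text.
    weight = {u: {} for u in users}
    for (i, j), s in similarity.items():
        w = math.exp(-((1.0 - s) ** 2) / alpha ** 2)
        weight[i][j] = w
        weight[j][i] = w

    # Row-normalise the edge weights into transition probabilities.
    trans = {}
    for u in users:
        total = sum(weight[u].values())
        trans[u] = {v: w / total for v, w in weight[u].items()} if total else {}

    def clamp(u):
        return {c: 1.0 if labels[u] == c else 0.0 for c in classes}

    # Tag users start one-hot, all other users start uniform.
    dist = {
        u: (clamp(u) if u in labels else {c: 1.0 / len(classes) for c in classes})
        for u in users
    }

    for _ in range(max_iter):
        new = {}
        for u in users:
            if u in labels:
                new[u] = clamp(u)  # known occupations stay fixed
            elif not trans[u]:
                new[u] = dist[u]   # isolated user: nothing to propagate
            else:
                # Weighted average of the neighbours' distributions.
                new[u] = {c: sum(p * dist[v][c] for v, p in trans[u].items())
                          for c in classes}
        delta = max(abs(new[u][c] - dist[u][c]) for u in users for c in classes)
        dist = new
        if delta < tol:  # converged
            break
    return dist

sims = {("a", "x"): 0.9, ("b", "x"): 0.2}
result = propagate_labels(sims, {"a": "engineer", "b": "teacher"})
print(max(result["x"], key=result["x"].get))  # engineer
```

Here user x is more similar to tag user a than to tag user b, so the occupation of a dominates the distribution of x.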
Through the embodiment, occupation prediction can be automatically carried out on the user by combining the geographic position service data and the graph theory, the problem of low efficiency of manual collection is solved, and the prediction efficiency is improved.
It should be noted that, in order to further ensure the security of the data, the target relationship graph may be deployed in the blockchain, so as to avoid malicious tampering of the data.
According to the above technical solution, the invention can start a preset acquisition device and use it to acquire the geographic location service data of at least one user within a preset time range; extract the target geographic location service data of each user from the acquired data; transcode the target geographic location service data of each user to obtain the geographic location service code of each user; construct an initial relationship graph from the geographic location service codes so as to associate different users based on graph theory; clean the initial relationship graph to obtain a target relationship graph, thereby further screening out the associated users; calculate the geographic location service behavior similarity between every two users in the target relationship graph; acquire the user to be predicted and identify the users with known occupations in the target relationship graph as tag users; detect, from the tag users, a user similar to the user to be predicted as the target user; and take the occupation of the target user as the predicted occupation of the user to be predicted. Occupation is thus predicted automatically by combining geographic location service data with graph theory, which solves the problem of low efficiency of manual collection and improves prediction efficiency.
Fig. 2 is a functional block diagram of a data analysis device based on location information according to a preferred embodiment of the present invention. The data analysis device 11 based on location information comprises an acquisition unit 110, an extraction unit 111, a transcoding unit 112, a construction unit 113, a cleaning unit 114, a calculation unit 115, an identification unit 116 and a prediction unit 117. A module/unit in the present invention refers to a series of computer program segments that are stored in the memory 12, can be executed by the processor 13, and perform a fixed function. In this embodiment, the functions of the modules/units are described in detail below.
The acquisition unit 110 starts a preset acquisition device, and acquires geographic location service data of at least one user within a preset time range by using the preset acquisition device.
In this embodiment, the preset acquisition device includes, but is not limited to, the user's smart terminal and a monitoring device in a designated area.
In this embodiment, the at least one user may include users with the same or different occupations; the occupations of some of the users are known and the occupations of the others are unknown.
In this embodiment, the preset time range may be custom configured, for example: the working period of 9:00-17:00 on all working days within 3 months.
In this embodiment, the geographic location service data refers to data produced by a service that obtains the current location of a positioning device using various positioning technologies and provides information resources and basic services to the positioning device over the mobile internet.
The extraction unit 111 extracts target geo-location service data of each user from the collected geo-location service data.
In at least one embodiment of the present invention, the extraction unit 111 extracting the target geographic location service data of each user from the collected geographic location service data includes:
screening the geographic location service data of each user from the collected geographic location service data;
calculating the frequency with which each item of geographic location service data occurs in the geographic location service data of the corresponding user;
extracting, from the geographic location service data of each user, the items whose frequency is greater than or equal to a preset frequency;
and determining the extracted geographic location service data as the target geographic location service data of each user.
The preset frequency may be custom configured, and the invention is not limited in this respect.
It should be noted that each user may have one or more items of target geographic location service data.
Through this embodiment, the high-frequency geographic location service data of each user is extracted from the collected data and used as that user's target geographic location service data, so that occupation prediction is subsequently performed on high-frequency information and the interference of redundant data with the prediction result is reduced.
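The extraction steps above can be sketched as follows. This is a minimal illustration for a single user, assuming that "frequency" means the raw number of occurrences of a location; the function name `extract_target_locations` and the sample coordinates are hypothetical.

```python
from collections import Counter

def extract_target_locations(points, min_frequency):
    """Keep the locations one user visited at least min_frequency times.

    points: list of (latitude, longitude) tuples recorded for one user
    """
    counts = Counter(points)  # occurrences of each distinct location
    return [loc for loc, n in counts.items() if n >= min_frequency]

samples = [(39.92, 116.39), (39.92, 116.39), (39.92, 116.39), (31.23, 121.47)]
print(extract_target_locations(samples, min_frequency=2))  # [(39.92, 116.39)]
```

A relative frequency (count divided by the number of records) could be substituted for the raw count without changing the structure of the filter.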
The transcoding unit 112 transcodes the target geo-location service data of each user to obtain the geo-location service code of each user.
In at least one embodiment of the present invention, the transcoding unit 112 transcoding the target geographic location service data of each user to obtain the geographic location service code of each user includes:
configuring the number of splits, the angle range corresponding to each split, and the value corresponding to each angle range;
acquiring the longitude and latitude of the target geographic location service data of each user;
splitting the target geographic location service data of each user the configured number of times, based on the Geohash algorithm and the longitude and latitude of the target geographic location service data of each user;
after each split, calculating the transcoding of the longitude and the transcoding of the latitude of the target geographic location service data of each user according to the angle range of the current split and the value corresponding to that angle range;
inserting the transcoding of the longitude into the even bits and the transcoding of the latitude into the odd bits to obtain the binary code of the geographic location service data of each user;
and converting the binary code of the geographic location service data of each user into decimal to obtain the geographic location service code of each user.
For example, for the geographic location service data (39.923201, 116.390705), that is, latitude 39.923201 and longitude 116.390705, the first split represents the latitude range [-90°, 0°) by binary 0 and (0°, 90°] by binary 1, and the longitude range [-180°, 0°) by binary 0 and (0°, 180°] by binary 1, and so on for the subsequent splits. The transcoding of the longitude is calculated as 11010010110001000100 and the transcoding of the latitude as 10111000110001111001. The two transcodings are then merged, with the longitude bits in the even positions and the latitude bits in the odd positions (counting from 0), to obtain the binary code of the geographic location service data (39.923201, 116.390705): 1110011101001000111100000011010101100001. Finally, the binary code is encoded in Base32: every 5 binary bits are converted to one decimal value, which is mapped to one Base32 character, giving the geographic location service code wx4g0ec1.
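The worked example can be reproduced with a short Python sketch of the standard Geohash encoding. The helper names `bisect_bits` and `geohash_encode` are hypothetical; 20 splits per coordinate are assumed so that the result has 8 characters, as in the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def bisect_bits(value, low, high, n_bits):
    """Halve the interval n_bits times; 1 = upper half, 0 = lower half."""
    bits = []
    for _ in range(n_bits):
        mid = (low + high) / 2
        if value >= mid:
            bits.append(1)
            low = mid
        else:
            bits.append(0)
            high = mid
    return bits

def geohash_encode(lat, lon, n_bits=20):
    lon_bits = bisect_bits(lon, -180.0, 180.0, n_bits)
    lat_bits = bisect_bits(lat, -90.0, 90.0, n_bits)
    # Interleave: longitude on even positions (0, 2, ...), latitude on odd ones.
    merged = [b for pair in zip(lon_bits, lat_bits) for b in pair]
    # Every 5 bits form one value in 0..31, mapped to one Base32 character.
    chars = []
    for i in range(0, len(merged), 5):
        value = int("".join(map(str, merged[i:i + 5])), 2)
        chars.append(BASE32[value])
    return "".join(chars)

print(geohash_encode(39.923201, 116.390705))  # wx4g0ec1
```

Because nearby points share a long common prefix, truncating or comparing these strings gives a cheap proxy for spatial proximity.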
The construction unit 113 constructs an initial relationship graph according to the geo-location service code of each user.
In at least one embodiment of the present invention, the construction unit 113 constructing the initial relationship graph according to the geographic location service code of each user includes:
acquiring the number of common character strings contained in the geographic location service codes of every two users;
and, when the number of common character strings contained in the geographic location service codes of two users is detected to be greater than or equal to a configured number, connecting the two users to obtain the initial relationship graph.
It can be understood that the more common character strings two users' geographic location service codes share, the closer the connection between the two users and the higher their relevance. Users with higher relevance are connected, and the initial relationship graph is thereby constructed based on graph theory to associate different users.
The initial relationship graph contains both users with known occupations and users with unknown occupations, so that the occupations of the latter can subsequently be predicted from the former.
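A minimal sketch of the graph construction follows. It assumes that "common character strings" means location codes that appear for both users, which is one possible reading of the text; the function name `build_initial_graph`, the user ids and the sample codes are hypothetical.

```python
from itertools import combinations

def build_initial_graph(user_codes, min_common):
    """Connect two users when they share at least min_common location codes.

    user_codes: dict mapping user id -> iterable of geohash strings
    Returns an adjacency dict: user id -> set of connected user ids.
    """
    code_sets = {u: set(codes) for u, codes in user_codes.items()}
    graph = {u: set() for u in code_sets}
    for u, v in combinations(sorted(code_sets), 2):
        # ASSUMPTION: a "common character string" is a code both users have.
        if len(code_sets[u] & code_sets[v]) >= min_common:
            graph[u].add(v)
            graph[v].add(u)
    return graph

codes = {
    "u1": ["wx4g0e", "wx4g0s"],
    "u2": ["wx4g0e", "wx4g0s", "wtw3sj"],
    "u3": ["wtw3sj"],
}
graph = build_initial_graph(codes, min_common=2)
print(graph["u1"])  # {'u2'}
```

The pairwise comparison is quadratic in the number of users; an inverted index from code to users would avoid comparing pairs that share nothing.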
The cleaning unit 114 cleans the initial relationship graph to obtain a target relationship graph.
In at least one embodiment of the present invention, the cleaning unit 114 cleaning the initial relationship graph to obtain the target relationship graph includes:
acquiring the characters in the first preset leading positions of the geographic location service code of each user in the initial relationship graph to construct the sub-code of each user;
acquiring the total number of users in the initial relationship graph and the number of users to which each sub-code belongs;
calculating the coverage rate of each sub-code from the number of users to which the sub-code belongs and the total number of users;
taking the sub-codes whose coverage rate is greater than or equal to a configured coverage rate as candidate sub-codes;
calculating the IDF (Inverse Document Frequency) value of each candidate sub-code from the total number of users and the number of users to which the sub-code belongs;
sorting the candidate sub-codes by IDF value from high to low, and taking the candidate sub-codes ranked within the second preset leading positions as target sub-codes;
and retaining the users who have a target sub-code in the initial relationship graph and deleting the other users to obtain the target relationship graph.
Here, IDF = log(total number of users / number of users to which the sub-code belongs).
The first and second preset leading positions may be custom configured; for example, the first may be configured as the first 6 characters and the second as the top 50.
The configured coverage rate may also be custom configured, for example 5%.
In this embodiment, the geographic location service code of each user is first truncated uniformly, that is, the characters in the first preset leading positions of the geographic location service code of each user in the initial relationship graph are taken to construct the sub-code of each user, which reduces the amount of calculation.
In this embodiment, filtering by coverage rate avoids meaningless IDF calculations. For example, at locations such as train stations, traffic is high and the occupations of the people present follow no regular pattern; since the IDF reflects attributes shared by users at the same location, calculating the IDF for such locations without considering the coverage rate would be meaningless.
Through the embodiment, the associated users can be further screened out.
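The cleaning steps can be sketched in Python as below. The sketch assumes that the coverage rate is the share of users having a sub-code, which the text implies but does not state as a formula; the function name `select_target_subcodes`, its defaults and the sample data are hypothetical.

```python
import math

def select_target_subcodes(user_codes, prefix_len=6, min_coverage=0.05, top_n=50):
    """Return (target_subcodes, kept_users) following the cleaning steps."""
    total = len(user_codes)
    # Sub-codes per user: the leading prefix_len characters of each code.
    user_subcodes = {u: {c[:prefix_len] for c in codes}
                     for u, codes in user_codes.items()}

    # Number of distinct users behind each sub-code.
    n_users = {}
    for subs in user_subcodes.values():
        for s in subs:
            n_users[s] = n_users.get(s, 0) + 1

    # ASSUMPTION: coverage = users with the sub-code / total users.
    candidates = [s for s, n in n_users.items() if n / total >= min_coverage]
    # IDF = log(total users / users to which the sub-code belongs).
    idf = {s: math.log(total / n_users[s]) for s in candidates}

    # Highest IDF first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda s: (-idf[s], s))
    targets = set(ranked[:top_n])
    kept = {u for u, subs in user_subcodes.items() if subs & targets}
    return targets, kept

codes = {
    "u1": ["wx4g0ec1"],
    "u2": ["wx4g0ec9"],
    "u3": ["wx4g0ec1", "wtw3sjq6"],
    "u4": ["wtw3sjq7"],
    "u5": ["ws0e9c00"],
}
targets, kept = select_target_subcodes(codes, min_coverage=0.3, top_n=1)
print(targets, sorted(kept))  # {'wtw3sj'} ['u3', 'u4']
```

In this toy data the sub-code ws0e9c is dropped by the coverage filter, and of the two remaining candidates the rarer one (higher IDF) is selected.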
The calculation unit 115 calculates the geographic location service behavior similarity between every two users in the target relationship graph.
In at least one embodiment of the present invention, the calculation unit 115 calculating the geographic location service behavior similarity between every two users in the target relationship graph includes:
acquiring the geographic location service codes of each user in the target relationship graph;
counting the occurrence frequency of each geographic location service code of each user;
sorting the geographic location service codes of each user by occurrence frequency from high to low;
taking the geographic location service codes ranked within the third preset leading positions as the first codes of each user;
calculating the number of times the first codes of each user appear in each target sub-code;
constructing a sequence for each user from the number of times the user's first codes appear in the target sub-codes;
and calculating the cosine distance between every two users from their sequences as the geographic location service behavior similarity between the two users.
The third preset leading positions may be custom configured, for example the top 20.
For example, the number of times a user's 20 first codes for each month appear in the 50 target sub-codes is counted to form a Sequence [1, 2, 0, …, 1] of length 50, in which the first element indicates 1 occurrence, the second element indicates 2 occurrences, the third element indicates no occurrence, and so on.
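The sequence construction and the similarity calculation can be sketched as follows. The sketch assumes that a first code "appears in" a target sub-code when it starts with that sub-code, and it computes the cosine of the angle between the two sequences; the function names and the three sample sub-codes are hypothetical.

```python
import math

def behaviour_sequence(first_codes, target_subcodes):
    """Count, per target sub-code, how many of the user's codes start with it."""
    # ASSUMPTION: "appears in" is interpreted as a prefix match.
    return [sum(code.startswith(sub) for code in first_codes)
            for sub in target_subcodes]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

targets = ["wx4g0e", "wtw3sj", "ws0e9c"]
seq_a = behaviour_sequence(["wx4g0ec1", "wx4g0ec9", "wtw3sjq6"], targets)
seq_b = behaviour_sequence(["wx4g0ec1", "ws0e9c00"], targets)
print(seq_a, seq_b)  # [2, 1, 0] [1, 0, 1]
print(round(cosine_similarity(seq_a, seq_b), 4))  # 0.6325
```

With the values suggested in the text, each sequence would have length 50 and be built from a user's top 20 codes.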
The identification unit 116 acquires the user to be predicted, and identifies the users with known occupations in the target relationship graph as tag users.
It should be noted that the user to be predicted is a user in the target relationship graph.
Further, the user to be predicted is a user whose occupation is unknown in the target relationship graph; this embodiment aims to predict the occupation of the user to be predicted from the users whose occupations are known in the target relationship graph.
The prediction unit 117 detects a target user from the tag users according to the geographic location service behavior similarity between every two users based on an improved label propagation algorithm, and takes the occupation of the target user as the predicted occupation of the user to be predicted.
In at least one embodiment of the invention, the improved label propagation algorithm predicts the label information of unlabeled nodes from the label information of labeled nodes, and builds a complete graph model from the relations between samples.
Specifically, the prediction unit 117 detecting a target user from the tag users according to the geographic location service behavior similarity between every two users, based on the improved label propagation algorithm, includes:
calculating the weight of each edge in the target relationship graph according to the geographic location service behavior similarity between every two users;
constructing a probability transition matrix according to the weight of each edge;
calculating the propagation probability from each tag user to the user to be predicted according to the probability transition matrix;
executing propagation according to the propagation probability and updating the probability distribution of each tag user;
and, when the probability distribution of each tag user has converged, stopping propagation, identifying the highest probability in the probability distributions of the tag users obtained after propagation, and determining the tag user corresponding to the highest probability as the target user.
It should be noted that the conventional label propagation algorithm computes the Euclidean distance, which is only weakly related to occupation and weakly interpretable, so it cannot predict occupation accurately. This embodiment optimizes the conventional label propagation algorithm by replacing the Euclidean distance with the geographic location service behavior similarity between users, which improves the interpretability of the algorithm. Meanwhile, the geographic location service data carries strong work-related information without involving much of the user's private information. Processing the geographic location service data in combination with graph theory and the label propagation algorithm improves calculation efficiency, because label propagation takes place within the constructed graph and is therefore faster; and because continuous information over the working time period is extracted, the prediction accuracy is higher.
Specifically, the weight of each edge in the target relationship graph is calculated from the geographic location service behavior similarity between every two users by the following formula:
[formula image 001]
where [symbol image 002] represents the weight of the edge between user i and user j, [symbol image 003] represents a configuration parameter, and [symbol image 004] represents the geographic location service behavior similarity between user i and user j.
In this embodiment, the configuration parameter [symbol image 003] may be custom configured, and the invention is not limited in this respect.
Specifically, the propagation probability is calculated as follows:
[formula image 005]
where [symbol image 006] represents the probability of propagation from user j to user i, [symbol image 007] represents the probability of propagation from user j to user k, k denotes a user connected to j, [symbol image 008] represents the tag users whose occupation is known in the target relationship graph, and [symbol image 009] represents the users whose occupation is unknown in the target relationship graph.
In the propagation process, the node formed by each user weights the values propagated by the surrounding nodes according to the propagation probability so as to update the probability distribution of the node.
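For orientation, one common textbook form of the conventional label propagation algorithm that this embodiment modifies is given below. It is not a transcription of the formula images of this document: the symbols d_ij (Euclidean distance between nodes i and j), sigma (kernel width), w_ij (edge weight), P_ij (transition probability) and Y (matrix whose rows are the per-user occupation distributions) are notation assumed here for illustration.

```latex
w_{ij} = \exp\!\left(-\frac{d_{ij}^{2}}{\sigma^{2}}\right), \qquad
P_{ij} = \frac{w_{ij}}{\sum_{k} w_{ik}}, \qquad
Y^{(t+1)} = P\,Y^{(t)}
```

In this textbook form, the rows of Y that belong to labeled nodes are reset to their known labels after every step, and the iteration stops once Y no longer changes. The embodiment replaces the Euclidean distance in the weight with the geographic location service behavior similarity between users.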
Through the embodiment, occupation prediction can be automatically carried out on the user by combining the geographic position service data and the graph theory, the problem of low efficiency of manual collection is solved, and the prediction efficiency is improved.
It should be noted that, in order to further ensure the security of the data, the target relationship graph may be deployed in the blockchain, so as to avoid malicious tampering of the data.
According to the above technical solution, the invention can start a preset acquisition device and use it to acquire the geographic location service data of at least one user within a preset time range; extract the target geographic location service data of each user from the acquired data; transcode the target geographic location service data of each user to obtain the geographic location service code of each user; construct an initial relationship graph from the geographic location service codes so as to associate different users based on graph theory; clean the initial relationship graph to obtain a target relationship graph, thereby further screening out the associated users; calculate the geographic location service behavior similarity between every two users in the target relationship graph; acquire the user to be predicted and identify the users with known occupations in the target relationship graph as tag users; detect, from the tag users, a user similar to the user to be predicted as the target user; and take the occupation of the target user as the predicted occupation of the user to be predicted. Occupation is thus predicted automatically by combining geographic location service data with graph theory, which solves the problem of low efficiency of manual collection and improves prediction efficiency.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention, which implements a data analysis method based on location information.
The electronic device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as a data analysis program based on location information, stored in the memory 12 and executable on the processor 13.
It will be understood by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not limit it. The electronic device 1 may have a bus-type structure or a star-type structure, and may include more or fewer hardware or software components than shown, or a different arrangement of components; for example, the electronic device 1 may further include an input/output device, a network access device, and the like.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that are applicable to the present invention also fall within the scope of protection of the present invention and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, which includes flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. The memory 12 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data analysis program based on location information, etc., but also to temporarily store data that has been output or is to be output.
The processor 13 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control unit of the electronic device 1: it connects the components of the electronic device 1 through various interfaces and lines, and executes the functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 12 (for example, a data analysis program based on location information) and calling the data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps in each of the above embodiments of the location information based data analysis method, such as the steps shown in fig. 1.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device 1. For example, the computer program may be partitioned into an acquisition unit 110, an extraction unit 111, a transcoding unit 112, a construction unit 113, a cleaning unit 114, a calculation unit 115, an identification unit 116, a prediction unit 117.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the data analysis method based on location information according to the embodiments of the present invention.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a random-access memory, or the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.
Although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply is logically connected to the at least one processor 13 through a power management device, so that charge management, discharge management, power consumption management, and similar functions are implemented through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
Fig. 3 only shows the electronic device 1 with components 12-13, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
With reference to fig. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a method for data analysis based on location information, and the processor 13 can execute the plurality of instructions to implement:
starting a preset acquisition device, and acquiring geographic position service data of at least one user within a preset time range by using the preset acquisition device;
extracting target geographical location service data of each user from the collected geographical location service data;
transcoding the target geographical location service data of each user to obtain a geographical location service code of each user;
constructing an initial relation graph according to the geographical position service code of each user;
cleaning the initial relation graph to obtain a target relation graph;
calculating the geographic position service behavior similarity between every two users in the target relation graph;
acquiring a user to be predicted, and identifying a user with known occupation in the target relation graph as a tag user;
and detecting a target user from the tag users according to the similarity of geographic position service behaviors between every two users based on an improved tag propagation algorithm, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted.
Specifically, for the way in which the processor 13 implements these instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be adopted in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (8)

1. A method for data analysis based on location information, the method comprising:
starting a preset acquisition device, and acquiring geographic position service data of at least one user within a preset time range by using the preset acquisition device;
extracting target geographical location service data of each user from the collected geographical location service data;
transcoding the target geographical location service data of each user to obtain a geographical location service code of each user;
constructing an initial relation graph according to the geographical position service code of each user;
cleaning the initial relationship graph to obtain a target relationship graph, comprising: acquiring the first preset number of leading characters of the geographic position service code of each user in the initial relationship graph to construct a sub-code of each user; acquiring the total number of users in the initial relationship graph and the number of users to which each sub-code belongs; acquiring the coverage rate of each sub-code according to the number of users to which each sub-code belongs and the total number of users; taking the sub-codes whose coverage rate is greater than or equal to a configured coverage rate as candidate sub-codes; obtaining the IDF value of each candidate sub-code according to the total number of users and the number of users to which each sub-code belongs; sorting the candidate sub-codes in descending order of IDF value, and taking the candidate sub-codes ranked within the top second preset number of positions as target sub-codes; retaining the users having a target sub-code in the initial relationship graph, and deleting the other users to obtain the target relationship graph;
calculating the geographic position service behavior similarity between every two users in the target relationship graph, comprising: acquiring the geographic position service codes of each user in the target relationship graph; counting the occurrence frequency of the geographic position service codes of each user; sorting the geographic position service codes of each user in descending order of occurrence frequency; taking the geographic position service codes ranked within the top third preset number of positions as the first codes of each user; calculating the number of times the first codes of each user occur in the target sub-codes; constructing a sequence for each user according to the number of times the first codes of that user occur in the target sub-codes; and calculating the cosine distance between every two users according to the sequence of each user as the geographic position service behavior similarity between every two users;
acquiring a user to be predicted, and identifying a user with known occupation in the target relation graph as a tag user;
and detecting a target user from the tag users according to the similarity of geographic position service behaviors between every two users based on an improved tag propagation algorithm, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted, wherein the improved tag propagation algorithm is used for predicting unmarked node tag information according to the marked node tag information and establishing a complete graph model by utilizing the relation between samples.
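The cleaning step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes one code string per user, and it assumes the IDF value is log(total users / users sharing the sub-code), which the claim does not spell out. The names `clean_users`, `prefix_len`, `min_coverage` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def clean_users(user_codes, prefix_len, min_coverage, top_k):
    """Return (target sub-codes, users kept in the target graph).

    user_codes: dict mapping a user id to one code string (assumption).
    prefix_len: the "first preset number" of leading characters.
    min_coverage: the configured coverage rate.
    top_k: the "second preset number" of sub-codes to keep.
    """
    # Sub-code of each user = leading characters of that user's code.
    sub_codes = {user: code[:prefix_len] for user, code in user_codes.items()}
    total = len(sub_codes)
    counts = Counter(sub_codes.values())

    # Coverage rate = users sharing the sub-code / total users.
    candidates = [s for s, n in counts.items() if n / total >= min_coverage]

    # Assumed IDF form: log(total / users sharing the sub-code).
    idf = {s: math.log(total / counts[s]) for s in candidates}

    # Highest IDF first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda s: (-idf[s], s))
    targets = set(ranked[:top_k])

    # Keep only users whose sub-code is a target sub-code.
    kept = {user for user, s in sub_codes.items() if s in targets}
    return targets, kept
```

The coverage filter discards sub-codes shared by too few users, and the IDF ranking then prefers the rarer of the remaining sub-codes, so very common prefixes do not dominate the target graph.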
2. The method of claim 1, wherein the extracting the target geo-location service data for each user from the collected geo-location service data comprises:
screening the geographical location service data of each user from the collected geographical location service data;
calculating the frequency of each geographic position service data in the geographic position service data of the corresponding user;
extracting the geographical location service data with the frequency greater than or equal to the preset frequency from the geographical location service data of each user;
and determining the extracted geographic position service data as the target geographic position service data of each user.
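The frequency-based extraction of claim 2 can be sketched as follows. This is an illustration under stated assumptions: the input is taken to be a list of (user, place) records, and "frequency" is read as a relative frequency within one user's records. The claim could equally mean an absolute count. The names `extract_target_data` and `min_frequency` are hypothetical.

```python
from collections import Counter, defaultdict


def extract_target_data(records, min_frequency):
    """Keep, per user, the places whose relative frequency is high enough.

    records: iterable of (user_id, place) pairs (assumed input shape).
    min_frequency: the preset frequency, read here as a fraction in [0, 1].
    """
    # Screen the records of each user out of the collected data.
    per_user = defaultdict(list)
    for user, place in records:
        per_user[user].append(place)

    targets = {}
    for user, places in per_user.items():
        counts = Counter(places)
        n = len(places)
        # Frequency of a place = its occurrences / all records of this user.
        targets[user] = sorted(
            place for place, c in counts.items() if c / n >= min_frequency
        )
    return targets
```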
3. The method of claim 1, wherein the constructing an initial relationship graph from the geo-location service codes of each user comprises:
acquiring the number of common character strings contained in the geographic position service codes of every two users;
and when detecting that the number of common character strings contained in the geographic position service codes of two users is greater than or equal to a configured number, connecting the two detected users to obtain the initial relationship graph.
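A minimal sketch of the graph construction in claim 3 is given below. It assumes each user holds a collection of code strings and interprets "common (shared) character strings" as codes that appear for both users. Other readings, such as common substrings, are possible. `build_initial_graph` and `min_common` are hypothetical names.

```python
from itertools import combinations


def build_initial_graph(user_codes, min_common):
    """Return the set of undirected edges (u, v) with u < v.

    user_codes: dict mapping a user id to an iterable of code strings.
    min_common: the configured number of shared codes needed for an edge.
    """
    code_sets = {user: set(codes) for user, codes in user_codes.items()}
    edges = set()
    # Compare every pair of users exactly once.
    for u, v in combinations(sorted(code_sets), 2):
        common = len(code_sets[u] & code_sets[v])
        if common >= min_common:
            edges.add((u, v))
    return edges
```

The pairwise comparison is quadratic in the number of users, which is acceptable for a sketch but would likely need an inverted index from code to users at scale.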
4. The method of claim 1, wherein the detecting a target user from the tag users according to the geographic position service behavior similarity between every two users based on an improved tag propagation algorithm comprises:
calculating the weight of each edge in the target relational graph according to the geographic position service behavior similarity between every two users;
constructing a probability transfer matrix according to the weight of each edge;
calculating the propagation probability from each label user to the user to be predicted according to the probability transition matrix;
executing propagation according to the propagation probability and updating the probability distribution of each label user;
and when the probability distribution of each label user is converged, stopping propagation, identifying the highest probability from the probability distribution of each label user obtained after propagation, and determining the label user corresponding to the highest probability as the target user.
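The propagation steps of claim 4 resemble standard label (tag) propagation with clamped labeled nodes, which can be sketched as below. This is the textbook algorithm, not the "improved" variant of the patent, whose modifications are not detailed in the claim. Edge weights are taken as given inputs, each tag user is treated as its own class, and the transition matrix is row-normalized. All of these are assumptions, as are the names `propagate_labels`, `tol` and `max_iter`.

```python
def propagate_labels(weights, tag_users, query, tol=1e-9, max_iter=1000):
    """Return (best tag user for `query`, probability distribution of `query`).

    weights: dict mapping an undirected edge (u, v) to a non-negative weight.
    tag_users: list of users whose occupation is known.
    query: the user to be predicted.
    """
    nodes = sorted({n for edge in weights for n in edge} | set(tag_users) | {query})

    # Symmetric neighbour weights.
    nbrs = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        nbrs[u][v] = w
        nbrs[v][u] = w

    # Row-normalised probability transition matrix, stored sparsely.
    trans = {}
    for n in nodes:
        total = sum(nbrs[n].values())
        trans[n] = {m: w / total for m, w in nbrs[n].items()} if total > 0 else {}

    # Each tag user starts with all probability mass on itself.
    dist = {n: {t: 0.0 for t in tag_users} for n in nodes}
    for t in tag_users:
        dist[t][t] = 1.0

    for _ in range(max_iter):
        new = {}
        for n in nodes:
            if n in tag_users:
                new[n] = dist[n]  # clamp labeled nodes
            else:
                new[n] = {
                    t: sum(p * dist[m][t] for m, p in trans[n].items())
                    for t in tag_users
                }
        delta = max(abs(new[n][t] - dist[n][t]) for n in nodes for t in tag_users)
        dist = new
        if delta < tol:  # converged, stop propagating
            break

    # An isolated query keeps an all-zero distribution; max() then
    # returns the first tag user.
    best = max(tag_users, key=lambda t: dist[query][t])
    return best, dist[query]
```

The occupation of the returned tag user would then serve as the predicted occupation of the query user.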
5. The method of claim 4, wherein the weight of each edge in the target relationship graph is calculated according to the similarity of geographic location service behavior between each two users by using the following formula:
Figure FDA0003092835100000031
wherein w_ij represents the weight of the edge between user i and user j, the symbol shown in Figure FDA0003092835100000032 represents the configuration parameter, and S_ij represents the geographic position service behavior similarity between user i and user j.
6. A data analysis apparatus based on location information, the data analysis apparatus based on location information comprising:
the collecting unit is used for starting a preset acquisition device and acquiring geographic position service data of at least one user within a preset time range by using the preset acquisition device;
the extraction unit is used for extracting target geographic position service data of each user from the collected geographic position service data;
the transcoding unit is used for transcoding the target geographical location service data of each user to obtain a geographical location service code of each user;
the building unit is used for building an initial relation graph according to the geographic position service code of each user;
the cleaning unit is used for cleaning the initial relationship graph to obtain a target relationship graph, comprising: acquiring the first preset number of leading characters of the geographic position service code of each user in the initial relationship graph to construct a sub-code of each user; acquiring the total number of users in the initial relationship graph and the number of users to which each sub-code belongs; acquiring the coverage rate of each sub-code according to the number of users to which each sub-code belongs and the total number of users; taking the sub-codes whose coverage rate is greater than or equal to a configured coverage rate as candidate sub-codes; obtaining the IDF value of each candidate sub-code according to the total number of users and the number of users to which each sub-code belongs; sorting the candidate sub-codes in descending order of IDF value, and taking the candidate sub-codes ranked within the top second preset number of positions as target sub-codes; retaining the users having a target sub-code in the initial relationship graph, and deleting the other users to obtain the target relationship graph;
the calculating unit is used for calculating the geographic position service behavior similarity between every two users in the target relationship graph, comprising: acquiring the geographic position service codes of each user in the target relationship graph; counting the occurrence frequency of the geographic position service codes of each user; sorting the geographic position service codes of each user in descending order of occurrence frequency; taking the geographic position service codes ranked within the top third preset number of positions as the first codes of each user; calculating the number of times the first codes of each user occur in the target sub-codes; constructing a sequence for each user according to the number of times the first codes of that user occur in the target sub-codes; and calculating the cosine distance between every two users according to the sequence of each user as the geographic position service behavior similarity between every two users;
the identification unit is used for acquiring a user to be predicted and identifying a user with known occupation in the target relation graph as a label user;
and the prediction unit is used for detecting a target user from the tag users according to the geographic position service behavior similarity between every two users based on an improved tag propagation algorithm, and acquiring the occupation of the target user as the predicted occupation of the user to be predicted, wherein the improved tag propagation algorithm is used for predicting unmarked node tag information according to marked node tag information and establishing a complete graph model by using the relation between samples.
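The similarity computation recited for the calculating unit can be sketched in Python as follows. The sketch assumes each user has a list of observed code strings and that the entry of a user's sequence for a target sub-code is the number of that user's top codes beginning with that sub-code. The claim wording admits other readings. It also computes cosine similarity (higher means more alike), which is how the claim appears to use the term "cosine distance". `user_sequence`, `cosine_similarity` and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def user_sequence(codes, target_sub_codes, top_k):
    """Build one user's sequence over an ordered list of target sub-codes.

    codes: list of code strings observed for the user (repeats allowed).
    top_k: the "third preset number" of most frequent codes to keep.
    """
    # First codes = the user's top_k most frequent codes.
    first_codes = [code for code, _ in Counter(codes).most_common(top_k)]
    # One entry per target sub-code: how many first codes start with it.
    return [
        sum(1 for code in first_codes if code.startswith(sub))
        for sub in target_sub_codes
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap with any target sub-code
    return dot / (norm_a * norm_b)
```

For example, two users whose frequent codes fall mostly under the same sub-codes obtain sequences pointing in similar directions and hence a similarity close to 1.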
7. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the method of location information based data analysis of any of claims 1 to 5.
8. A computer-readable storage medium characterized by: the computer-readable storage medium has stored therein at least one instruction that is executed by a processor in an electronic device to implement the method for location information based data analysis of any of claims 1 to 5.
CN202110416584.0A 2021-04-19 2021-04-19 Data analysis method, device, equipment and medium based on position information Active CN112819593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416584.0A CN112819593B (en) 2021-04-19 2021-04-19 Data analysis method, device, equipment and medium based on position information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110416584.0A CN112819593B (en) 2021-04-19 2021-04-19 Data analysis method, device, equipment and medium based on position information

Publications (2)

Publication Number Publication Date
CN112819593A CN112819593A (en) 2021-05-18
CN112819593B true CN112819593B (en) 2021-07-06

Family

ID=75863680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416584.0A Active CN112819593B (en) 2021-04-19 2021-04-19 Data analysis method, device, equipment and medium based on position information

Country Status (1)

Country Link
CN (1) CN112819593B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130166394A1 (en) * 2011-12-22 2013-06-27 Yahoo! Inc. Saliency-based evaluation of webpage designs and layouts
CN107203894B (en) * 2016-03-18 2021-01-01 百度在线网络技术(北京)有限公司 Information pushing method and device
CN107689991B (en) * 2017-08-24 2020-11-20 创新先进技术有限公司 Information pushing method and device and server
CN111191021A (en) * 2018-11-14 2020-05-22 北京嘀嘀无限科技发展有限公司 Occupation prediction method, device, equipment and computer readable storage medium
CN112348662B (en) * 2020-10-21 2023-04-07 上海淇玥信息技术有限公司 Risk assessment method and device based on user occupation prediction and electronic equipment
CN112465565B (en) * 2020-12-11 2023-09-26 加和(北京)信息科技有限公司 User portrait prediction method and device based on machine learning

Also Published As

Publication number Publication date
CN112819593A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112541745B (en) User behavior data analysis method and device, electronic equipment and readable storage medium
WO2022095351A1 (en) Target area division method and apparatus, and electronic device and storage medium
CN112231586A (en) Course recommendation method, device, equipment and medium based on transfer learning
CN111930962A (en) Document data value evaluation method and device, electronic equipment and storage medium
CN112883730B (en) Similar text matching method and device, electronic equipment and storage medium
CN113806434B (en) Big data processing method, device, equipment and medium
CN112380454A (en) Training course recommendation method, device, equipment and medium
CN112050820A (en) Road matching method and device, electronic equipment and readable storage medium
CN111985545B (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN113868529A (en) Knowledge recommendation method and device, electronic equipment and readable storage medium
CN114550076A (en) Method, device and equipment for monitoring area abnormal behaviors and storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN113868528A (en) Information recommendation method and device, electronic equipment and readable storage medium
CN115081538A (en) Customer relationship identification method, device, equipment and medium based on machine learning
CN113591459B (en) Address standardization processing method and device, electronic equipment and readable storage medium
CN112069824B (en) Region identification method, device and medium based on context probability and citation
CN114201482A (en) Dynamic population distribution statistical method and device, electronic equipment and readable storage medium
CN112651782A (en) Behavior prediction method, device, equipment and medium based on zoom dot product attention
CN113204698A (en) News subject term generation method, device, equipment and medium
CN111930897A (en) Patent retrieval method, device, electronic equipment and computer-readable storage medium
CN112819593B (en) Data analysis method, device, equipment and medium based on position information
CN111950707A (en) Behavior prediction method, apparatus, device and medium based on behavior co-occurrence network
CN112052409A (en) Address resolution method, device, equipment and medium
CN111652282A (en) Big data based user preference analysis method and device and electronic equipment
CN114708073B (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant