CN112488384A - Method, terminal and storage medium for predicting target area based on social media sign-in - Google Patents

Info

Publication number
CN112488384A
Authority
CN
China
Prior art keywords
region
user
social media
frequency data
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011358914.7A
Other languages
Chinese (zh)
Other versions
CN112488384B (en
Inventor
史文中
刘哲维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute HKPU
Original Assignee
Shenzhen Research Institute HKPU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute HKPU filed Critical Shenzhen Research Institute HKPU
Priority to CN202011358914.7A priority Critical patent/CN112488384B/en
Publication of CN112488384A publication Critical patent/CN112488384A/en
Application granted granted Critical
Publication of CN112488384B publication Critical patent/CN112488384B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a method, a terminal and a storage medium for predicting a target area based on social media check-in. A geographical position tag in a social media check-in record is obtained, and a region feature vector is generated according to the geographical position tag; multi-dimensional region feature vectors are generated according to the region feature vectors of all regions, a machine learning model is trained according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and the trained machine learning model is taken as a prediction model; a region feature vector to be predicted is obtained, the regions to be predicted are ranked through the prediction model and the region feature vector to be predicted, and a target region among the regions to be predicted is determined according to the ranking result. The invention abstracts the task of determining the user's resident area into a ranking problem, ranks each area visited by the user by using a machine learning model, and thereby predicts the resident area of the user.

Description

Method, terminal and storage medium for predicting target area based on social media sign-in
Technical Field
The invention relates to the field of geographic information, in particular to a method, a terminal and a storage medium for predicting a target area based on social media sign-in.
Background
Inferring a user's resident area from social media check-in data with geographic position tags is an important research means in fields such as geographic information science and human mobility pattern research. In terms of technical means, common resident area inference methods mostly use simple statistics. These methods are mostly based on empirical intuition, lack rigorous proof and theoretical basis, and yield low-precision results.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method, a terminal and a storage medium for predicting a target area based on social media check-in, addressing the problem in the prior art that a user's target area is difficult to predict accurately from social media check-in data.
The technical scheme adopted by the invention for solving the problems is as follows:
In a first aspect, an embodiment of the present invention provides a method for predicting a target area based on social media check-in, where the method includes:
acquiring a geographical position tag in the social media check-in record, and generating a region feature vector according to the geographical position tag;
generating multi-dimensional region feature vectors according to the region feature vectors of all regions, training a machine learning model according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and taking the trained machine learning model as a prediction model;
obtaining a region feature vector to be predicted, ranking the regions to be predicted through the prediction model and the region feature vector to be predicted, and determining a target region among the regions to be predicted according to the ranking result.
In one embodiment, the obtaining a geo-location tag in the social media check-in record, and the generating an area feature vector according to the geo-location tag includes:
obtaining a geographical position tag in the social media check-in record, and classifying the social media check-in record through the geographical position tag;
generating region check-in frequency data according to the classification result;
generating regional active day frequency data according to the classification result;
generating active frequency data of the region in a preset time period according to the classification result;
and generating a region feature vector of each region according to the region check-in frequency data, the region active day frequency data and the active frequency data of the region in a preset time period.
In one embodiment, the generating region check-in frequency data according to the classification result includes:
calculating the number of the social media check-ins published by the user in each region according to the classification result;
acquiring the total number of social media check-ins published by a user according to the geographical position tag;
and taking the ratio of the number of the social media check-ins published by the user in each region to the total number of the social media check-ins published by the user as region check-in frequency data of each region.
In one embodiment, the generating region active day frequency data according to the classification result includes:
calculating the number of active days of the user in each region according to the classification result; the number of active days is the number of days on which the user published at least one social media check-in record;
adding the calculated active days of the user in each area to obtain the total active days of the user;
and taking the ratio of the number of active days of the user in each area to the total number of active days of the user as the frequency data of the number of active days of each area.
In one embodiment, the active frequency data of the region in a preset time period includes: night active frequency data, summer active frequency data, and weekend active frequency data; the generating of the active frequency data of the region in the preset time period according to the classification result includes:
calculating the night active days of the region according to the classification result; the night active days are the days on which the user, while located in the region, published at least one social media check-in within a preset night time period;
adding the calculated night active days of the user in each region to obtain the total night active days;
taking the ratio of the night active days of the region to the total night active days as the night active frequency data of the region;
calculating the number of social media check-ins published by the user in the region between preset months according to the classification result;
acquiring the total number of social media check-ins issued by the user in each region between the preset months according to the geographical position tags;
taking the ratio of the number of social media check-ins posted by the user in the region between the preset months to the total number of social media check-ins posted by the user in each region between the preset months as the summer active frequency data;
calculating the number of social media check-ins published by the user in the region on the weekend according to the classification result;
acquiring the total number of social media check-ins issued by the user in each region on weekends according to the geographical position tags;
and taking the ratio of the number of social media check-ins posted by the user in the region on the weekend to the total number of social media check-ins posted by the user in each region on the weekend as the weekend active frequency data.
In one embodiment, the generating multidimensional region feature vectors according to the region feature vectors of all the regions, training a machine learning model according to the multidimensional region feature vectors and a target region correlation vector generated based on the multidimensional region feature vectors, and using the trained machine learning model as a prediction model includes:
acquiring and integrating regional characteristic vectors of all regions, and taking the integrated vector as a multi-dimensional regional characteristic vector;
acquiring a target region correlation vector generated based on the multi-dimensional region feature vector;
taking the multi-dimensional region feature vector as input data of a machine learning model, taking the target region correlation vector generated based on the multi-dimensional region feature vector as output data of the machine learning model, and training the machine learning model;
and taking the trained machine learning model as a prediction model.
In one embodiment, the obtaining the target region correlation vector generated based on the multi-dimensional region feature vector comprises:
according to the relevance between each region in the multi-dimensional region feature vector and a target region, scoring each region to obtain the relevance value of the target region of each region;
and integrating the target region relevance scores of all the regions, and taking the vector obtained by integration as a target region relevance vector.
In one embodiment, the obtaining the feature vector of the region to be predicted, sorting the region to be predicted by the prediction model and the feature vector of the region to be predicted, and determining the target region in the region to be predicted according to the sorting result includes:
acquiring a region feature vector to be predicted, and inputting the region feature vector to be predicted into the prediction model;
obtaining a score which is output by the prediction model and generated based on the regional feature vector to be predicted;
and ranking the regions to be predicted based on the scores, and determining a target region among the regions to be predicted according to the ranking result.
In a second aspect, an embodiment of the present invention further provides a mobile terminal, where the mobile terminal includes: a processor, and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to implement the steps of any of the above methods for predicting a target area based on social media check-in.
In a third aspect, the present invention further provides a computer-readable storage medium storing a plurality of instructions, where the instructions are adapted to be loaded and executed by a processor to implement the steps of any one of the above methods for predicting a target area based on social media check-in.
The beneficial effects of the invention are as follows: in the embodiment of the invention, the geographical position tags in the social media check-in record are obtained, and the region feature vectors are generated according to the geographical position tags; multi-dimensional region feature vectors are generated according to the region feature vectors of all regions, a machine learning model is trained according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and the trained machine learning model is taken as a prediction model; a region feature vector to be predicted is obtained, the regions to be predicted are ranked through the prediction model, and a target region among the regions to be predicted is determined according to the ranking result. The invention abstracts the task of determining the user's resident area into a ranking problem, ranks each area visited by the user by using a machine learning model, and thereby predicts the resident area of the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for predicting a target area based on social media check-in according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of generating a region feature vector according to an embodiment of the present invention.
Fig. 3 is a schematic flowchart of obtaining a prediction model according to an embodiment of the present invention.
Fig. 4 is a schematic flowchart of determining a target area according to an embodiment of the present invention.
Fig. 5 is a schematic block diagram of a terminal according to an embodiment of the present invention.
Fig. 6 is a graph of experimental results provided by an embodiment of the present invention for evaluating the technical effects of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
In recent years, with the popularization of mobile positioning devices and the rise of location-based services, the fusion of traditional social networks with positioning technology has produced a new kind of online social media, the location-based social network, which lets users share their location information anytime and anywhere. Typical behaviors of users of this type of application include making social media check-ins and commenting on check-in places. An individual's check-in data can represent that person's historical movement trajectory, and the check-in data of a large number of users can reveal human movement patterns and regularities of daily life. Because check-in data is social network data that carries geographic information, it reflects both the social network behavior and the movement behavior of the user. Moreover, because it is simple and inexpensive to acquire, more and more scholars have adopted check-in data for research in recent years.
One such line of research infers a user's resident area from social media check-in data with geographic position tags. This is an important research means in fields such as geographic information science and human mobility pattern research. In terms of technical means, common resident area inference methods mostly use simple statistics. These methods are mostly based on empirical intuition, lack rigorous proof and theoretical basis, and yield low-precision results.
In view of the above drawbacks of the prior art, the present invention provides a method for predicting a target area so as to accurately determine the resident area of a user. The method abstracts the task of determining the user's resident area into a ranking problem and uses a machine learning model to rank each area visited by the user, thereby predicting the resident area of the user.
As shown in fig. 1, the method comprises the steps of:
step S100, obtaining a social media check-in record of a user, and generating a region feature vector according to the social media check-in record.
The social media check-in record of the user includes the user's geographic location at the time of check-in. Through a social media check-in, the user may check in to a place, disclose their geographic location, and leave comments at the check-in place. When social media is combined with location, the time and place of the user's activities can be analyzed, and the user's activity patterns in each area can be better understood. Therefore, social media check-in records of the user are obtained first, and region feature vectors of all regions are generated according to the social media check-in records.
As shown in fig. 2, in an implementation manner, the step S100 specifically includes the following steps:
step S110, obtaining geographic position tags in the social media check-in records, and classifying the social media check-in records through the geographic position tags;
step S120, generating region check-in frequency data according to the classification result;
step S130, generating region active day frequency data according to the classification result;
step S140, generating active frequency data of the region in a preset time period according to the classification result;
and step S150, generating a region feature vector of each region according to the region check-in frequency data, the region active day frequency data and the active frequency data of the region in a preset time period.
After the user's social media check-in records are obtained, the user's activity in each region is analyzed on that basis. The check-in records are classified region by region according to their geographical position tags; region check-in frequency data, region active day frequency data and active frequency data of the region in a preset time period are generated from the classification result; and the feature vector of each region is finally formed from these frequency data. In one implementation, the number of social media check-ins published by the user in each region is calculated according to the classification result, the total number of social media check-ins published by the user is obtained from the user's check-in records, and the ratio of the former to the latter is taken as the region check-in frequency data of each region.
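The classification by region and the check-in frequency ratio described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(region_id, timestamp)`, the sample data and the function name `checkin_frequency` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical check-in records for one user: (region_id, timestamp).
checkins = [
    ("r1", datetime(2020, 6, 1, 21, 0)),
    ("r1", datetime(2020, 6, 2, 9, 30)),
    ("r2", datetime(2020, 6, 2, 13, 0)),
    ("r1", datetime(2020, 6, 6, 22, 15)),
]


def checkin_frequency(records):
    """Return, per region, the share of the user's check-ins made there."""
    # Classify the records by region (the geographic position tag).
    counts = Counter(region for region, _ in records)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


freq = checkin_frequency(checkins)
print(freq)
```

Here three of the four check-ins fall in `r1`, so its check-in frequency is 0.75 and that of `r2` is 0.25.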
In addition, the number of active days of the user in each area can be calculated according to the classification result, and then the calculated number of active days of the user in each area is added to obtain the total number of active days of the user; and taking the ratio of the number of active days of the user in each area to the total number of active days of the user as the frequency data of the number of active days of each area.
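A minimal sketch of the active day ratio follows, assuming the same hypothetical `(region_id, timestamp)` record layout. As in the text, the denominator is the sum of the per-region active days, so a calendar day on which the user checked in to two regions is counted once for each region.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical check-in records for one user: (region_id, timestamp).
checkins = [
    ("r1", datetime(2020, 6, 1, 21, 0)),
    ("r1", datetime(2020, 6, 1, 23, 0)),  # same day as above: still one active day
    ("r1", datetime(2020, 6, 2, 9, 30)),
    ("r2", datetime(2020, 6, 2, 13, 0)),
]


def active_day_frequency(records):
    """Return, per region, its active days divided by the summed active days."""
    days = defaultdict(set)
    for region, ts in records:
        # A day is active in a region if it has at least one check-in there.
        days[region].add(ts.date())
    total = sum(len(d) for d in days.values())
    return {region: len(d) / total for region, d in days.items()}


active_freq = active_day_frequency(checkins)
print(active_freq)
```

In this sample `r1` has two active days and `r2` has one, giving ratios of 2/3 and 1/3.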
Finally, the ratio of the number of social media check-ins published by the user in each region in the preset time period to the total number of social media check-ins published by the user in the preset time period is taken as the active frequency data of each region in the preset time period.
In one implementation, the active frequency data of the region in a preset time period includes night active frequency data, summer active frequency data and weekend active frequency data. To obtain these three types of frequency data, this embodiment first calculates the night active days of the region according to the classification result, where the night active days are the days on which the user published at least one social media check-in in the region within a preset night time period. The night active days of the user in each region are then added to obtain the total night active days. Finally, the ratio of the night active days of the region to the total night active days is taken as the night active frequency data of the region.
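The night active frequency can be sketched as below. The 19:00 to 7:00 window is the one this embodiment suggests; the record layout, the helper names and the choice to attribute an after-midnight check-in to its own calendar date are assumptions made for this illustration.

```python
from datetime import datetime

# Hypothetical check-in records for one user: (region_id, timestamp).
checkins = [
    ("r1", datetime(2020, 6, 1, 21, 0)),   # night
    ("r1", datetime(2020, 6, 2, 2, 0)),    # night (before 7:00)
    ("r1", datetime(2020, 6, 2, 12, 0)),   # daytime
    ("r2", datetime(2020, 6, 3, 20, 0)),   # night
    ("r3", datetime(2020, 6, 4, 10, 0)),   # daytime only
]


def is_night(ts, start_hour=19, end_hour=7):
    """True if the timestamp lies in the night window that wraps past midnight."""
    return ts.hour >= start_hour or ts.hour < end_hour


def night_active_frequency(records):
    """Return, per region, night active days over the summed night active days."""
    nights = {region: set() for region, _ in records}
    for region, ts in records:
        if is_night(ts):
            # Assumption: the check-in counts toward its own calendar date.
            nights[region].add(ts.date())
    total = sum(len(d) for d in nights.values())
    if total == 0:
        return {region: 0.0 for region in nights}
    return {region: len(d) / total for region, d in nights.items()}


night_freq = night_active_frequency(checkins)
print(night_freq)
```

Regions with no night check-ins, such as `r3` here, receive a frequency of 0.0 rather than being dropped, so every visited region keeps a value for this feature.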
In addition, the number of social media check-ins posted by the user in the region between the preset months is calculated according to the classification result. The total number of social media check-ins posted by the user in each region between the preset months is then obtained according to the geographical position tags. The ratio of the number of social media check-ins posted by the user in the region between the preset months to the total number of social media check-ins posted by the user in each region between the preset months is taken as the summer active frequency data.
Likewise, the number of social media check-ins posted by the user in the region on weekends is calculated according to the classification result. The total number of social media check-ins posted by the user in each region on weekends is then obtained according to the geographical position tags. Finally, the ratio of the number of social media check-ins posted by the user in the region on weekends to the total number of social media check-ins posted by the user in each region on weekends is taken as the weekend active frequency data.
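The summer and weekend ratios share one shape: filter the check-ins by a time predicate, then take each region's share of what remains. A minimal sketch, assuming the hypothetical `(region_id, timestamp)` record layout and the May to September and Saturday/Sunday definitions given for this embodiment:

```python
from collections import Counter
from datetime import datetime

# Hypothetical check-in records for one user: (region_id, timestamp).
checkins = [
    ("r1", datetime(2020, 6, 6, 10, 0)),   # Saturday, summer
    ("r1", datetime(2020, 6, 8, 10, 0)),   # Monday, summer
    ("r2", datetime(2020, 6, 7, 15, 0)),   # Sunday, summer
    ("r2", datetime(2020, 12, 5, 15, 0)),  # Saturday, winter
]


def is_summer(ts):
    return 5 <= ts.month <= 9  # May through September


def is_weekend(ts):
    return ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday


def share_by_region(records, keep):
    """Per region: check-ins passing `keep` over all check-ins passing `keep`."""
    regions = {region for region, _ in records}
    counts = Counter(region for region, ts in records if keep(ts))
    total = sum(counts.values())
    return {r: (counts[r] / total if total else 0.0) for r in regions}


summer_freq = share_by_region(checkins, is_summer)
weekend_freq = share_by_region(checkins, is_weekend)
print(summer_freq, weekend_freq)
```

Passing the predicate as an argument keeps the two features consistent and makes it easy to try other preset periods.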
In one implementation, according to relevant research, night-time social media check-ins reflect a user's resident area better than daytime check-ins; similarly, check-ins made in summer, from May to September, reflect it better than those made in winter, and weekend check-ins reflect it better than weekday check-ins. Based on this research, the present embodiment may set the preset night time period to 19:00 through 7:00 the next morning, the preset months to May through September, and the weekend to Saturday and Sunday in the conventional sense.
For example, for a user u_i whose geographical position tags show that the user appeared in a region r_j, the feature vector of that region is ur_j^i = (rt_p, rt_ad, rt_np, rt_an, rt_s, rt_w), where:
rt_p is the proportion of all of user u_i's social media check-ins that were published in region r_j;
rt_ad is the proportion of user u_i's active days that fall in region r_j, an active day being a day on which the user published at least one social media check-in;
rt_an is the proportion of user u_i's active nights that fall in region r_j, an active night being one on which the user published at least one social media check-in between 19:00 and 7:00;
rt_s is the proportion of user u_i's summer (May to September) social media check-ins that were published in region r_j;
rt_w is the proportion of user u_i's weekend social media check-ins that were published in region r_j.
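One way to hold the six components is a small named record per region, stacked into a matrix with one row per visited region. The sketch below uses made-up values; the class name `RegionFeatures` is hypothetical, and since the text does not define `rt_np`, it is treated here purely as an assumption as the share of night-time check-ins.

```python
from typing import NamedTuple


class RegionFeatures(NamedTuple):
    rt_p: float   # share of all check-ins
    rt_ad: float  # share of active days
    rt_np: float  # ASSUMED: share of night check-ins (not defined in the text)
    rt_an: float  # share of active nights
    rt_s: float   # share of summer check-ins
    rt_w: float   # share of weekend check-ins


# Hypothetical feature vectors for the regions one user visited.
features = {
    "r1": RegionFeatures(0.75, 0.60, 0.80, 0.70, 0.70, 0.90),
    "r2": RegionFeatures(0.25, 0.40, 0.20, 0.30, 0.30, 0.10),
}

# Fix a region order, then stack the vectors into one row per region.
regions = sorted(features)
matrix = [list(features[r]) for r in regions]
print(regions, matrix)
```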
To implement the training of the machine learning model, as shown in fig. 1, the method further includes the following steps:
step S200, generating multi-dimensional region feature vectors according to the region feature vectors of all regions, training a machine learning model according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and taking the trained machine learning model as a prediction model.
Specifically, the embodiment uses a machine learning model of supervised learning type, that is, a training process of the machine learning model is changed into a learning task, and the machine learning model learns how to predict the output variables from the input variables by establishing a mathematical relationship between the input variables and the output variables. Therefore, it is necessary to first obtain a multi-dimensional region feature vector as input data and a target region correlation vector as output data, and then train a machine learning model according to the two vectors. The trained machine learning model can be used as a prediction model, for example, for predicting the resident area of the user.
As shown in fig. 3, in an implementation manner, the step S200 specifically includes the following steps:
step S210, obtaining and integrating the regional characteristic vectors of all the regions, and taking the integrated vector as a multi-dimensional regional characteristic vector;
step S220, obtaining a target area correlation vector generated based on the multi-dimensional area feature vector;
step S230, taking the multidimensional region feature vector as input data of a machine learning model, taking the target region correlation vector generated based on the multidimensional region feature vector as output data of the machine learning model, and training the machine learning model;
and step S240, taking the trained machine learning model as a prediction model.
Specifically, in this embodiment, the region feature vectors of all the regions are obtained and integrated to obtain the multi-dimensional region feature vector. A target region correlation vector is then generated according to the multi-dimensional region feature vector. To generate the target region correlation vector, in one implementation, each region in the multi-dimensional region feature vector is scored according to its relevance to the target region, giving a target region relevance score for each region; the scores of all regions are then integrated, and the resulting vector is used as the target region correlation vector. The target region relevance score of a region indicates how closely that region is related to the target region. After the multi-dimensional region feature vector and the target region correlation vector are obtained, the multi-dimensional region feature vector is used as input data of the machine learning model and the target region correlation vector as its output data, the machine learning model is trained, and the trained machine learning model is finally used as the prediction model.
For example, when the target area to be predicted is the user's resident area, the present embodiment needs to collect the user's resident area in advance. Specifically, the personal homepage information of the user's social media account can be crawled with a web crawler, or the current city that the user has filled in as personal information on social media such as Weibo, Twitter and Instagram can be read, and the resident area of the user is determined from this information. Then, for all the areas (r_1, r_2, …, r_m) visited by the user, an area is given a relevance score of 1 if it is the user's resident area and 0 otherwise; once every area has been scored, the scores form an m-dimensional vector (0, 0, …, 1, …, 0).
In one implementation, this embodiment uses the LambdaMART model, an ensemble of multiple decision trees, as the prediction model, because LambdaMART is very effective for building ranking algorithms within the "learning to rank" framework. LambdaMART is a listwise learning-to-rank (LTR) algorithm that combines the LambdaRank algorithm with the MART (Multiple Additive Regression Trees) algorithm to convert the search engine result ranking problem into a regression decision tree problem. MART is in fact a Gradient Boosting Decision Tree (GBDT) algorithm. The core idea of GBDT is that, in each iteration, a new regression decision tree is fitted to the negative gradient of the loss function, and all the regression decision trees are finally summed to obtain the final model. LambdaMART is a mature model in the prior art, and its training process is highly standardized: to train the model, it is only necessary to construct the input data and output data as training data.
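The 0/1 relevance labelling described above amounts to a one-line comparison per visited region. A minimal sketch, with a hypothetical function name and region identifiers:

```python
def relevance_vector(visited_regions, resident_region):
    """Score 1 for the user's resident region and 0 for every other region."""
    return [1 if region == resident_region else 0 for region in visited_regions]


# Hypothetical example: the user visited three regions and lives in r2.
labels = relevance_vector(["r1", "r2", "r3"], "r2")
print(labels)  # [0, 1, 0]
```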
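Learning-to-rank trainers generally expect the training data in a query-grouped layout: a feature matrix, a label vector, and a list of group sizes that says which consecutive rows belong to the same ranking list. In this setting, one natural reading (an assumption, not stated in the text) is that each user forms one group and each visited region one row. The sketch below only prepares that layout with made-up numbers and does not train a model; libraries such as LightGBM, whose `LGBMRanker.fit` accepts a `group` argument, could consume data shaped this way.

```python
# Hypothetical per-user training data: (region feature vectors, 0/1 relevance labels).
users = {
    "u1": ([[0.75, 0.60, 0.80, 0.70, 0.70, 0.90],
            [0.25, 0.40, 0.20, 0.30, 0.30, 0.10]], [1, 0]),
    "u2": ([[0.20, 0.25, 0.10, 0.20, 0.30, 0.20],
            [0.50, 0.50, 0.70, 0.60, 0.40, 0.60],
            [0.30, 0.25, 0.20, 0.20, 0.30, 0.20]], [0, 1, 0]),
}


def build_ranking_dataset(per_user):
    """Flatten per-user data into (X, y, group) for a learning-to-rank trainer."""
    X, y, group = [], [], []
    for user_id in sorted(per_user):
        feats, labels = per_user[user_id]
        X.extend(feats)           # one row per visited region
        y.extend(labels)          # relevance score of that region
        group.append(len(feats))  # rows belonging to this user's ranking list
    return X, y, group


X, y, group = build_ranking_dataset(users)
print(len(X), y, group)
```

Keeping the rows of each user contiguous matters, because the group sizes are the only thing that tells the trainer where one user's list ends and the next begins.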
After acquiring the input data and the output data for training the machine learning model, in order to predict the target area of the user, as shown in fig. 1, the method further includes the following steps:
step S300, obtaining the characteristic vector of the region to be predicted, sequencing the region to be predicted through the prediction model and the characteristic vector of the region to be predicted, and determining a target region in the region to be predicted according to the sequencing result.
The prediction model has been trained, so it can automatically predict the correct output data from the input data. To predict the target area of the user, region feature vectors of all regions to be predicted are first generated according to the steps above and input into the prediction model; the prediction model ranks the regions to be predicted, the relevance of each region to the target area is determined from the ranking result, and the target area of the user is then determined among the regions to be predicted.
As shown in fig. 4, in an implementation manner, the step S300 specifically includes the following steps:
step S310, obtaining a region feature vector to be predicted, and inputting the region feature vector to be predicted into the prediction model;
step S320, obtaining a score which is output by the prediction model and generated based on the regional feature vector to be predicted;
step S330, sequencing the regions to be predicted based on the scores, and determining a target region in the regions to be predicted according to a sequencing result.
Specifically, the region feature vectors of the regions to be predicted are input into the prediction model, the prediction model scores each region to be predicted, the regions are ranked by these scores, and the predicted target region is determined among the regions to be predicted according to the ranking result. In one implementation, the regions to be predicted may be sorted in descending order of target region relevance score, and when the target region to be predicted is the user's resident region, the region ranked first may be taken as the user's resident region.
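Steps S310 to S330 can be sketched as follows. This is an illustrative sketch, not the implementation of the embodiment: the scoring function `score_fn` stands in for the trained prediction model, and the names `rank_regions` and `predict_target_region` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_regions(
    region_features: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
) -> List[Tuple[str, float]]:
    """Score every candidate region and sort in descending order of score."""
    scored = [(region, score_fn(features)) for region, features in region_features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def predict_target_region(
    region_features: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
) -> str:
    """Return the region ranked first, e.g. the predicted resident region."""
    ranking = rank_regions(region_features, score_fn)
    if not ranking:
        raise ValueError("no regions to be predicted were supplied")
    return ranking[0][0]
```

With a trained model, `score_fn` would be the model's prediction function; any callable that maps a region feature vector to a single score can be substituted for testing.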
To illustrate the effect of the method for predicting a target area based on social media check-in provided by the embodiment of the invention, experiments were carried out with real data. FIG. 6 shows the experimental results obtained with real "photo wall" social media check-in data. In this example, Accuracy, F-measure, and Balanced Accuracy are used to quantitatively evaluate the method of the invention against the comparison methods. The quantitative results show that the method achieves higher Accuracy, F-measure, and Balanced Accuracy than the comparison methods; when the target area to be predicted is the user's resident area, it therefore predicts the resident area of a social media user more accurately than the other prediction methods.
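For reference, the three indexes named above can be computed as in the following sketch. It uses the standard binary-classification definitions and assumes, purely for illustration, that each region is labelled 1 if it is the resident region and 0 otherwise; the source does not state how the labels were formed in the experiment. The function name `evaluate` is hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F-measure (F1) and Balanced Accuracy for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

Balanced Accuracy is informative here because non-resident regions typically far outnumber resident ones, so plain Accuracy can look high even for a poor predictor.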
Based on the above embodiment, the present invention further provides an intelligent terminal, and a schematic block diagram thereof may be as shown in fig. 5. The intelligent terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein, the processor of the intelligent terminal is used for providing calculation and control capability. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the intelligent terminal is used for being connected and communicated with an external terminal through a network. The computer program, when executed by a processor, implements a method of predicting a target area based on social media check-ins. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram shown in fig. 5 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have a different arrangement of components.
In one implementation, one or more programs are stored in the memory of the intelligent terminal and configured to be executed by one or more processors; the one or more programs include instructions for performing the method of predicting a target area based on social media check-in.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
In summary, the present invention discloses a method for predicting a target area based on social media sign-in, which is characterized in that the method comprises: acquiring a geographical position tag in the social media check-in record, and generating a regional characteristic vector according to the geographical position tag; generating multi-dimensional region feature vectors according to the region feature vectors of all regions, training a machine learning model according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and taking the trained machine learning model as a prediction model; obtaining a feature vector of a region to be predicted, sequencing the region to be predicted through the prediction model, and determining a target region in the region to be predicted according to the sequencing result. The invention abstracts the task of determining the user resident area into a sequencing problem, and sequences each area visited by the user by using a machine learning model, thereby finally successfully predicting the resident area of the user.
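The region feature vector summarized above can be sketched from the frequency definitions given in claims 3 to 5. The sketch below is illustrative only and processes the check-ins of a single user. The night hours (22:00 to 05:59) and the summer months (June to August) are assumptions standing in for the "preset time period" and "preset months", which the text does not fix; the name `region_feature_vectors` and the `CheckIn` tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, FrozenSet, Iterable, List, Tuple

CheckIn = Tuple[str, datetime]  # (region id from the geo-location tag, local time)


def _share(part: int, total: int) -> float:
    return part / total if total else 0.0


def region_feature_vectors(
    checkins: Iterable[CheckIn],
    night_hours: FrozenSet[int] = frozenset({22, 23, 0, 1, 2, 3, 4, 5}),  # assumed
    summer_months: FrozenSet[int] = frozenset({6, 7, 8}),                 # assumed
) -> Dict[str, List[float]]:
    """Return [check-in, active-day, night, summer, weekend] frequencies per region."""
    counts: Dict[str, int] = defaultdict(int)
    summer: Dict[str, int] = defaultdict(int)
    weekend: Dict[str, int] = defaultdict(int)
    days = defaultdict(set)        # dates with at least one check-in
    night_days = defaultdict(set)  # dates with at least one night check-in

    for region, ts in checkins:    # classification by geo-location tag
        counts[region] += 1
        days[region].add(ts.date())
        if ts.hour in night_hours:
            night_days[region].add(ts.date())
        if ts.month in summer_months:
            summer[region] += 1
        if ts.weekday() >= 5:      # Saturday or Sunday
            weekend[region] += 1

    total = sum(counts.values())
    total_days = sum(len(d) for d in days.values())
    total_night_days = sum(len(d) for d in night_days.values())
    total_summer = sum(summer.values())
    total_weekend = sum(weekend.values())

    return {
        region: [
            _share(counts[region], total),
            _share(len(days[region]), total_days),
            _share(len(night_days.get(region, ())), total_night_days),
            _share(summer.get(region, 0), total_summer),
            _share(weekend.get(region, 0), total_weekend),
        ]
        for region in counts
    }
```

Every component is a ratio between a per-region quantity and the corresponding total over all regions, so the vectors of users with very different posting volumes remain comparable.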
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method for predicting a target area based on social media check-in, the method comprising:
acquiring a geographical position tag in the social media check-in record, and generating a regional characteristic vector according to the geographical position tag;
generating multi-dimensional region feature vectors according to the region feature vectors of all regions, training a machine learning model according to the multi-dimensional region feature vectors and target region correlation vectors generated based on the multi-dimensional region feature vectors, and taking the trained machine learning model as a prediction model;
obtaining a region feature vector to be predicted, sequencing the region to be predicted through the prediction model and the region feature vector to be predicted, and determining a target region in the region to be predicted according to the sequencing result.
2. The method of claim 1, wherein the obtaining a geo-location tag in the social media check-in record and the generating a region feature vector according to the geo-location tag comprises:
obtaining a geographical position tag in the social media check-in record, and classifying the social media check-in record through the geographical position tag;
generating region check-in frequency data according to the classification result;
generating regional active day frequency data according to the classification result;
generating active frequency data of the region in a preset time period according to the classification result;
and generating a region feature vector of each region according to the region check-in frequency data, the region active day frequency data and the active frequency data of the region in a preset time period.
3. The method of predicting target areas based on social media check-in of claim 2, wherein the generating area check-in frequency data according to classification results comprises:
calculating the number of the social media check-ins published by the user in each region according to the classification result;
acquiring the total number of social media check-ins published by a user according to the geographical position tag;
and taking the ratio of the number of the social media check-ins published by the user in each region to the total number of the social media check-ins published by the user as region check-in frequency data of each region.
4. The method of predicting target areas based on social media check-in of claim 2, wherein the generating regional active day frequency data according to the classification result comprises:
calculating the number of active days of the user in each area according to the classification result; the number of active days is the number of days on which the user publishes at least one social media check-in record;
adding the calculated active days of the user in each area to obtain the total active days of the user;
and taking the ratio of the number of active days of the user in each area to the total number of active days of the user as the frequency data of the number of active days of each area.
5. The method of claim 2, wherein the active frequency data of the region in a preset time period comprises: night active frequency data, summer active frequency data, and weekend active frequency data; the generating of the active frequency data of the region in the preset time period according to the classification result comprises:
calculating the night activity days of the region according to the classification result; the night activity days are the days on which the user, while located in the region, has published at least one social media check-in within a preset night time period;
performing addition operation on the calculated night activity days of the user in each region to obtain the total night activity days;
taking the ratio of the night activity days of the region to the total night activity days as night active frequency data of the region;
calculating the number of social media check-ins published by the user in the region between preset months according to the classification result;
acquiring the total number of social media check-ins issued by the user in each region between the preset months according to the geographical position tags;
taking the ratio of the number of social media check-ins posted by the user in the region between preset months to the total number of social media check-ins posted by the user in each region between the preset months as the summer active frequency data;
calculating the number of social media check-ins published by the user in the region on the weekend according to the classification result;
acquiring the total number of social media check-ins issued by the user in each region on weekends according to the geographical position tags;
and taking the ratio of the number of social media check-ins posted by the user in the region on the weekend to the total number of social media check-ins posted by the user in each region on the weekend as the weekend active frequency data.
6. The method of claim 1, wherein the generating multidimensional region feature vectors according to the region feature vectors of all regions, training a machine learning model according to the multidimensional region feature vectors and a target region correlation vector generated based on the multidimensional region feature vectors, and using the trained machine learning model as a prediction model comprises:
acquiring and integrating regional characteristic vectors of all regions, and taking the integrated vector as a multi-dimensional regional characteristic vector;
acquiring a target region correlation vector generated based on the multi-dimensional region feature vector;
taking the multi-dimensional region feature vector as input data of a machine learning model, taking the target region correlation vector generated based on the multi-dimensional region feature vector as output data of the machine learning model, and training the machine learning model;
and taking the trained machine learning model as a prediction model.
7. The method of claim 6, wherein obtaining a target region relevance vector generated based on the multi-dimensional region feature vector comprises:
scoring each region according to the relevance between that region in the multi-dimensional region feature vector and the target region, to obtain the target region relevance score of each region;
and integrating the target region relevance scores of all the regions, and taking the vector obtained by integration as a target region relevance vector.
8. The method of claim 1, wherein the obtaining of the feature vector of the region to be predicted, the ranking of the region to be predicted according to the prediction model and the feature vector of the region to be predicted, and the determining of the target region in the region to be predicted according to the ranking result comprises:
acquiring a region feature vector to be predicted, and inputting the region feature vector to be predicted into the prediction model;
obtaining a score which is output by the prediction model and generated based on the regional feature vector to be predicted;
and sequencing the regions to be predicted based on the scores, and determining a target region in the regions to be predicted according to a sequencing result.
9. A mobile terminal, comprising: a processor, a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to implement the steps of the method of any of claims 1-8 above for predicting a target area based on social media check-in.
10. A computer-readable storage medium having stored thereon instructions adapted to be loaded and executed by a processor to perform the steps of the method for predicting a target area based on social media check-in as claimed in any one of claims 1 to 8.
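As an illustration of the training data recited in claims 6 and 7, the following sketch assembles the multi-dimensional region feature vectors (model input) and the target region relevance vectors (model output) for several users. It assumes a binary relevance score, 1 for the known target region and 0 for every other region; the claims do not fix the scoring scale. The names `build_training_set` and `users` are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

RegionVectors = Dict[str, List[float]]


def build_training_set(
    users: Mapping[str, Tuple[RegionVectors, str]],
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Stack per-user region vectors and relevance scores for listwise training.

    `users` maps a user id to (region feature vectors, known target region).
    Returns (X, y, group_sizes), where group_sizes[i] is the number of regions
    belonging to the i-th user, so a ranker only compares regions of one user.
    """
    X: List[List[float]] = []
    y: List[int] = []
    group_sizes: List[int] = []
    for region_vectors, target_region in users.values():
        regions = sorted(region_vectors)  # fixed order within the group
        for region in regions:
            X.append(region_vectors[region])
            # Assumed binary relevance: 1 for the target region, else 0.
            y.append(1 if region == target_region else 0)
        group_sizes.append(len(regions))
    return X, y, group_sizes
```

The `(X, y, group_sizes)` layout matches what learning-to-rank implementations commonly accept, for example the `group` argument of LightGBM's `LGBMRanker.fit`; whether the embodiment uses such a library is not stated in the source.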
CN202011358914.7A 2020-11-27 2020-11-27 Method, terminal and storage medium for predicting target area based on social media sign-in Active CN112488384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011358914.7A CN112488384B (en) 2020-11-27 2020-11-27 Method, terminal and storage medium for predicting target area based on social media sign-in

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011358914.7A CN112488384B (en) 2020-11-27 2020-11-27 Method, terminal and storage medium for predicting target area based on social media sign-in

Publications (2)

Publication Number Publication Date
CN112488384A true CN112488384A (en) 2021-03-12
CN112488384B CN112488384B (en) 2021-08-31

Family

ID=74936316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011358914.7A Active CN112488384B (en) 2020-11-27 2020-11-27 Method, terminal and storage medium for predicting target area based on social media sign-in

Country Status (1)

Country Link
CN (1) CN112488384B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680250A (en) * 2015-02-11 2015-06-03 北京邮电大学 Position predicting system
CN104750829A (en) * 2015-04-01 2015-07-01 华中科技大学 User position classifying method and system based on signing in features
CN105740401A (en) * 2016-01-28 2016-07-06 北京理工大学 Individual behavior and group interest-based interest place recommendation method and device
CN107194011A (en) * 2017-06-23 2017-09-22 重庆邮电大学 A kind of position prediction system and method based on social networks
CN110298687A (en) * 2019-05-23 2019-10-01 香港理工大学深圳研究院 A kind of region attraction appraisal procedure and equipment
CN110334293A (en) * 2019-07-12 2019-10-15 吉林大学 A kind of facing position social networks has Time Perception position recommended method based on fuzzy clustering
CN110570044A (en) * 2019-09-16 2019-12-13 重庆大学 next-place prediction method based on recurrent neural network and attention mechanism
CN111126653A (en) * 2018-11-01 2020-05-08 百度在线网络技术(北京)有限公司 User position prediction method, device and storage medium

Also Published As

Publication number Publication date
CN112488384B (en) 2021-08-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant