CN104615715A - Social network event analyzing method and system based on geographic positions - Google Patents

Social network event analyzing method and system based on geographic positions

Publication number
CN104615715A
Authority
CN
China
Prior art keywords
social network
network data
geographic position
data text
word
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510061722.2A
Other languages
Chinese (zh)
Inventor
李建欣
吴博
张日崇
于伟仁
胡春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN201510061722.2A
Publication of CN104615715A

Abstract

The invention provides a social network event analysis method and system based on geographic position. The method comprises: performing word segmentation on each social network data text to obtain the words of that text; establishing a mapping between each social network data text and its corresponding geographic positions, where the corresponding geographic positions are those words of the text that relate to geographic positions; determining the social network data texts corresponding to each preset target geographic position; and, for each target geographic position, computing weights for the words of the corresponding social network data texts, obtaining the keywords of those texts, and pushing the keywords as the hot events of the target geographic position. The scheme helps users obtain the hot events related to a geographic position in an intuitive, visual way.

Description

Social network event analysis method and system based on geographic position
Technical field
The present invention relates to the field of data mining, and in particular to a social network event analysis method and system based on geographic position.
Background
Social networks are social in nature and spread information quickly and promptly, which attracts large numbers of users with a high demand for real-time information. Anyone in the world can become an information source whose content spreads globally, so the amount of information carried by social network events has grown greatly. Social networks aggregate massive amounts of news, events and information that are updated and propagated every day, and they have a substantial impact on real society. For breaking events in particular, social networks have overtaken traditional media as a channel for fast information propagation. Information on social networks is published promptly and reflects social life in miniature, so mining the information in social network events helps analyze the state of the real world from different angles.
With the rapid growth of the mobile Internet, devices with positioning functions are increasingly common, and users can easily obtain fairly accurate geographic position information, so more and more data carries geographic attributes. At the same time, demand for analyzing such data is growing in application fields such as urban planning, tourism and security.
Social networks represented by microblogs have become the fastest-growing Internet application in China; a microblog is a platform for sharing, propagating and obtaining information based on user relationships. A geographic position can currently be tagged when a post is published, but these positions are relatively isolated: each is linked only to its own post. Although the massive volume of microblog posts is connected through comments, reposts and friend relationships, the posts cannot be connected within real geographic space, and location-based service (LBS) factors are missing.
Summary of the invention
The invention provides a social network event analysis method and system based on geographic position, for analyzing related social network events by geographic position.
A first aspect of the invention provides a social network event analysis method based on geographic position, comprising:
performing word segmentation on each social network data text to obtain the words of the social network data text;
establishing a mapping between the social network data text and the geographic positions corresponding to it, where the geographic positions corresponding to the social network data text are those words of the text that relate to geographic positions;
determining, according to the geographic positions corresponding to each social network data text, the social network data texts corresponding to each preset target geographic position;
for each target geographic position, computing weights for the words of the social network data texts corresponding to the target geographic position, obtaining the keywords of those texts, and pushing the keywords as the hot events of the target geographic position.
Another aspect of the invention provides a social network event analysis system based on geographic position, comprising:
a word segmentation module, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;
a geographic position acquisition module, configured to establish a mapping between the social network data text and the geographic positions corresponding to it, where the geographic positions corresponding to the social network data text are those words of the text that relate to geographic positions;
a geographic position analysis module, configured to determine, according to the geographic positions corresponding to each social network data text, the social network data texts corresponding to each preset target geographic position;
an event analysis module, configured to, for each target geographic position, compute weights for the words of the social network data texts corresponding to the target geographic position, obtain the keywords of those texts, and push the keywords as the hot events of the target geographic position.
The method and system provided by the invention study social network data texts, analyze the texts associated with geographic positions, and push the keywords of the texts corresponding to each geographic position as the hot events of that position, which helps users obtain the hot events related to a geographic position intuitively.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the social network event analysis method based on geographic position provided by Embodiment 1 of the invention;
Fig. 2 is a schematic structural diagram of the social network event analysis system based on geographic position provided by Embodiment 2 of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the social network event analysis method based on geographic position provided by Embodiment 1 of the invention. As shown in Fig. 1, the method comprises:
101. Perform word segmentation on each social network data text to obtain the words of the social network data text.
In practice, a certain number of social network data texts published within a certain time period, for example microblog posts, can first be obtained from a big data analysis platform. Correspondingly, before step 101 the method may further comprise: obtaining a preset number of social network data texts published within a preset time period.
Specifically, the social network data texts in this embodiment may come from the ElasticSearch search engine of a big data analysis platform. Taking microblogs as an example, all posts may be original posts, excluding posts that users have reposted.
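The selection described above (original posts only, within a preset time period, up to a preset number) can be sketched in plain Python. This is only an illustration: the record fields `created_at` and `is_repost` and the function name `select_posts` are hypothetical, and the sketch filters an in-memory list rather than querying ElasticSearch.

```python
from datetime import datetime

def select_posts(posts, start, end, limit):
    """Keep original (non-reposted) posts published in [start, end), oldest first.

    `posts` is a list of dicts with hypothetical keys "created_at" (datetime)
    and "is_repost" (bool); at most `limit` posts are returned.
    """
    selected = [
        p for p in posts
        if not p["is_repost"] and start <= p["created_at"] < end
    ]
    selected.sort(key=lambda p: p["created_at"])
    return selected[:limit]

sample_posts = [
    {"id": 1, "created_at": datetime(2015, 1, 1, 8), "is_repost": False},
    {"id": 2, "created_at": datetime(2015, 1, 1, 9), "is_repost": True},
    {"id": 3, "created_at": datetime(2015, 1, 2, 8), "is_repost": False},
    {"id": 4, "created_at": datetime(2015, 1, 1, 7), "is_repost": False},
]
window = select_posts(sample_posts, datetime(2015, 1, 1), datetime(2015, 1, 2), 10)
```

In a deployment the same conditions would more likely be expressed as a query against the data platform, so that filtering happens before the texts are transferred.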
Accordingly, after a certain number of social network data texts from a certain time period have been obtained, they need to be preprocessed; preprocessing mainly consists of word segmentation. Taking microblogs again as an example, word segmentation is applied to the text of each post; this text does not include other content such as pictures posted by the user.
In practice, word segmentation can be implemented in many ways; for example, a segmenter can be applied to the social network data text. Optionally, step 101 may specifically comprise: performing word segmentation on the social network data text using the IKAnalyzer segmenter. Specifically, to segment a microblog text, the segmenter first loads its dictionary, analyzes the text and extracts a token. The token is then split layer by layer by iterative lookup from the longest word to the shortest: the longest dictionary word contained in the search string is retrieved, and the iterative lookup continues in the same manner on the remainder until the text is exhausted.
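The longest-word-first lookup described above resembles forward maximum matching. The following is a minimal sketch of that general technique, not IKAnalyzer's actual implementation; the function name `segment` and the toy dictionary are assumptions made for illustration.

```python
def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position, try the longest candidate first.

    A single character is emitted as a fallback when no dictionary word matches.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try candidates from the longest allowed length down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Toy dictionary, assumed for illustration only.
toy_dictionary = {"北京", "北京市", "朝阳区", "发生", "交通", "拥堵"}
tokens = segment("北京市朝阳区发生交通拥堵", toy_dictionary)
```

Because the longest candidate is tried first, "北京市" is preferred over its prefix "北京" whenever both are in the dictionary.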
102. Establish a mapping between the social network data text and the geographic positions corresponding to it, where the geographic positions corresponding to the social network data text are those words of the text that relate to geographic positions.
In practice, the geographic positions corresponding to a social network data text can be obtained by filtering the words related to geographic positions out of the words of the text. Specifically, the filtering can use Sogou's three-level administrative division place-name dictionary. Taking microblogs as an example, if geographic position information contained in this dictionary is detected in a post's text, it is extracted in combination with the context of the post. After the geographic positions corresponding to the social network data text have been obtained, the text is associated with these position-related words. In other words, a geographic text analysis method based on named entity recognition can be adopted: place names found in the post through the place-name dictionary are extracted in context and associated with the post.
Optionally, to save processing resources and improve efficiency, a social network data text that contains no geographic position word is deemed to carry no geographic position information and can be discarded without further processing.
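As a rough sketch of step 102, the mapping can be built by looking each word up in a place-name dictionary and dropping texts with no match. The names `map_locations` and `gazetteer` are hypothetical, and the sketch uses plain dictionary lookup only; the context-aware extraction mentioned above is not modeled.

```python
def map_locations(segmented_posts, gazetteer):
    """Return {post_id: set of place names found among the post's words}.

    Posts that contain no place name are omitted, mirroring the optional
    discard step described in the text.
    """
    mapping = {}
    for post_id, words in segmented_posts.items():
        places = {w for w in words if w in gazetteer}
        if places:
            mapping[post_id] = places
    return mapping

# Toy data, assumed for illustration only.
gazetteer = {"北京市", "朝阳区", "上海市"}
segmented_posts = {
    "p1": ["北京市", "朝阳区", "发生", "交通", "拥堵"],
    "p2": ["今天", "天气", "不错"],
    "p3": ["上海市", "马拉松", "开跑"],
}
post_locations = map_locations(segmented_posts, gazetteer)
```

A post may map to several place names, which is why each value is a set.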
103. According to the geographic positions corresponding to each social network data text, determine the social network data texts corresponding to each preset target geographic position.
In practice, the social network data texts corresponding to each target geographic position can be determined from the mapping between the social network data texts and geographic positions.
Specifically, the target geographic positions can be chosen according to actual needs. For example, hot event analysis can be carried out for each geographic position on a visual map; correspondingly, before step 103 the method may further comprise: taking the geographic positions on the visual map as the target geographic positions. For example, the visualization of the geographic position analysis can be implemented as a Web map application based on the Baidu Map API.
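Step 103 amounts to inverting the text-to-position mapping and keeping only the preset targets. A minimal sketch, with the hypothetical function name `posts_by_target` and toy identifiers:

```python
from collections import defaultdict

def posts_by_target(post_locations, targets):
    """Invert {post_id: places} into {target place: [post_id, ...]}.

    Only places listed in `targets` are kept; a post mentioning several
    targets is listed under each of them.
    """
    grouped = defaultdict(list)
    for post_id, places in post_locations.items():
        for place in places:
            if place in targets:
                grouped[place].append(post_id)
    return dict(grouped)

# Toy data, assumed for illustration only.
locations = {"p1": {"北京市", "朝阳区"}, "p3": {"上海市"}, "p4": {"北京市"}}
grouped = posts_by_target(locations, targets={"北京市", "上海市"})
```

Targets with no matching post simply do not appear in the result.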
104. For each target geographic position, compute weights for the words of the social network data texts corresponding to the target geographic position, obtain the keywords of those texts, and push the keywords as the hot events of the target geographic position.
Specifically, after the social network data texts corresponding to each target geographic position have been determined, the TF-IDF method can be used to compute a weight for each word of these texts; keywords are extracted according to the result and pushed as the hot events of the target geographic position, for example by marking them at that position. Correspondingly, the weight calculation in step 104 may specifically comprise: computing weights for the words of the social network data texts corresponding to the target geographic position using the TF-IDF method.
Specifically, obtaining the keywords may comprise the following steps: computing the normalized term frequency; computing the inverse document frequency; and computing the term weights and extracting the popular words.
More specifically, term frequency (TF) is the frequency with which a given word occurs in a document; it is the word count normalized by document length, so that long documents are not favored. Inverse document frequency (IDF) expresses that the fewer documents contain a term, the larger its IDF, which indicates that the term discriminates well between classes. The TF-IDF value of a term is then the product of its TF value and its IDF value. Finally, the words are sorted by TF-IDF value in descending order and the top several are selected as keywords, which form the result of the hot event analysis.
In practice, when the objects analyzed are short texts such as microblog posts, extracting keywords directly with the ordinary TF-IDF algorithm is problematic because the texts are usually short. If all posts are treated as one document, the IDF information is lost; if each post is treated as one document, the post has so few words that almost every word occurs exactly once, and the TF information is lost. Either way the accuracy of the selected keywords suffers.
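A small worked case may make the two failure modes concrete. It assumes the usual definitions of term frequency (occurrences of a word divided by the total word count of the document) and of logarithmic inverse document frequency; the symbol $m$ is introduced here only for illustration.

```latex
% Case 1: all posts are merged into a single document, so |D| = 1.
% Every word t that occurs at all is contained in that one document:
\[
  idf_t = \log\frac{|D|}{|\{j : t \in d_j\}|} = \log\frac{1}{1} = 0
  \quad\Longrightarrow\quad
  tfidf_t = tf_t \cdot 0 = 0 \ \text{for every word } t .
\]
% Case 2: each post is its own document, containing m distinct words,
% each occurring exactly once:
\[
  tf_t = \frac{1}{m} \ \text{for every word } t
  \quad\Longrightarrow\quad
  \text{the ranking within the post is decided by } idf_t \text{ alone.}
\]
```

In the first case no word can be distinguished from any other; in the second, how often a word is repeated across the posts of a location plays no role.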
To improve the accuracy of the analysis, computing the weights with the TF-IDF method in step 104 and obtaining the keywords of the social network data texts corresponding to the target geographic position specifically comprises:
for each word $t_i$ of the social network data texts corresponding to the target geographic position, taking the social network data texts corresponding to the target geographic position as a first document and computing the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the first document;
taking the social network data text to which the word belongs as a second document and computing the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
computing the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
sorting the words of the social network data texts corresponding to the target geographic position in descending order of weight and taking the top $k$ words as the keywords of those texts, where $k$ is a preset value.
In this embodiment, a single social network data text is treated as one document when computing IDF, while all social network data texts corresponding to the target geographic position are treated together as one document when computing TF. Words therefore have differing term frequencies and the IDF information is retained, which effectively improves the accuracy of the keywords and hence of the hot event analysis.
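The two document granularities can be sketched as follows. This is an illustrative reading of the scheme rather than the applicant's implementation: the function name `hot_keywords`, the natural logarithm and the alphabetical tie-break are assumptions, and the posts of the target position are assumed to be part of the corpus.

```python
import math
from collections import Counter

def hot_keywords(location_posts, corpus_posts, k=3):
    """Rank words for one target position.

    TF is computed over all posts of the position merged into one document;
    IDF treats every single post in the corpus as its own document.
    Both arguments are lists of word lists; `location_posts` is assumed to
    be a subset of `corpus_posts`, so every word has document frequency >= 1.
    """
    counts = Counter(w for post in location_posts for w in post)
    total = sum(counts.values())
    n_docs = len(corpus_posts)
    doc_freq = Counter(w for post in corpus_posts for w in set(post))
    scores = {
        word: (n / total) * math.log(n_docs / doc_freq[word])
        for word, n in counts.items()
    }
    # Descending weight; ties are broken alphabetically to keep output stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

# Toy corpus, assumed for illustration only.
beijing_posts = [
    ["beijing", "rainstorm", "subway"],
    ["beijing", "rainstorm", "flood"],
    ["beijing", "rainstorm"],
]
corpus = beijing_posts + [["shanghai", "sunny"]] * 7
top_two = [word for word, _ in hot_keywords(beijing_posts, corpus, k=2)]
```

In this toy corpus the place name itself scores as high as "rainstorm", since both occur in the same three posts; whether place names should be filtered out of the keyword list is a design choice the source does not address.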
Optionally, a visual map can be used to push the hot events corresponding to each target geographic position. For example, the visual push can be implemented by calling the Baidu Map API in the following steps: (1) register for the Baidu Map API and load the API JS file; (2) create a map instance, then instantiate and initialize the map; (3) integrate and display the map interaction data. Specifically, each target geographic position and its hot events are packaged as JSON and passed to the front end for parsing; the parsed coordinates and the corresponding keywords, i.e. the hot events, are then marked on the visual map.
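The JSON packaging step might look like the sketch below. The field names (`name`, `lng`, `lat`, `keywords`), the function name `build_payload` and the approximate sample coordinates are all assumptions; the source does not specify the payload layout, and no map API call is shown.

```python
import json

def build_payload(hot_events, coordinates):
    """Serialize {place: [keyword, ...]} with coordinates for the front end.

    Places without a known (longitude, latitude) pair are skipped.
    """
    features = []
    for place, keywords in hot_events.items():
        if place not in coordinates:
            continue
        lng, lat = coordinates[place]
        features.append(
            {"name": place, "lng": lng, "lat": lat, "keywords": keywords}
        )
    # ensure_ascii=False keeps Chinese place names readable in the output.
    return json.dumps(features, ensure_ascii=False)

# Illustrative data only; the coordinates are rough values.
payload = build_payload(
    {"北京市": ["暴雨", "地铁"], "未知地点": ["活动"]},
    {"北京市": (116.40, 39.90)},
)
```

The front end would then parse this string and place one marker per entry, labeling it with the keywords.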
In practice, with the spread of mobile devices and advances in wireless communication, more and more data carries geospatial attributes. Taking microblogs as an example, because microblogs are real-time and information-rich, a large number of posts contain geographic position information. However, one geographic position may be associated with thousands of posts. A user cannot read them all, and reading only one or two does not give a comprehensive picture of the microblog events associated with that position.
Common social network analysis tools mostly analyze microblog text in a general way, including analysis of microblog characteristics, topic mining and detection of breaking events. Social network data contains a large amount of geographic position information, including the position of the user, the position from which a post was published and the position of the event discussed. However, the geographic analysis functions of these tools are weak: they concentrate on user positions, rarely analyze the positions of events, and do not bring out the geographic features associated with microblog events.
The method provided by this embodiment studies social network data texts, analyzes the texts associated with geographic positions, and pushes the keywords of the texts corresponding to each geographic position as the hot events of that position, which helps users obtain the hot events related to a geographic position intuitively.
Fig. 2 is a schematic structural diagram of the social network event analysis system based on geographic position provided by Embodiment 2 of the invention. As shown in Fig. 2, the system comprises:
a word segmentation module 21, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;
a geographic position acquisition module 22, configured to establish a mapping between the social network data text and the geographic positions corresponding to it, where the geographic positions corresponding to the social network data text are those words of the text that relate to geographic positions;
a geographic position analysis module 23, configured to determine, according to the geographic positions corresponding to each social network data text, the social network data texts corresponding to each preset target geographic position;
an event analysis module 24, configured to, for each target geographic position, compute weights for the words of the social network data texts corresponding to the target geographic position, obtain the keywords of those texts, and push the keywords as the hot events of the target geographic position.
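One way to picture how modules 21 to 24 fit together is a small class that chains four interchangeable callables. The class name `EventAnalysisSystem`, the callable signatures and the toy stand-ins below are assumptions for illustration; the source describes the modules only functionally.

```python
class EventAnalysisSystem:
    """Chains four callables in the order of modules 21 to 24."""

    def __init__(self, segment, locate, group, analyze):
        self.segment = segment  # module 21: text -> list of words
        self.locate = locate    # module 22: {post_id: words} -> {post_id: places}
        self.group = group      # module 23: (locations, targets) -> {place: [post_id]}
        self.analyze = analyze  # module 24: list of word lists -> keywords

    def run(self, posts, targets):
        words = {pid: self.segment(text) for pid, text in posts.items()}
        locations = self.locate(words)
        grouped = self.group(locations, targets)
        return {
            place: self.analyze([words[pid] for pid in ids])
            for place, ids in grouped.items()
        }

# Toy stand-ins for the four modules, assumed for illustration only.
def toy_locate(words):
    return {pid: {"beijing"} for pid, ws in words.items() if "beijing" in ws}

def toy_group(locations, targets):
    grouped = {}
    for pid, places in locations.items():
        for place in places & targets:
            grouped.setdefault(place, []).append(pid)
    return grouped

def toy_analyze(word_lists):
    return sorted({w for ws in word_lists for w in ws})

system = EventAnalysisSystem(str.split, toy_locate, toy_group, toy_analyze)
result = system.run({1: "beijing rainstorm", 2: "sunny day"}, {"beijing"})
```

Passing the modules in as callables keeps each stage replaceable, for example swapping the whitespace splitter for a real Chinese segmenter.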
In practice, a certain number of social network data texts published within a certain time period, for example microblog posts, can first be obtained from a big data analysis platform. Correspondingly, the system further comprises an acquisition module, configured to obtain a preset number of social network data texts published within a preset time period before the word segmentation module 21 performs word segmentation on each social network data text.
Specifically, the social network data texts in this embodiment may come from the ElasticSearch search engine of a big data analysis platform. Taking microblogs as an example, all posts may be original posts, excluding posts that users have reposted.
Accordingly, after a certain number of social network data texts from a certain time period have been obtained, they need to be preprocessed; preprocessing mainly consists of word segmentation. Taking microblogs again as an example, word segmentation is applied to the text of each post; this text does not include other content such as pictures posted by the user.
In practice, word segmentation can be implemented in many ways; for example, a segmenter can be applied to the social network data text. Optionally, the word segmentation module 21 may specifically be configured to perform word segmentation on the social network data text using the IKAnalyzer segmenter. Specifically, to segment a microblog text, the word segmentation module 21 first loads its dictionary, analyzes the text and extracts a token. The token is then split layer by layer by iterative lookup from the longest word to the shortest: the longest dictionary word contained in the search string is retrieved, and the iterative lookup continues in the same manner on the remainder until the text is exhausted.
In practice, the geographic position acquisition module 22 can obtain the geographic positions corresponding to a social network data text by filtering the words related to geographic positions out of the words of the text. Specifically, the geographic position acquisition module 22 can filter using Sogou's three-level administrative division place-name dictionary. Taking microblogs as an example, if geographic position information contained in this dictionary is detected in a post's text, it is extracted in combination with the context of the post. After obtaining the geographic positions corresponding to the social network data text, the geographic position acquisition module 22 associates the text with these position-related words. In other words, a geographic text analysis method based on named entity recognition can be adopted: place names found in the post through the place-name dictionary are extracted in context and associated with the post.
Optionally, to save processing resources and improve efficiency, the geographic position acquisition module 22 deems a social network data text that contains no geographic position word to carry no geographic position information and can discard it without further processing.
In practice, the social network data texts corresponding to each target geographic position can be determined from the mapping between the social network data texts and geographic positions.
Specifically, the target geographic positions can be chosen according to actual needs. For example, hot event analysis can be carried out for each geographic position on a visual map; correspondingly, the geographic position analysis module 23 is further configured to take the geographic positions on the visual map as the target geographic positions before determining, according to the geographic positions corresponding to each social network data text, the social network data texts corresponding to each preset target geographic position. For example, the visualization of the geographic position analysis can be implemented as a Web map application based on the Baidu Map API.
Specifically, after the geographic position analysis module 23 has determined the social network data texts corresponding to each target geographic position, the event analysis module 24 can use the TF-IDF method to compute a weight for each word of these texts, extract keywords according to the result, and push the keywords as the hot events of the target geographic position, for example by marking them at that position. Correspondingly, the event analysis module 24 may specifically be configured to compute weights for the words of the social network data texts corresponding to the target geographic position using the TF-IDF method.
Specifically, the event analysis module 24 may be configured to compute the normalized term frequency, compute the inverse document frequency, and compute the term weights and extract the popular words.
More specifically, term frequency (TF) is the frequency with which a given word occurs in a document; it is the word count normalized by document length, so that long documents are not favored. Inverse document frequency (IDF) expresses that the fewer documents contain a term, the larger its IDF, which indicates that the term discriminates well between classes. The TF-IDF value of a term is then the product of its TF value and its IDF value. Finally, the words are sorted by TF-IDF value in descending order and the top several are selected as keywords, which form the result of the hot event analysis.
In practice, when the objects analyzed are short texts such as microblog posts, extracting keywords directly with the ordinary TF-IDF algorithm is problematic because the texts are usually short. If all posts are treated as one document, the IDF information is lost; if each post is treated as one document, the post has so few words that almost every word occurs exactly once, and the TF information is lost. Either way the accuracy of the selected keywords suffers.
To improve the accuracy of the analysis, the event analysis module 24 may specifically comprise:
a first computing unit, configured to, for each word $t_i$ of the social network data texts corresponding to the target geographic position, take the social network data texts corresponding to the target geographic position as a first document and compute the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the first document;
a second computing unit, configured to take the social network data text to which the word belongs as a second document and compute the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
a third computing unit, configured to compute the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
a processing unit, configured to sort the words of the social network data texts corresponding to the target geographic position in descending order of weight and take the top $k$ words as the keywords of those texts, where $k$ is a preset value.
The value of $k$ can be determined by the number of keywords actually required and is not limited in this embodiment.
In this embodiment, the second computing unit treats a single social network data text as one document when computing IDF, while the first computing unit treats all social network data texts corresponding to the target geographic position together as one document when computing TF. Words therefore have differing term frequencies and the IDF information is retained, which effectively improves the accuracy of the keywords and hence of the hot event analysis.
Optionally, visualized map can be utilized to push hot ticket corresponding to each target geographic position.Concrete, the form that each target geographic position and corresponding hot ticket thereof are packaged into JSON can be passed to foreground and carry out dissection process, the coordinate after parsing and corresponding keyword, i.e. hot ticket, visualized map marks.
The social networks event analysis system based on geographic position that the present embodiment provides, by studying social network data text, analyze the social network data text with geographic location association, and by the keyword of social network data text corresponding for each geographic position, hot ticket as this geographic position pushes, and user can be helped to get the relevant hot ticket in geographic position intuitively.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing method embodiment and is not repeated here.
One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A social network event analysis method based on geographic positions, characterized by comprising:
Performing word segmentation on each social network data text to obtain the words of the social network data text;
Establishing a mapping relation between the social network data text and the geographic position corresponding to the social network data text, the geographic position corresponding to the social network data text being a word, among the words of the social network data text, that relates to a geographic position;
Determining, according to the geographic position corresponding to each social network data text, the social network data text corresponding to each preset target geographic position;
For each target geographic position, performing weight calculation on the words of the social network data text corresponding to the target geographic position, obtaining the keywords of the social network data text corresponding to the target geographic position, and pushing the keywords as the hot events of the target geographic position.
2. The method according to claim 1, characterized in that performing weight calculation on the words of the social network data text corresponding to the target geographic position comprises:
Performing weight calculation on the words of the social network data text corresponding to the target geographic position using the TF-IDF method.
3. The method according to claim 2, characterized in that performing weight calculation on the words of the social network data text corresponding to the target geographic position using the TF-IDF method, and obtaining the keywords of the social network data text corresponding to the target geographic position, specifically comprises:
For each word $t_i$ of the social network data text corresponding to the target geographic position, taking the social network data text corresponding to the target geographic position as a first document, and computing the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula, the first formula being: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the first document;
Taking the social network data text to which the word belongs as a second document, and computing the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula, the second formula being: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
Computing the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula, the third formula being: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
Sorting the words of the social network data text corresponding to the target geographic position in descending order of weight, and taking the top k words as the keywords of the social network data text corresponding to the target geographic position, where k is a preset value.
4. The method according to any one of claims 1-3, characterized in that performing word segmentation on the social network data text comprises:
Performing word segmentation on the social network data text using the IKAnalyzer word segmenter.
5. The method according to any one of claims 1-3, characterized in that, before determining, according to the geographic position corresponding to each social network data text, the social network data text corresponding to each preset target geographic position, the method further comprises:
Taking the geographic positions in a visualized map as the target geographic positions.
6. The method according to any one of claims 1-3, characterized in that, before performing word segmentation on each social network data text, the method further comprises:
Obtaining a preset number of the social network data texts published within a preset time period.
7. A social network event analysis system based on geographic positions, characterized by comprising:
A word segmentation module, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;
A geographic position acquisition module, configured to establish a mapping relation between the social network data text and the geographic position corresponding to the social network data text, the geographic position corresponding to the social network data text being a word, among the words of the social network data text, that relates to a geographic position;
A geographic position analysis module, configured to determine, according to the geographic position corresponding to each social network data text, the social network data text corresponding to each preset target geographic position;
An event analysis module, configured to, for each target geographic position, perform weight calculation on the words of the social network data text corresponding to the target geographic position, obtain the keywords of the social network data text corresponding to the target geographic position, and push the keywords as the hot events of the target geographic position.
8. The system according to claim 7, characterized in that
The event analysis module is specifically configured to perform weight calculation on the words of the social network data text corresponding to the target geographic position using the TF-IDF method.
9. The system according to claim 8, characterized in that the event analysis module comprises:
A first computing unit, configured to, for each word $t_i$ of the social network data text corresponding to the target geographic position, take the social network data text corresponding to the target geographic position as a first document, and compute the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula, the first formula being: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the first document;
A second computing unit, configured to take the social network data text to which the word belongs as a second document, and to compute the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula, the second formula being: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
A third computing unit, configured to compute the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula, the third formula being: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
A processing unit, configured to sort the words of the social network data text corresponding to the target geographic position in descending order of weight, and to take the top k words as the keywords of the social network data text corresponding to the target geographic position, where k is a preset value.
10. The system according to any one of claims 7-9, characterized in that
The word segmentation module is specifically configured to perform word segmentation on the social network data text using the IKAnalyzer word segmenter.
CN201510061722.2A 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions Pending CN104615715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510061722.2A CN104615715A (en) 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510061722.2A CN104615715A (en) 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions

Publications (1)

Publication Number Publication Date
CN104615715A true CN104615715A (en) 2015-05-13

Family

ID=53150157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510061722.2A Pending CN104615715A (en) 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions

Country Status (1)

Country Link
CN (1) CN104615715A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150513