CN103064951B - A kind of spatial identification method and apparatus of public feelings information - Google Patents

Method and apparatus for identifying the region of public opinion information

Info

Publication number
CN103064951B
Authority
CN
China
Prior art keywords
region
region attribute
word
information
attribute word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210583484.8A
Other languages
Chinese (zh)
Other versions
CN103064951A (en)
Inventor
史波良
李名臣
丁荟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Original Assignee
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd filed Critical NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority to CN201210583484.8A priority Critical patent/CN103064951B/en
Publication of CN103064951A publication Critical patent/CN103064951A/en
Application granted granted Critical
Publication of CN103064951B publication Critical patent/CN103064951B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This patent application provides a method and apparatus for identifying the region to which public opinion information belongs. First, Chinese word segmentation is performed on the text containing the public opinion information and the segmented words are extracted. Next, a region attribute library annotated with administrative region levels is loaded, and the segmented text is matched against it to obtain region attributes. A region attribute word information set is then built, the region attribute word information is partitioned by administrative region, and the weight of each region attribute word is calculated. Finally, secondary information is filtered out according to a weight threshold, and the region to which the public opinion information belongs is determined. In this way, the region to which public opinion information belongs can be located accurately.

Description

Method and apparatus for identifying the region of public opinion information
Technical field
This patent application belongs to the technical field of network information, and in particular relates to a method and apparatus for identifying the region of public opinion information.
Background
With the rapid worldwide development of the Internet, online media have come to be recognized as the "fourth medium" after newspapers, radio, and television. The network has become one of the main carriers of public opinion, and the main sources of public opinion information online are news comments, BBS forums, blogs, and aggregated news. Online public opinion is expressed quickly, draws on diverse information, and is interactive, which gives it advantages that traditional media cannot match. The openness and virtual nature of the network give online public opinion its directness, suddenness, and tendency toward bias. In recent years the influence of online public opinion has grown steadily: many major public opinion events have broken out and spread through the network, and most of this public opinion information is negative. To find locally relevant public opinion information accurately and effectively, and to defuse and handle inflammatory online content in time, it is particularly important to judge accurately which region a piece of online public opinion information belongs to.
Many region identification methods exist, but most only extract place names and cannot effectively identify the main region of a piece of public opinion information, or they identify the main region by word frequency alone. Several place names may appear in a piece of public opinion information, yet the event it describes took place in only one region. In practice, users in a given place focus on locally relevant public opinion information: government agencies, enterprises, and public institutions in Nanjing, for example, care first about information relating to the Nanjing area. Existing identification methods may retrieve a great deal of information that is only loosely associated with Nanjing, so users must still screen the relevant items manually from a large volume of results, which reduces working efficiency.
Summary of the invention
The technical problem to be solved by this patent application is to provide a method and apparatus for identifying the region of public opinion information that accurately judge the region to which the information belongs, making up for the inability of existing region identification methods to locate it accurately.
To solve this technical problem, this patent application provides a method and apparatus for identifying the region of public opinion information, comprising a text preprocessing module, a region attribute word extraction module, a region weight calculation module, and a region filtering module, where:
Text preprocessing module: performs word segmentation on the text containing the public opinion information;
Region attribute word extraction module: loads the region attribute library, matches the segmented text against it to obtain region attributes, builds the region attribute word information set, and partitions the region attribute word information by administrative region;
Region weight calculation module: calculates the weight and weight ratio of the region attribute words at each level;
Region filtering module: filters out secondary information according to the weight threshold.
The working steps of the method and apparatus described in this patent application are as follows:
(1) perform Chinese word segmentation on the text containing the public opinion information and extract the segmented words;
(2) load the region attribute library annotated with administrative region levels, and match the segmented public opinion information against it to obtain region attributes;
(3) build the region attribute word information set;
(4) partition the region attribute word information by administrative region;
(5) calculate the weight and weight ratio of the region attribute words at each level;
(6) filter out secondary information according to the weight threshold, and accurately judge the region to which the public opinion information belongs.
The region attribute library is a dictionary compiled from a large amount of information obtained through long-term investigation, collection, and optimization. It includes region names, landmark buildings, regional culture vocabulary, regional website board URLs, and similar information, and can meet user needs.
The region attribute word information set includes each region attribute word, its position in the text, its number of occurrences, its context region attribute word, and the spacing distance between them.
The main factors affecting weight are the position at which a region attribute word appears and its number of occurrences.
Beneficial effects of this patent application:
Building on the extraction of region attributes from the text content, the method examines how relevant each region attribute is, accurately screens relevant information out of massive data, and filters out secondary information. This improves the accuracy of region identification for public opinion information and safeguards the quality of the data pushed to other applications. It can help government departments grasp public opinion in time, identifying key regions while previously unknown public opinion information is still emerging and guiding it effectively. It can also help enterprises learn of competitors' market activity in a specific region at the earliest opportunity and formulate effective competitive strategies in time.
Brief description of the drawings
Fig. 1 is the module diagram of this patent application.
Fig. 2 is the flow chart of this patent application.
Detailed description
The modules of this patent application are shown in Fig. 1 and comprise a text preprocessing module, a region attribute word extraction module, a region weight calculation module, and a region filtering module.
The region identification flow for public opinion information is shown in Fig. 2. This patent application is described in detail below with reference to Fig. 2 and a specific embodiment.
Step 1: perform Chinese word segmentation on the text containing the public opinion information, extract the segmented words, and filter out irrelevant information.
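The source does not say which segmenter is used in step 1. As a stand-in only, the sketch below uses forward maximum matching, a simple dictionary-based technique; the function name `fmm_segment`, the tiny vocabulary, and the Chinese sample sentence (an assumed rendering of the example headline used later in the description) are all illustrative assumptions.

```python
def fmm_segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest
    vocabulary entry, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary and sentence, for illustration only.
vocabulary = {"辽宁", "盘锦", "发生", "征地", "纠纷"}
tokens = fmm_segment("辽宁盘锦发生征地纠纷", vocabulary)
print(tokens)
```

A production system would more likely rely on an established Chinese segmentation library; the point here is only that step 1 turns raw text into a token list that step 2 can look up.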
Step 2: load the region attribute library annotated with administrative region levels, and match the segmented public opinion information against it to obtain region attributes.
The region attribute library is a dictionary compiled from a large amount of information obtained through long-term investigation, collection, and optimization. It includes region names, landmark buildings, regional culture vocabulary, regional website board URLs, and similar information. For example, the Eiffel Tower is a landmark of Paris and the Forbidden City is a signature attraction of Beijing; both are present in the region attribute library and are stored against the corresponding region name. Similarly, qiegao (a sliced nut cake) is a specialty food of Xinjiang and represents its food culture, so a post in which "qiegao" appears is likely to be related to Xinjiang.
For example, suppose a piece of public opinion information titled "Land expropriation dispute in Panjin, Liaoning" is reported by a newspaper in Jinan, Shandong, and place names such as Inner Mongolia, Guangxi, and Guilin appear in the body. The name of the Jinan newspaper is then a region attribute of Jinan, and matching against the information in the region attribute library yields further region attribute information.
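A minimal sketch of the library lookup in step 2, assuming the region attribute library can be modelled as a dictionary from a term to a (region, administrative level) pair. The library contents, the level labels, and the name `match_region_attributes` are hypothetical; the source does not describe the storage format.

```python
# Hypothetical miniature region attribute library: term -> (region, level).
REGION_LIBRARY = {
    "Liaoning": ("Liaoning", "province"),
    "Panjin": ("Panjin", "city"),
    "Jinan": ("Jinan", "city"),
    "Eiffel Tower": ("Paris", "city"),      # landmark stored against its region
    "qiegao": ("Xinjiang", "province"),     # regional culture vocabulary
}


def match_region_attributes(tokens):
    """Return (position, token, region, level) for each token in the library."""
    matches = []
    for position, token in enumerate(tokens):
        entry = REGION_LIBRARY.get(token)
        if entry is not None:
            region, level = entry
            matches.append((position, token, region, level))
    return matches


tokens = ["Liaoning", "Panjin", "land", "dispute", "qiegao"]
matches = match_region_attributes(tokens)
for match in matches:
    print(match)
```

Landmarks and cultural terms resolve to a region even though the region name itself never appears in the text, which is what distinguishes this lookup from plain place-name extraction.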
Step 3: build the region attribute word information set. The set includes each region attribute word, its position in the text, its number of occurrences, its context region attribute word, and the spacing distance between them. The format of the set is as follows:
[region attribute word (occurrence count (locations of occurrence), context region attribute word (spacing distance))]
Here the spacing distance is the number of characters between the region attribute word and its context region attribute word.
Continuing the example from step 2, the title of this piece of public opinion information contains Liaoning and Panjin, the body contains place names such as Inner Mongolia, Guangxi, and Guilin, and the newspaper's name contains Shandong and Jinan. The resulting region attribute word information set is:
[Liaoning (2 (title, first paragraph), Panjin (0)); Panjin (7 (title, first paragraph), Liaoning (0)); Shandong (1 (body), Jinan (0)); Jinan (1 (body), Liaoning (0)); Inner Mongolia (1 (body)); Guangxi (1 (body), Guilin (0)); Guilin (1 (body), Guangxi (0))]
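The record format above can be sketched as a small data structure. This is an illustration under assumptions: the class `RegionRecord`, the function `build_info_set`, the choice of the nearest neighbouring region word within the same section as the context word, and the measurement of spacing in characters of the intervening tokens are all hypothetical readings of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegionRecord:
    word: str
    count: int = 0
    sections: List[str] = field(default_factory=list)  # e.g. "title", "body"
    context_word: Optional[str] = None                 # nearest other region word
    gap: Optional[int] = None                          # characters in between


def build_info_set(sections, region_words):
    """sections: list of (section_name, tokens). Returns {word: RegionRecord}."""
    records = {}
    for name, tokens in sections:
        last_word, gap_chars = None, 0
        for token in tokens:
            if token not in region_words:
                gap_chars += len(token)
                continue
            rec = records.setdefault(token, RegionRecord(token))
            rec.count += 1
            if name not in rec.sections:
                rec.sections.append(name)
            if last_word is not None and last_word != token:
                # Keep the closest neighbour seen so far, in both directions.
                for a, b in ((token, last_word), (last_word, token)):
                    r = records[a]
                    if r.gap is None or gap_chars < r.gap:
                        r.context_word, r.gap = b, gap_chars
            last_word, gap_chars = token, 0
    return records


sections = [
    ("title", ["Liaoning", "Panjin", "land", "dispute"]),
    ("body", ["Guangxi", "Guilin", "and", "Panjin"]),
]
info = build_info_set(sections, {"Liaoning", "Panjin", "Guangxi", "Guilin"})
print(info["Panjin"])
```

In this toy input "Panjin" is recorded with two occurrences across the title and body, and its context word is "Liaoning" at distance 0 because the two are adjacent in the title.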
Step 4: partition the region attribute word information by administrative region (province, city, district, county, and so on).
When a region attribute word exists in several administrative regions, judge from the context region attribute words which administrative region it belongs to. If the context contains no related attribute word, further judge whether it belongs to a particular administrative region from information such as the URL of the website board it was published on. If this still cannot be determined, retain the region attribute word in all of those administrative regions.
In this embodiment, partitioning the region attribute word information by administrative region yields the following data structure:
Province | City | District | County
Liaoning (2 (title, first paragraph), Panjin (0)) | Panjin (7 (title, first paragraph), Liaoning (0)) | None | None
Inner Mongolia (1 (body)) | None | None | None
Guangxi (1 (body), Guilin (0)) | Guilin (1 (body), Guangxi (0)) | None | None
Shandong (1 (body), Jinan (0)) | Jinan (1 (body), Liaoning (0)) | None | None
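The disambiguation order described in step 4 (context words first, then the website board, otherwise keep every candidate) can be sketched as below. The function name `resolve_region` and the "Chaoyang" example are assumptions added for illustration; the source gives no concrete ambiguous word.

```python
def resolve_region(candidates, context_regions, board_region=None):
    """Decide which administrative regions an ambiguous word is kept under.

    candidates      : regions the word could belong to
    context_regions : regions named by nearby region attribute words
    board_region    : region implied by the website board URL, if known
    """
    if len(candidates) <= 1:
        return list(candidates)
    # 1. Prefer candidates supported by the context region attribute words.
    by_context = [c for c in candidates if c in context_regions]
    if by_context:
        return by_context
    # 2. Otherwise fall back to the region of the website board.
    if board_region in candidates:
        return [board_region]
    # 3. Still undecided: retain the word under every candidate region.
    return list(candidates)


# Hypothetical example: "Chaoyang" names both a district of Beijing
# and a city in Liaoning.
chaoyang = ["Beijing", "Liaoning"]
print(resolve_region(chaoyang, {"Liaoning", "Panjin"}))
print(resolve_region(chaoyang, set(), board_region="Beijing"))
print(resolve_region(chaoyang, set()))
```

Keeping all candidates in the undecided case defers the decision to the weighting and filtering steps instead of discarding evidence early.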
Step 5: calculate the weight and weight ratio of the region attribute words at each level, including the province-level weight and weight ratio, the city-level weight and weight ratio, and the district- and county-level weight and weight ratio.
The weights of region attribute words are allocated as follows:
1. All region attribute words share the same initial weight, which may for example be set uniformly to 1.
2. The weight of a region attribute word increases step by step as the administrative level becomes finer. The weight ratio can be fine-tuned to actual conditions; for example, the weight of a city-level region attribute word may be 1.1 times that of a province-level one. In this embodiment, weight of "Panjin" = weight of "Liaoning" * 1.1.
Preferably, because local website board URLs are mainly regional forum boards, where public opinion information generally does not mention the larger place name, the initial weight of local website board URLs among the region attribute words is relatively low; it may for example be set to 1*0.5 (the ratio is adjustable). This avoids missing information while also reducing dictionary maintenance and duplicate nouns.
3. A region attribute word that appears in a key position is given extra weight. Key positions include the title, the first paragraph, and so on, and the weight ratio can be set as needed. In this embodiment, for example, weight of "Liaoning" = 1*2 (1 is the base weight, 2 is the weight ratio).
4. When a region attribute word appears several times, each occurrence adds weight incremented by a certain proportion, which the user sets as needed. For example, "Panjin" appears 7 times in this embodiment; with an increment of 10% per occurrence:
Weight of "Panjin" = 1 + 1*1.1 + 1*1.2 + 1*1.3 + ... + 1*1.6
Other weighting schemes can also be used to accumulate the weight of repeated words; the aim is to raise the weight of high-frequency region attribute words.
5. Total weight = weight of the region attribute word at this level + weight of its subordinate region attribute words.
In this embodiment, total weight of "Liaoning" = weight of the region attribute word "Liaoning" + total weight of "Panjin";
total weight of "Panjin" = weight of the region attribute word "Panjin" + total weight of its districts.
After the weights of the region attribute words at each level have been calculated, the share of each region attribute word in the overall total can be calculated, as can its share within the administrative region it belongs to.
Weight ratio = weight at this level / total weight
In this embodiment, suppose the calculated weight ratio of "Liaoning" is 82% and that of "Panjin" is 70%; the share of "Panjin" relative to "Liaoning" is then 85%.
Other ways of calculating total weight can also be used, drawing on various kinds of information to infer the main location that the information concerns.
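The repeated-occurrence rule, the total-weight rule, and the weight ratio from step 5 can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function names are hypothetical, and because the source illustrates each rule in isolation and does not state how they compose, the figures for "Liaoning" below simply reuse its key-position example (1*2) as its own weight.

```python
def repeated_occurrence_weight(count, base=1.0, step=0.10):
    """Sum the per-occurrence weights base * (1 + step * k), k = 0 .. count - 1."""
    return sum(base * (1 + step * k) for k in range(count))


def total_weight(own_weight, subordinate_totals=()):
    """Total weight = weight at this level + total weights of subordinate regions."""
    return own_weight + sum(subordinate_totals)


def weight_ratios(weights):
    """Map each region to its share of the summed weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {region: w / total for region, w in weights.items()}


# "Panjin" appears 7 times with a 10% increment: 1 + 1.1 + ... + 1.6, about 9.1.
panjin_own = repeated_occurrence_weight(7)
panjin_total = total_weight(panjin_own)            # no district-level words here
liaoning_total = total_weight(1.0 * 2, [panjin_total])   # about 11.1 (assumed composition)

# Share of "Panjin" within "Liaoning" from the global ratios in the text.
panjin_within_liaoning = 0.70 / 0.82               # about 0.85

# Hypothetical weights, only to show the ratio formula.
shares = weight_ratios({"A": 3.0, "B": 1.0})
print(round(panjin_own, 2), round(liaoning_total, 2), round(panjin_within_liaoning, 2))
```

The 85% figure quoted in the text is consistent with dividing the 70% global ratio of "Panjin" by the 82% global ratio of "Liaoning".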
Step 6: filter out secondary information according to the weight threshold, and accurately judge the region to which the public opinion information belongs.
The weight threshold can be set by the user as needed.
In this embodiment, taking the city level as an example, the thresholds are set as follows: regions with a weight ratio below 10% are filtered out, and when the weight ratio of one region exceeds 80%, all other regions are filtered out. Suppose the calculated weight ratios are 70% for "Panjin", 9.11% for "Guilin", 9.11% for "Jinan", and 4.36% for "Inner Mongolia". Applying the thresholds removes the regions below 10%, and "Panjin" is finally retained.
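The two-threshold filter in this embodiment can be sketched as follows, using the ratios quoted above. The function name `filter_regions` and its parameter names are hypothetical; the 10% and 80% defaults are the example values from the text and are meant to be user-adjustable.

```python
def filter_regions(ratios, low=0.10, dominant=0.80):
    """Drop regions below `low`; if one region exceeds `dominant`, keep only it."""
    if not ratios:
        return {}
    top = max(ratios, key=ratios.get)
    if ratios[top] > dominant:
        return {top: ratios[top]}
    return {region: r for region, r in ratios.items() if r >= low}


ratios = {
    "Panjin": 0.70,
    "Guilin": 0.0911,
    "Jinan": 0.0911,
    "Inner Mongolia": 0.0436,
}
kept = filter_regions(ratios)
print(kept)
```

With these inputs no region exceeds the 80% cutoff, so the result comes from the 10% floor alone, which leaves only "Panjin".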
The above embodiment is described only to help explain the principle of this patent application. A person of ordinary skill in the art may, following the embodiment, vary both the specific implementation and the scope of application, for example by changing how weights are incremented and how they are combined. In summary, the content of this specification should not be construed as limiting this patent application.

Claims (4)

1. A method for identifying the region of public opinion information, characterized by comprising the following steps:
(1) performing Chinese word segmentation on the text containing the public opinion information and extracting the segmented words;
(2) loading the region attribute library annotated with administrative region levels, and matching the segmented public opinion information against it to obtain region attributes;
(3) building the region attribute word information set;
(4) partitioning the region attribute word information by administrative region;
(5) calculating the weight and weight ratio of the region attribute words at each level;
(6) filtering out secondary information according to the weight threshold, and accurately judging the region to which the public opinion information belongs;
wherein: the region attribute library includes region names, landmark buildings, regional culture vocabulary, and regional website board URLs;
the region attribute word information set includes each region attribute word, its position in the text, its number of occurrences, its context region attribute word, and the spacing distance between them;
when a region attribute word exists in several administrative regions at once, the partitioning principle is: first, judge from the context region attribute words which administrative region it belongs to; if the context contains no related attribute word, further judge whether it belongs to a particular administrative region from the URL of the website board it was published on;
if this still cannot be determined, retain the region attribute word in all of those administrative regions.
2. The method for identifying the region of public opinion information according to claim 1, characterized in that: the main factors affecting weight are the position at which a region attribute word appears and its number of occurrences.
3. The method for identifying the region of public opinion information according to claim 1, characterized in that: the weight ratio and the weight threshold are user-defined.
4. An apparatus for identifying the region of public opinion information, characterized by comprising a text preprocessing module, a region attribute word extraction module, a region weight calculation module, and a region filtering module;
text preprocessing module: performs word segmentation on the text containing the public opinion information;
region attribute word extraction module: loads the region attribute library annotated with administrative region levels, matches the segmented text against it to obtain region attributes, builds the region attribute word information set, and partitions the region attribute word information by administrative region;
region weight calculation module: calculates the weight and weight ratio of the region attribute words at each level;
region filtering module: filters out secondary information according to the weight threshold, and accurately judges the region to which the public opinion information belongs;
wherein: the region attribute library includes region names, landmark buildings, regional culture vocabulary, and regional website board URLs;
the region attribute word information set includes each region attribute word, its position in the text, its number of occurrences, its context region attribute word, and the spacing distance between them;
when a region attribute word exists in several administrative regions at once, the partitioning principle is: first, judge from the context region attribute words which administrative region it belongs to; if the context contains no related attribute word, further judge whether it belongs to a particular administrative region from the URL of the website board it was published on; if this still cannot be determined, retain the region attribute word in all of those administrative regions.
CN201210583484.8A 2012-12-31 2012-12-31 A kind of spatial identification method and apparatus of public feelings information Active CN103064951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210583484.8A CN103064951B (en) 2012-12-31 2012-12-31 A kind of spatial identification method and apparatus of public feelings information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210583484.8A CN103064951B (en) 2012-12-31 2012-12-31 A kind of spatial identification method and apparatus of public feelings information

Publications (2)

Publication Number Publication Date
CN103064951A CN103064951A (en) 2013-04-24
CN103064951B true CN103064951B (en) 2016-08-31

Family

ID=48107581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210583484.8A Active CN103064951B (en) 2012-12-31 2012-12-31 A kind of spatial identification method and apparatus of public feelings information

Country Status (1)

Country Link
CN (1) CN103064951B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793492B (en) * 2014-01-22 2017-01-18 武汉虹旭信息技术有限责任公司 Map regionalization analytic system and method based on Mobile Internet harmful information
CN104899202B (en) * 2014-03-04 2019-03-19 华为技术有限公司 A kind of information processing method and system
CN106610998A (en) * 2015-10-26 2017-05-03 烽火通信科技股份有限公司 Novel web data region-based noise filtering method
CN106886512B (en) * 2015-12-15 2020-11-17 腾讯科技(深圳)有限公司 Article classification method and device
JP6271617B2 (en) * 2016-02-25 2018-01-31 ヤフー株式会社 Information processing apparatus, information processing method, and information processing program
CN107153654B (en) * 2016-03-03 2020-04-28 阿里巴巴集团控股有限公司 Method and device for identifying region to which user belongs
CN106570130B (en) * 2016-10-27 2019-10-01 厦门市美亚柏科信息股份有限公司 Text region judgment method and its system based on RDF knowledge base
CN106919705A (en) * 2017-03-10 2017-07-04 北京搜狐新媒体信息技术有限公司 The affiliated spatial identification method and device of the network information
CN107491548A (en) * 2017-08-28 2017-12-19 武汉烽火普天信息技术有限公司 A kind of network public-opinion text message recommends and method for visualizing
CN108021651B (en) * 2017-11-30 2020-07-28 中科金联(北京)科技有限公司 Network public opinion risk assessment method and device
CN108876440B (en) * 2018-05-29 2021-09-03 创新先进技术有限公司 Region dividing method and server
CN108959516B (en) * 2018-06-28 2019-08-13 北京百度网讯科技有限公司 Conversation message treating method and apparatus
CN110750636A (en) * 2018-07-04 2020-02-04 百度在线网络技术(北京)有限公司 Network public opinion information processing method and device
CN109359174B (en) * 2018-09-03 2019-08-20 杭州数梦工场科技有限公司 Administrative division belongs to recognition methods, device, storage medium and computer equipment
CN109635276B (en) * 2018-11-12 2020-12-11 厦门市美亚柏科信息股份有限公司 Information matching method and terminal
CN109271640B (en) * 2018-11-13 2021-09-17 腾讯科技(深圳)有限公司 Text information region attribute identification method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
CN102426603A (en) * 2011-11-11 2012-04-25 任子行网络技术股份有限公司 Text information regional recognition method and device
CN102779174A (en) * 2012-06-26 2012-11-14 北京奇虎科技有限公司 Public opinion information display system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100892847B1 (en) * 2007-05-29 2009-04-10 엔에이치엔(주) Method and system supporting public opinion according to advertisement performance

Also Published As

Publication number Publication date
CN103064951A (en) 2013-04-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant