CN105117385B - Method and system for public opinion information extraction based on matrix computation - Google Patents

Method and system for public opinion information extraction based on matrix computation

Info

Publication number
CN105117385B
CN105117385B CN201510569894.0A CN201510569894A CN105117385B CN 105117385 B CN105117385 B CN 105117385B CN 201510569894 A CN201510569894 A CN 201510569894A CN 105117385 B CN105117385 B CN 105117385B
Authority
CN
China
Prior art keywords
information
matrix
weight
keyword
information source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510569894.0A
Other languages
Chinese (zh)
Other versions
CN105117385A (en)
Inventor
杜登斌
杜璞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhong Run Pu Da Information Technology Co Ltd
Original Assignee
Beijing Zhong Run Pu Da Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhong Run Pu Da Information Technology Co Ltd
Priority to CN201510569894.0A
Publication of CN105117385A
Application granted
Publication of CN105117385B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a method and system for public opinion information extraction based on matrix computation. The method includes: crawling site information from the internet and establishing an information source matrix, where the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information; establishing a word segmentation matrix and a rule matrix, and obtaining the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; having the user enter a user keyword and select, through the word segmentation matrix, the industry to which the keyword belongs; and calculating a comprehensive score from the weights and candidate levels to complete the public opinion analysis. The invention does not require a sentiment dictionary. It crawls data in real time along multiple dimensions, establishes the information source matrix, word segmentation matrix, and rule matrix, and associates the three matrices so that they reach a dynamic balance, which allows the terms the user queries to be found accurately, with an accuracy above 95%.

Description

Method and system for public opinion information extraction based on matrix computation
Technical field
The present invention relates to the field of online public opinion, and in particular to a method and system for public opinion information extraction based on matrix computation.
Background art
With the rapid worldwide development of the internet, online media has come to be recognized as the "fourth medium" after newspapers, radio, and television, and the network has become one of the main carriers reflecting public sentiment.
Online public opinion consists of the emotions, attitudes, opinions, remarks, and viewpoints, often influential and partisan, that the public holds about hot-spot and focal issues in real life and spreads over the internet. It is mainly expressed and amplified through forum (BBS) posts, comments and replies, blogs, and similar channels. Because the internet is virtual, concealed, diverse, pervasive, and unconstrained, more and more netizens are willing to express views and spread ideas through this channel.
Online public opinion is a powerful force that can act back on hot-spot events and influence social development and the course of events. Because the network is open, online public opinion can form rapidly and have a large social impact. In particular, when negative online news sentiment appears, failure to understand it promptly and guide it effectively can easily lead to a public opinion crisis, and in serious cases can even affect public safety. Actively defusing such crises is of practical importance for maintaining social stability and promoting national development, and is part of what building a harmonious society entails. Collecting the viewpoints expressed in online news opinion is therefore of considerable importance: netizens' viewpoints play a vital role in how a hot-spot event evolves and can even be regarded as the core of online news public opinion.
Recently, with the rapid development of internet technology, new media represented by news media have broken the control and monopoly of information. People freely express their attitudes and opinions online and no longer accept information uncritically as in the past; instead, the interests and demands of different social strata emerge one after another, and different viewpoints collide head-on. For the relevant government departments, understanding online news public opinion promptly and accurately, and strengthening its timely monitoring and effective guidance, has become a major difficulty in managing online news public opinion. Under these circumstances it is necessary to build a news public opinion monitoring system that covers news data sources. Such a system can target the new communication environment of news media, support deeper study of hot-spot analysis methods for news public opinion and of the influence of new media, and enrich and improve research on news public opinion.
Although many organizations have proposed various solutions for monitoring online news public opinion, the technical problem that those skilled in the art still need to solve is how to improve the efficiency and accuracy of judging online news public opinion information, because so far there has been no network public opinion monitoring system that handles news media data more efficiently and accurately.
Summary of the invention
In view of the shortcomings of the prior art, the present invention proposes a method and system for public opinion information extraction based on matrix computation.
The present invention proposes a method for public opinion information extraction based on matrix computation, comprising:
Step 1: crawl site information from the internet and establish an information source matrix, where the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information;
Step 2: establish a word segmentation matrix and a rule matrix; obtain the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; the user enters a user keyword and selects, through the word segmentation matrix, the industry to which the user keyword belongs; and a comprehensive score is calculated from the weights and candidate levels to complete the public opinion analysis.
The word segmentation matrix takes the industries to which the user keyword may belong as its elements. When entering the user keyword, the user selects the industry it belongs to, which narrows the search range and improves efficiency.
The rule matrix involves finding the keywords that best reflect the content of the articles on a site and tagging the articles with those keywords.
The weights are obtained by the following formulas:
Information source weight formula: $q \cdot w \cdot e = r$, where q is the information source type score, w is the importance grade of the information source site, e indicates whether the item is pinned to the top, and r is the information source weight;
Word segmentation weight formula: $x \cdot y = u$, where x is the degree to which the keyword expresses the industry's characteristics, y is the sensitivity of the keyword, and u is the word segmentation weight;
Rule weight formula: $g \cdot h = k$, where g is the degree to which the rule expresses the industry's characteristics, h is the degree of sentiment orientation of the rule, and k is the rule weight.
The formula for calculating the comprehensive score is:
$a_1 b_1 + \cdots + a_i b_j = M$
where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.
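Read as a sum of products, the formulas above can be written compactly as below. This is only a sketch: it assumes the weights and candidate levels are paired one-to-one over n terms, and that each weight is one of r, u, or k. The index m, the count n, and the pairing are assumptions, since the text leaves the relation between i and j open.

```latex
r = q \cdot w \cdot e, \qquad u = x \cdot y, \qquad k = g \cdot h

M = \sum_{m=1}^{n} a_m \, b_m, \qquad a_m \in \{r, u, k\}
```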
The present invention also proposes a system for public opinion information extraction based on matrix computation, comprising:
An information source matrix establishing module, configured to crawl site information from the internet and establish an information source matrix, where the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information;
A word segmentation matrix and rule matrix establishing module, configured to establish a word segmentation matrix and a rule matrix and to obtain the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; the user enters a user keyword and selects, through the word segmentation matrix, the industry to which the user keyword belongs; and a comprehensive score is calculated from the weights and candidate levels to complete the public opinion analysis.
The word segmentation matrix takes the industries to which the user keyword may belong as its elements. When entering the user keyword, the user selects the industry it belongs to, which narrows the search range and improves efficiency.
The rule matrix involves finding the keywords that best reflect the content of the articles on a site and tagging the articles with those keywords.
The weights are obtained by the following formulas:
Information source weight formula: $q \cdot w \cdot e = r$, where q is the information source type score, w is the importance grade of the information source site, e indicates whether the item is pinned to the top, and r is the information source weight;
Word segmentation weight formula: $x \cdot y = u$, where x is the degree to which the keyword expresses the industry's characteristics, y is the sensitivity of the keyword, and u is the word segmentation weight;
Rule weight formula: $g \cdot h = k$, where g is the degree to which the rule expresses the industry's characteristics, h is the degree of sentiment orientation of the rule, and k is the rule weight.
The formula for calculating the comprehensive score is:
$a_1 b_1 + \cdots + a_i b_j = M$
where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.
From the above, the advantages of the invention are:
By sorting on the comprehensive score, relevant articles can be captured accurately and in real time, which improves the timeliness and accuracy of vertical industry search. By continuously learning from the user's usage habits, the comprehensive score ranking comes to reflect the user's preferences more and more closely, to the point of understanding the user's needs better than the user does; articles pushed on this basis let the user focus only on the content they care about, which improves the use of fragmented time. The invention does not require a sentiment dictionary; it crawls data in real time along multiple dimensions, establishes the information source matrix, word segmentation matrix, and rule matrix, and associates the three matrices so that they reach a dynamic balance, so that the terms the user wants to query can be found accurately, with accuracy reaching more than 95%.
Brief description of the drawings
Fig. 1 is the overall flow chart of the present invention;
Fig. 2 is a diagram of an embodiment of the information source matrix of the present invention;
Fig. 3 is a diagram of an embodiment of the word segmentation matrix of the present invention;
Fig. 4 is a diagram of an embodiment of the rule matrix of the present invention.
The reference numerals are:
Steps 101, 102, 103, 104.
Embodiment
The object of the invention is to provide a method and system for public opinion information extraction based on matrix computation. As shown in Fig. 1, the method comprises the following steps:
Step 101: as shown in Fig. 1, crawl site information from the internet and establish an information source matrix, where the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information.
The information base type information divides the information bases into broad categories in order to distinguish different fields (such as government affairs and business). Base types are defined and added by an administrator, who can also define the data structure of information bases of that type, the related information source attributes, and the connection to the relevant database server;
The information base information classifies the information sources within the same field. Bases can be divided by information source level, by major industry category, or in similar ways; they are defined and added by an administrator, and access rights to information sources are controlled through this classification;
The site information refers to the site to which the information source to be crawled belongs, for example Sina or NetEase;
The sub-site information refers to the specific list page addresses to be crawled. After a sub-site's link address is added to the sub-site information, the categories at each level to which it belongs are set, and the crawl tags for its list pages and final pages are configured. After an article is crawled, the information processing program automatically sets the article's attributes according to the attributes of the sub-site the article belongs to;
The information source statistics allow the crawl status of each site and sub-site to be monitored in real time, including the number of items crawled, the time of the most recent crawl, and whether the crawl status is normal; the related workload can also be tallied per editor;
The information base permission information controls which operations editors may perform on each information base: only information bases for which permission has been granted are visible to an editor, who can then add, delete, and modify information sources in them;
The basic attribute information maintains the various basic attributes related to information sources, and includes attribute type information, attribute information, and information source type information. When an information source classification rule is added, no further development or change to the data structure is needed; the rule is added directly in basic attribute management.
The article field information defines the fields available for crawled articles. When the data structure of an information base is configured, the available fields can be chosen from the article field list.
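The eight parts of the information source matrix can be pictured as one record per crawled list page. The sketch below is only an illustration under assumptions: the class name, field names, field types, and the example URL are hypothetical and do not come from the patent, which does not specify a storage layout.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class InformationSourceRecord:
    """One row of a hypothetical information source matrix (eight parts)."""
    base_type: str        # information base type, e.g. "government" or "business"
    base: str             # information base within that type
    site: str             # site the source belongs to, e.g. "Sina"
    sub_site_url: str     # specific list page address to crawl
    statistics: Dict[str, object] = field(default_factory=dict)    # crawl counts, last crawl time, status
    permissions: List[str] = field(default_factory=list)           # editors allowed to manage this base
    basic_attributes: Dict[str, str] = field(default_factory=dict) # attribute type / attribute / source type
    article_fields: List[str] = field(default_factory=list)        # fields available for crawled articles


# Example values are invented for illustration only.
record = InformationSourceRecord(
    base_type="business",
    base="finance",
    site="Sina",
    sub_site_url="https://example.com/finance/list",
    article_fields=["title", "body", "publish_time"],
)
part_count = len(fields(InformationSourceRecord))
```

Keeping the eight parts as named fields makes it easy to add a new classification rule as data in `basic_attributes` without changing the structure, which matches the stated intent of the basic attribute information.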
Step 102: as shown in Fig. 2, establish the word segmentation matrix, in which the industries where a keyword may occur (or alternatively research fields, that is, broader terms that include the keyword) serve as the elements of the matrix. When entering a keyword, the user selects the industry to which the keyword belongs, which narrows the search range and improves efficiency;
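The industry selection in Step 102 can be sketched as a lookup followed by a filter. This is a minimal sketch under assumptions: the matrix contents, the function names, and the article dictionaries are all hypothetical, and the patent does not state how the matrix is stored or how matching is done.

```python
from typing import Dict, List

# Hypothetical word segmentation matrix: each row maps a keyword to the
# industries it may belong to (entries invented for illustration).
SEGMENTATION_MATRIX: Dict[str, List[str]] = {
    "nuclear radiation": ["environmental protection", "energy"],
    "gasoline": ["energy", "environmental protection"],
}


def candidate_industries(keyword: str) -> List[str]:
    """Industries offered to the user for a keyword (empty if unknown)."""
    return SEGMENTATION_MATRIX.get(keyword, [])


def narrow_search(articles: List[dict], keyword: str, industry: str) -> List[dict]:
    """Keep only articles in the chosen industry that mention the keyword."""
    if industry not in candidate_industries(keyword):
        raise ValueError(f"{industry!r} is not a candidate industry for {keyword!r}")
    return [a for a in articles
            if a["industry"] == industry and keyword in a["text"]]


articles = [
    {"industry": "energy", "text": "gasoline prices rise"},
    {"industry": "environmental protection", "text": "gasoline emissions standards"},
    {"industry": "energy", "text": "new wind farm opens"},
]
hits = narrow_search(articles, "gasoline", "energy")
```

Filtering by industry first is what shrinks the search range: only articles already classified under the selected industry are compared against the keyword.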
Step 103: as shown in Fig. 3, establish the rule matrix, in which the keywords that best reflect an article's content are found and the articles on a site are tagged with them. For example, for the article "Thundershowers in parts of Beijing", "Beijing" and "thundershower" are the article's tag keywords;
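Keyword tagging as described in Step 103 can be illustrated with a simple vocabulary match. This is a deliberately naive sketch, not the patent's method: the patent does not say how the best keywords are found, so the case-insensitive substring match, the function name, and the vocabulary list are all assumptions.

```python
from typing import Iterable, List


def tag_keywords(title: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the vocabulary terms that occur in the title (case-insensitive)."""
    lowered = title.lower()
    return [term for term in vocabulary if term.lower() in lowered]


# Hypothetical vocabulary; a real rule matrix would hold many more terms.
vocabulary = ["Beijing", "thundershower", "gasoline"]
tags = tag_keywords("Thundershowers in parts of Beijing", vocabulary)
```

With this vocabulary the article is tagged with "Beijing" and "thundershower" and not with "gasoline", mirroring the example in the text.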
Step 104: the user enters a keyword and selects the industry to which it belongs through the word segmentation matrix. The weight and candidate level of each site in the information source matrix, the candidate level of each industry in the word segmentation matrix, and the candidate level of the keyword tags in the rule matrix are then calculated. The weights satisfy: enterprise websites 28%, industrial sustainability 22%, central office websites 35%, finance websites 27%, local items 2%. The candidate levels are: forwarded 0.5, favorited 0.4, liked 0.3, read more than 20 times 0.2, read fewer than 20 times 0.1. The comprehensive score is calculated from these data, and the article or term with the highest comprehensive score is the result being searched for. The formula for calculating the comprehensive score is:
$a_1 b_1 + \cdots + a_i b_j = M$
where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.
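The scoring and ranking in Step 104 can be sketched as follows, using the weight and candidate level values quoted in the embodiment. The pairing is an assumption: the text does not say which weight multiplies which candidate level, so this sketch treats each article as a list of (weight, candidate level) pairs and sums their products. The dictionary keys, function names, and the two example articles are hypothetical.

```python
from typing import Dict, List, Tuple

# Values quoted from the embodiment; the key names are illustrative labels.
SOURCE_WEIGHT = {"enterprise": 0.28, "central_office": 0.35, "finance": 0.27}
CANDIDATE_LEVEL = {"forwarded": 0.5, "favorited": 0.4, "liked": 0.3,
                   "read_over_20": 0.2, "read_under_20": 0.1}

Pair = Tuple[float, float]  # (weight a, candidate level b)


def comprehensive_score(pairs: List[Pair]) -> float:
    """M = a1*b1 + ... summed over the supplied (a, b) pairs."""
    return sum(a * b for a, b in pairs)


def rank_articles(articles: Dict[str, List[Pair]]) -> List[Tuple[str, float]]:
    """Sort articles by comprehensive score, highest first."""
    scored = [(title, comprehensive_score(pairs)) for title, pairs in articles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example: which pairs apply to an article is an assumption.
articles = {
    "article A": [(SOURCE_WEIGHT["central_office"], CANDIDATE_LEVEL["forwarded"]),
                  (SOURCE_WEIGHT["enterprise"], CANDIDATE_LEVEL["read_over_20"])],
    "article B": [(SOURCE_WEIGHT["finance"], CANDIDATE_LEVEL["liked"])],
}
ranking = rank_articles(articles)
best_title, best_score = ranking[0]
```

Under these assumed pairs, article A scores 0.35 × 0.5 + 0.28 × 0.2 and article B scores 0.27 × 0.3, so article A is returned as the search result.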
The present invention also proposes a system for public opinion information extraction based on matrix computation, comprising:
An information source matrix establishing module, configured to crawl site information from the internet and establish an information source matrix, where the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information;
A word segmentation matrix and rule matrix establishing module, configured to establish a word segmentation matrix and a rule matrix and to obtain the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; the user enters a user keyword and selects, through the word segmentation matrix, the industry to which the user keyword belongs; and a comprehensive score is calculated from the weights and candidate levels to complete the public opinion analysis.
The word segmentation matrix takes the industries to which the user keyword may belong as its elements. When entering the user keyword, the user selects the industry it belongs to, which narrows the search range and improves efficiency.
The rule matrix involves finding the keywords that best reflect the content of the articles on a site and tagging the articles with those keywords.
The weights are obtained by the following formulas:
Information source weight formula: $q \cdot w \cdot e = r$
where q is the information source type score, w is the importance grade of the information source site, and e indicates whether the item is pinned to the top.
For example, if the source is a news site, q is set to 10 points; if it is a key national portal (Sina), w is 10 points; if the item is the site's pinned top headline, e is 10 points.
Word segmentation weight formula: $x \cdot y = u$
where x is the degree to which the keyword expresses the industry's characteristics and y is the sensitivity of the keyword.
For example, "nuclear radiation" has a feature expression degree of 5 in the environmental protection industry, while "gasoline" has a feature expression degree of 1 in that industry; the sensitivity of "nuclear radiation" is very high, while that of "gasoline" is comparatively low.
Rule weight formula: $g \cdot h = k$
where g is the degree to which the rule expresses the industry's characteristics and h is the degree of sentiment orientation of the rule.
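The three weight formulas are plain products, so they can be checked with a few lines of code. In the sketch below the function names are hypothetical; the values q = w = e = 10 and x = 5 or 1 come from the examples in the text, while the sensitivity values 9 and 2 are assumptions chosen only to reflect "very high" versus "comparatively low".

```python
def information_source_weight(q: float, w: float, e: float) -> float:
    """r = q * w * e (type score, site importance grade, pinned-to-top score)."""
    return q * w * e


def segmentation_weight(x: float, y: float) -> float:
    """u = x * y (industry feature expression degree, keyword sensitivity)."""
    return x * y


def rule_weight(g: float, h: float) -> float:
    """k = g * h (industry feature expression degree, sentiment orientation degree)."""
    return g * h


# Pinned top headline on a key national news portal, per the example in the text.
r = information_source_weight(q=10, w=10, e=10)

# Feature expression degrees 5 and 1 are from the text; sensitivities are assumed.
u_nuclear = segmentation_weight(x=5, y=9)
u_gasoline = segmentation_weight(x=1, y=2)
```

With these inputs the information source weight is 10 × 10 × 10 = 1000, and "nuclear radiation" receives a much larger word segmentation weight than "gasoline" in the environmental protection industry.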
The formula for calculating the comprehensive score is:
$a_1 b_1 + \cdots + a_i b_j = M$
where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.

Claims (6)

  1. A method for public opinion information extraction based on matrix computation, characterized by comprising:
    Step 1: crawling site information from the internet and establishing an information source matrix, wherein the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information;
    Step 2: establishing a word segmentation matrix and a rule matrix; obtaining the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; a user entering a user keyword and selecting, through the word segmentation matrix, the industry to which the user keyword belongs; and calculating a comprehensive score from the weights and candidate levels to complete the public opinion analysis;
    wherein the rule matrix involves finding the keywords that best reflect article content on a site and tagging the articles with those keywords;
    and the weights are obtained by the following formulas:
    Information source weight formula: $q \cdot w \cdot e = r$, where q is the information source type score, w is the importance grade of the information source site, e indicates whether the item is pinned to the top, and r is the information source weight;
    Word segmentation weight formula: $x \cdot y = u$, where x is the degree to which the keyword expresses the industry's characteristics, y is the sensitivity of the keyword, and u is the word segmentation weight;
    Rule weight formula: $g \cdot h = k$, where g is the degree to which the rule expresses the industry's characteristics, h is the degree of sentiment orientation of the rule, and k is the rule weight.
  2. The method for public opinion information extraction based on matrix computation as claimed in claim 1, characterized in that the word segmentation matrix takes the industries to which the user keyword may belong as its elements, and the user, when entering the user keyword, selects the industry it belongs to, which narrows the search range and improves efficiency.
  3. The method for public opinion information extraction based on matrix computation as claimed in claim 1, characterized in that the formula for calculating the comprehensive score is:
    $a_1 b_1 + \cdots + a_i b_j = M$
    where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.
  4. A system for public opinion information extraction based on matrix computation, characterized by comprising:
    an information source matrix establishing module, configured to crawl site information from the internet and establish an information source matrix, wherein the information source matrix comprises eight parts: information base type information, information base information, site information, sub-site information, information source statistics, information base permission information, basic attribute information, and article field information;
    a word segmentation matrix and rule matrix establishing module, configured to establish a word segmentation matrix and a rule matrix and to obtain the weights and candidate levels of the information source matrix, the word segmentation matrix, and the rule matrix respectively; a user enters a user keyword and selects, through the word segmentation matrix, the industry to which the user keyword belongs; and a comprehensive score is calculated from the weights and candidate levels to complete the public opinion analysis;
    wherein the rule matrix involves finding the keywords that best reflect article content on a site and tagging the articles with those keywords, and the weights are obtained by the following formulas:
    Information source weight formula: $q \cdot w \cdot e = r$, where q is the information source type score, w is the importance grade of the information source site, e indicates whether the item is pinned to the top, and r is the information source weight;
    Word segmentation weight formula: $x \cdot y = u$, where x is the degree to which the keyword expresses the industry's characteristics, y is the sensitivity of the keyword, and u is the word segmentation weight;
    Rule weight formula: $g \cdot h = k$, where g is the degree to which the rule expresses the industry's characteristics, h is the degree of sentiment orientation of the rule, and k is the rule weight.
  5. The system for public opinion information extraction based on matrix computation as claimed in claim 4, characterized in that the word segmentation matrix takes the industries to which the user keyword may belong as its elements, and the user, when entering the user keyword, selects the industry it belongs to, which narrows the search range and improves efficiency.
  6. The system for public opinion information extraction based on matrix computation as claimed in claim 4, characterized in that the formula for calculating the comprehensive score is:
    $a_1 b_1 + \cdots + a_i b_j = M$
    where a is a weight, b is a candidate level, i denotes the i-th weight, j denotes the j-th candidate level, and M is the comprehensive score.
CN201510569894.0A 2015-09-09 2015-09-09 Method and system for public opinion information extraction based on matrix computation Active CN105117385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510569894.0A CN105117385B (en) 2015-09-09 2015-09-09 A kind of method and system that public opinion information extraction is carried out based on matrix computations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510569894.0A CN105117385B (en) 2015-09-09 2015-09-09 A kind of method and system that public opinion information extraction is carried out based on matrix computations

Publications (2)

Publication Number Publication Date
CN105117385A CN105117385A (en) 2015-12-02
CN105117385B true CN105117385B (en) 2017-12-19

Family

ID=54665379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510569894.0A Active CN105117385B (en) 2015-09-09 2015-09-09 Method and system for public opinion information extraction based on matrix computation

Country Status (1)

Country Link
CN (1) CN105117385B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866106A (en) * 2019-10-10 2020-03-06 重庆金融资产交易所有限责任公司 Text recommendation method and related equipment
CN117112609B (en) * 2023-06-29 2024-05-10 南京国电南自轨道交通工程有限公司 Method for improving retrieval efficiency of monitoring historical data by using key element matrix

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314489A (en) * 2011-08-15 2012-01-11 哈尔滨工业大学 Method for analyzing opinion leader in network forum
US8229729B2 (en) * 2008-03-25 2012-07-24 International Business Machines Corporation Machine translation in continuous space
CN102982153A (en) * 2012-11-29 2013-03-20 北京亿赞普网络技术有限公司 Information retrieval method and device
CN103455613A (en) * 2013-09-06 2013-12-18 南京大学 Interest aware service recommendation method based on MapReduce model
CN104731812A (en) * 2013-12-23 2015-06-24 北京华易互动科技有限公司 Text emotion tendency recognition based public opinion detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
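Design and Implementation of a Topic Detection System Based on Massive Public Opinion Information; 王树辰 (Wang Shuchen); China Master's Theses Full-text Database, Information Science and Technology; 20140415; full text *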

Also Published As

Publication number Publication date
CN105117385A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN109325165B (en) Network public opinion analysis method, device and storage medium
CN104537097B (en) Microblogging public sentiment monitoring system
CN103744981B (en) System for automatic classification analysis for website based on website content
WO2019227710A1 (en) Network public opinion analysis method and apparatus, and computer-readable storage medium
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN103399891A (en) Method, device and system for automatic recommendation of network content
US8965867B2 (en) Measuring and altering topic influence on edited and unedited media
CN101819573A (en) Self-adaptive network public opinion identification method
CN108932291B (en) Power grid public opinion evaluation method, storage medium and computer
CN104408191A (en) Method and device for obtaining correlated keywords of keywords
CN104794161A (en) Method for monitoring network public opinions
CN104899335A (en) Method for performing sentiment classification on network public sentiment of information
Dang et al. Framework for retrieving relevant contents related to fashion from online social network data
US11789946B2 (en) Answer facts from structured content
CN104217038A (en) Knowledge network building method for financial news
CN105787662A (en) Mobile application software performance prediction method based on attributes
CN104615723B (en) The determination method and apparatus of query word weighted value
CN103853746A (en) Word bank generation method and system, input method and input system
CN109885656A (en) Microblogging forwarding prediction technique and device based on quantization temperature
CN113282754A (en) Public opinion detection method, device, equipment and storage medium for news events
CN103218368A (en) Method and device for discovering hot words
CN102591977A (en) Method and system for sequencing search results
US20170235835A1 (en) Information identification and extraction
CN105117385B (en) A kind of method and system that public opinion information extraction is carried out based on matrix computations
CN108595466B (en) Internet information filtering and internet user information and network card structure analysis method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20210303

Granted publication date: 20171219

PP01 Preservation of patent right
PD01 Discharge of preservation of patent

Date of cancellation: 20240303

Granted publication date: 20171219

PD01 Discharge of preservation of patent