CN102609427A - Public opinion vertical search analysis system and method - Google Patents
- Publication number: CN102609427A
- Application number: CN201110354973A
- Authority: CN (China)
- Prior art keywords: text, sensibility, vocabulary, url, weights
- Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
- Classification: Information Retrieval, DB Structures and FS Structures Therefor
Abstract
The invention relates to network information processing technology and discloses a public opinion vertical search analysis system. The system, intended for text-based network public opinion search and analysis, comprises a vertical search engine crawler module, a template-based information extraction module, a text orientation analysis module based on phrase extraction, and a text orientation analysis module based on a vocabulary statistical model. Compared with the prior art, the accuracy of the sentiment orientation algorithm based on the phrase model and the vocabulary statistical model is improved by about 5%, a marked improvement. In addition, a multi-threaded design improves processing efficiency, yielding faster and more accurate public opinion search and analysis.
Description
Technical field
The present invention relates to network information processing technology, and in particular to a network public opinion search and analysis system and method.
Background art
The main technologies involved in the present invention include:
1. Key technologies related to network public opinion monitoring
(1) Network public opinion collection and extraction: network public opinion forms and spreads mainly through channels such as news, forums/BBS, blogs, and instant messaging software. These channels are carried mainly by dynamic web pages, which hold loosely structured information, so effective extraction of public opinion information is difficult. Methods that automatically generate web page information extraction wrappers have achieved extraction and integration of dynamic web page data to some extent, with reasonable accuracy and extraction efficiency.
(2) Network public opinion topic detection and tracking: the topics that netizens discuss are varied and cover every aspect of society. How to find hot and sensitive topics in massive amounts of information, and how to track changes in their trends, has become a research focus.
(3) Network public opinion orientation analysis: orientation analysis can determine the subjective content that an online author conveys, such as emotion, attitude, viewpoint, position, and intention. Performing orientation analysis on public opinion text is, in effect, an attempt to have a computer extract the author's sentiment direction from the content of the text.
(4) Multi-document automatic summarization: pages such as news items, forum posts, and blog articles all contain junk information. Multi-document automatic summarization can filter page content and condense it into a summary that is convenient to query and retrieve.
2. Information extraction technology
The workflow of a vertical search engine is as follows: a spider crawls web pages; the pages are classified and information is extracted from them, that is, the unstructured data of a page is extracted into structured data of a specific form; the data is stored in a database and processed further, for example by deduplication and comparative analysis; finally, search is offered to users through a word-segmentation index. The most critical step in this workflow is extracting unstructured data into structured data according to requirements, which is also the biggest difference between a vertical search engine and a general-purpose search engine.
At present there are two main ways to extract structured information:
(1) Structured information extraction at the web page library level
This approach uses page structure analysis and intelligent node analysis and conversion to extract structured data automatically. It can extract from any normal web page, is fully automatic, and achieves high extraction accuracy. However, because it must generalize well, it is technically difficult to implement, with high up-front R&D cost and a long development cycle, so it suits only high-end applications.
(2) Template-based extraction
The template approach analyzes the web page structure of each data source in advance and performs template matching for each structure. Specific regular expressions in the extraction template are used to collect information precisely from a limited set of websites. This approach is simple to implement: templates are easy to configure for a data source's page structure, accuracy is high, results are timely, and deployment is quick. However, the maintenance workload becomes very large when information sources are diverse or unstable, so this approach suits relatively fixed, limited information sources.
3. Semantics-based text orientation research methods
At present there are two main semantics-based methods for text orientation research.
(1) The first method extracts the adjectives, or the phrases that carry subjective coloring, from the text to be analyzed; judges the orientation of each extracted adjective or phrase and assigns it an orientation value; and finally accumulates all the orientation values to obtain the overall orientation of the text. Specifically:
1) Linguistic constraints on the conjunctions that connect adjectives are used to judge whether two connected adjectives express the same sentiment, and clustering then yields two groups of adjectives representing the two sentiment orientations. Turney et al. use the PMI-IR (Pointwise Mutual Information and Information Retrieval) method to estimate the similarity between a phrase and reference words that represent the two sentiment poles (such as "good" and "bad"), with similarity computed as pointwise mutual information. Another class of methods for word orientation relies on an existing ontology knowledge base, such as WordNet for English or HowNet for Chinese: the semantic distance between the word under evaluation and a selected pair of reference words is computed, and the word's orientation is judged from it.
2) The word similarity and semantic relevance field computation functions provided by HowNet are used to compute the correlation between the word under evaluation and pre-selected groups of positive and negative reference words, from which the word's orientation is obtained.
(2) The second method builds an orientation semantic pattern library in advance, sometimes accompanied by an orientation dictionary. The document under evaluation is then matched against the semantic pattern library, and the orientation values of all matched patterns are accumulated to obtain the orientation of the whole document. Liu Yongdan et al. apply existing semantic analysis techniques to orientation judgment, expressing the semantic relations in a text with a simplified case grammar and semantic frames in order to perform orientation analysis. Zheng Yu et al. combine an orientation dictionary with semantic rule matching to filter text by orientation.
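As a rough illustration of the pointwise mutual information idea mentioned in method (1), the sketch below estimates PMI from raw co-occurrence counts and scores a phrase as its PMI with a positive reference word minus its PMI with a negative one. The function names, argument names, and counts are hypothetical, and the way counts are gathered (for example, from search engine hits) is left out.

```python
import math


def pmi(joint_count, count_a, count_b, total):
    """Pointwise mutual information (base 2) estimated from raw counts."""
    p_joint = joint_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def orientation(with_pos, with_neg, phrase_count, pos_count, neg_count, total):
    """Score = PMI(phrase, positive word) - PMI(phrase, negative word)."""
    return (pmi(with_pos, phrase_count, pos_count, total)
            - pmi(with_neg, phrase_count, neg_count, total))


# Hypothetical counts: the phrase co-occurs with "good" 20 times and with "bad" 5 times.
score = orientation(20, 5, 100, 100, 100, 1000)
```

A positive score suggests that the phrase leans toward the positive reference word, and a negative score toward the negative one. All counts must be non-zero for the logarithm to be defined.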
Summary of the invention
In view of the above prior art, the present invention proposes a public opinion vertical search analysis system and method. In a Web 2.0 network environment, it implements web page crawling based on a breadth-first search (BFS) strategy with a web topology and keyword filtering algorithm, together with analysis of the semantic orientation of text (in particular, sentiment orientation based on a phrase pattern and a vocabulary statistical model), so as to achieve fast and deeper public opinion vertical search analysis.
The present invention proposes a public opinion vertical search analysis system applied to text-based network public opinion search and analysis. The system comprises a vertical search engine crawler module, a template-based information extraction module, a text orientation analysis module based on phrase extraction, and a text orientation analysis module based on a vocabulary statistical model, wherein:
The vertical search engine crawler module uses a crawler algorithm that combines filtering based on network topology and web page content keywords with breadth-first search crawling, to selectively search for and download Internet web pages related to the public opinion topic;
The template-based information extraction module extracts structured data from web page source code and stores it in a database in the required fixed format;
The text orientation analysis module based on phrase extraction obtains structured information according to the phrase extraction pattern, performs orientation analysis on the structured text corpus, and obtains the final tendency degree Sensibility(Text) of the text corpus. A phrase is written A+B, and the processing of this module comprises:
The sentiment tendency weights of word A and word B are denoted Sensibility(A) and Sensibility(B);
Determine whether word A and word B appear in the "degree adverb" and "negative adverb" word lists:
If neither word A nor word B appears in either list, the sentiment tendency weight of the phrase is
Sensibility(A+B)=Sensibility(A)+Sensibility(B);
If word A appears in the "negative adverb" list, the head word of the phrase is word B, whose sentiment weight is Sensibility(B), and the sentiment weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(B);
Otherwise, if word B appears in the "negative adverb" list, the head word of the phrase is word A, and the sentiment weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(A);
If word A appears in the "degree adverb" list, the head word of the phrase is word B; with level(A) denoting the degree multiplier of the degree adverb A, the sentiment weight of the phrase is
Sensibility(A+B)=level(A)×Sensibility(B);
Otherwise, with level(B) denoting the degree multiplier of the degree adverb B, the sentiment weight of the phrase is
Sensibility(A+B)=level(B)×Sensibility(A);
The sums of the weights of all positive-tendency phrases and of all negative-tendency phrases are computed separately and denoted Positive(words) and Negative(words):
Negative(words) is the sum of all phrase sentiment weights that are less than 0, that is, the weight of the negative-tendency phrases;
Positive(words) is the sum of all phrase sentiment weights that are greater than or equal to 0, that is, the weight of the positive-tendency and neutral phrases;
The final tendency degree of the text corpus, Sensibility(Text), is then computed from Positive(words) and Negative(words) [formula not reproduced in the source text];
If Sensibility(Text)<0, the text is a negative-tendency text; if Sensibility(Text)>=0, the text is a positive-tendency or neutral text;
The text orientation analysis module based on the vocabulary statistical model performs the system's information source analysis and negative tendency analysis and obtains the sentiment tendency value of a text Text. The processing of this module comprises:
Read in the text Text and split it into clauses at punctuation marks, labeled S1, S2, ..., Sn;
Search S1 for all attitude words with a clear semantic tendency; the parts of speech searched include adjectives, adverbs, nouns, verbs, and idioms. Use the word sentiment computation module to compute the sentiment weight of each attitude word, and add up the weights of all attitude words in S1 to obtain the clause's attitude word weight sum V1;
Search S1 for all degree words contained in the degree adverb dictionary; when a degree word is present, multiply the attitude weight V1 by the degree multiplier level() of that degree adverb in the degree dictionary, giving level()×V1;
Once S1 has been computed, move to the next clause S2 of Text and repeat the preceding three steps to compute the attitude word weight sum V2 of clause S2;
After the attitude word weight sum Vn of the last clause has been computed, compute separately the sum of the positive Vi, Positive(Sentences), and the sum of the negative Vi, Negative(Sentences);
Finally, compute the final text tendency degree from Positive(Sentences) and Negative(Sentences) [formula not reproduced in the source text];
The web page crawling strategies include the depth-first search strategy, the breadth-first search strategy, and the best-first search strategy.
Web page crawling involves four attributes: the URL to crawl, the crawl depth, the number of pages to crawl, and the URL filter flag;
The crawl operation of the vertical search engine crawler module is multi-threaded and comprises the following processing:
After a thread finishes downloading a page, the downloaded page is submitted to the parsing buffer thread pool, and the link addresses extracted from the page join the to-be-downloaded buffer queue. The thread pool invokes the parser to parse the page and extract URLs, and adds the parsed URLs to the URL record. URLs form a tree structure in the network, and URLs at nodes of different levels may be identical, because the same URL can be parsed out of many other pages. The crawler design therefore keeps a record of parsed URLs and checks each URL before recording it: if the URL already exists in the record it is skipped; otherwise it is added to the unprocessed record. When the crawler searches, it first processes the initial URL; after parsing, a new level of URL queue is obtained. The crawler then downloads these URLs in their default queue order, analyzes them, and parses out new URLs; processed URLs are put into the processed record queue. Only after all pages of the current level have been crawled are the URLs of the next level crawled;
In the text orientation analysis module based on the vocabulary statistical model, clause S1 is also searched for all negation words contained in the negative adverb list; when the number of negation words is odd, the clause attitude weight V1 of S1 is converted to -(1-t)×V1, where t is a fuzzy value that depends on the order of the negation word and the degree adverb: t is greater than zero when the negation word comes first and less than zero when it comes after, and the fuzzy value is set between 0.2 and 0.4.
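The clause-level processing of the vocabulary statistical model described above can be sketched as follows. This is a minimal illustration under several assumptions: the word lists and weights are invented toy values; text is tokenized on whitespace (Chinese text would need a word segmenter); the fuzzy value uses an assumed magnitude of 0.3 and is taken as 0 when a clause has no degree adverb; and, because the final formula is not reproduced in the text, the final tendency is assumed to be Positive(Sentences) plus Negative(Sentences).

```python
import re

# Hypothetical toy dictionaries; a real system would load full lexicons.
ATTITUDE = {"good": 1.0, "bad": -1.0, "excellent": 2.0}
DEGREE = {"very": 2.0, "slightly": 0.5}      # level() multipliers
NEGATION = {"not", "never"}
T_MAGNITUDE = 0.3                            # assumed, within the stated 0.2-0.4 range


def clause_weight(tokens):
    """Attitude weight V of one clause, after degree and negation handling."""
    v = sum(ATTITUDE.get(tok, 0.0) for tok in tokens)
    degree_pos = [i for i, tok in enumerate(tokens) if tok in DEGREE]
    neg_pos = [i for i, tok in enumerate(tokens) if tok in NEGATION]
    for i in degree_pos:                     # multiply by level() of each degree word
        v *= DEGREE[tokens[i]]
    if len(neg_pos) % 2 == 1:                # odd number of negation words
        if not degree_pos:
            t = 0.0                          # assumption: plain negation
        elif neg_pos[0] < degree_pos[0]:
            t = T_MAGNITUDE                  # negation word comes first: t > 0
        else:
            t = -T_MAGNITUDE                 # negation word comes after: t < 0
        v = -(1 - t) * v
    return v


def text_tendency(text):
    """Split into clauses at punctuation and combine the clause weights."""
    clauses = [c for c in re.split(r"[,.;!?]", text) if c.strip()]
    weights = [clause_weight(c.lower().split()) for c in clauses]
    positive = sum(w for w in weights if w > 0)   # Positive(Sentences)
    negative = sum(w for w in weights if w < 0)   # Negative(Sentences)
    return positive + negative                    # assumed final combination


result = text_tendency("The food is very good, but the service is not good.")
```

With these toy values, "not very good" is weakened rather than fully inverted, which is the effect the fuzzy value appears intended to capture.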
The present invention also proposes a public opinion vertical search analysis method, comprising the following steps:
Invoke the vertical search engine crawler module, which uses a crawler algorithm that combines filtering based on network topology and web page content keywords with breadth-first search crawling, to selectively search for and download Internet web pages related to the public opinion topic;
Use the template-based information extraction module to extract structured data from web page source code and store it in a database in the required fixed format;
Use the text orientation analysis modules to implement two algorithms, one based on the phrase extraction pattern and one based on the vocabulary statistical model, perform orientation analysis on the structured text with each, and obtain the sentiment tendency weight of the text;
Obtain structured information according to the phrase extraction pattern, perform orientation analysis on the structured text corpus, and obtain the final tendency degree Sensibility(Text) of the text corpus, where a phrase is written A+B:
The sentiment tendency weights of word A and word B are denoted Sensibility(A) and Sensibility(B);
Determine whether word A and word B appear in the "degree adverb" and "negative adverb" word lists:
If neither word A nor word B appears in either list, the sentiment tendency weight of the phrase is
Sensibility(A+B)=Sensibility(A)+Sensibility(B);
If word A appears in the "negative adverb" list, the head word of the phrase is word B, whose sentiment weight is Sensibility(B), and the sentiment weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(B);
Otherwise, if word B appears in the "negative adverb" list, the head word of the phrase is word A, and the sentiment weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(A);
If word A appears in the "degree adverb" list, the head word of the phrase is word B; with level(A) denoting the degree multiplier of the degree adverb A, the sentiment weight of the phrase is
Sensibility(A+B)=level(A)×Sensibility(B);
Otherwise, with level(B) denoting the degree multiplier of the degree adverb B, the sentiment weight of the phrase is
Sensibility(A+B)=level(B)×Sensibility(A);
The sums of the weights of all positive-tendency phrases and of all negative-tendency phrases are computed separately and denoted Positive(words) and Negative(words):
Negative(words) is the sum of all phrase sentiment weights that are less than 0, that is, the weight of the negative-tendency phrases;
Positive(words) is the sum of all phrase sentiment weights that are greater than or equal to 0, that is, the weight of the positive-tendency and neutral phrases;
The final tendency degree of the text corpus, Sensibility(Text), is then computed from Positive(words) and Negative(words) [formula not reproduced in the source text];
If Sensibility(Text)<0, the text is a negative-tendency text; if Sensibility(Text)>=0, the text is a positive-tendency or neutral text;
Perform text orientation analysis based on the vocabulary statistical model, which carries out the system's information source analysis and negative tendency analysis, to obtain the sentiment tendency value of a text Text:
Read in the text Text and split it into clauses at punctuation marks, labeled S1, S2, ..., Sn;
Search S1 for all attitude words with a clear semantic tendency; the parts of speech searched include adjectives, adverbs, nouns, verbs, and idioms. Use the word sentiment computation module to compute the sentiment weight of each attitude word, and add up the weights of all attitude words in S1 to obtain the clause's attitude word weight sum V1;
Search S1 for all degree words contained in the degree adverb dictionary; when a degree word is present, multiply the attitude weight V1 by the degree multiplier level() of that degree adverb in the degree dictionary, giving level()×V1;
Once S1 has been computed, move to the next clause S2 of Text and repeat the preceding three steps to compute the attitude word weight sum V2 of clause S2;
After the attitude word weight sum Vn of the last clause has been computed, compute separately the sum of the positive Vi, Positive(Sentences), and the sum of the negative Vi, Negative(Sentences).
Finally, compute the final text tendency degree from Positive(Sentences) and Negative(Sentences) [formula not reproduced in the source text];
The web page crawling strategies include the depth-first search strategy, the breadth-first search strategy, and the best-first search strategy.
Web page crawling involves four attributes: the URL to crawl, the crawl depth, the number of pages to crawl, and the URL filter flag.
The crawl operation of the vertical search engine crawler module is multi-threaded and comprises the following processing:
After a thread finishes downloading a page, the downloaded page is submitted to the parsing buffer thread pool, and the link addresses extracted from the page join the to-be-downloaded buffer queue. The thread pool invokes the parser to parse the page and extract URLs, and adds the parsed URLs to the URL record. URLs form a tree structure in the network, and URLs at nodes of different levels may be identical, because the same URL can be parsed out of many other pages. The crawler design therefore keeps a record of parsed URLs and checks each URL before recording it: if the URL already exists in the record it is skipped; otherwise it is added to the unprocessed record. When the crawler searches, it first processes the initial URL; after parsing, a new level of URL queue is obtained. The crawler then downloads these URLs in their default queue order, analyzes them, and parses out new URLs; processed URLs are put into the processed record queue. Only after all pages of the current level have been crawled are the URLs of the next level crawled;
A clause Sn is also searched for all negation words contained in the negative adverb list; when the number of negation words is odd, the clause attitude weight Vn of Sn is converted to -(1-t)×Vn, where t is a fuzzy value that depends on the order of the negation word and the degree adverb: t is greater than zero when the negation word comes first and less than zero when it comes after, and the fuzzy value is set between 0.2 and 0.4.
Compared with the prior art, the sentiment orientation algorithm adopted by the present invention, based on the phrase pattern and the vocabulary statistical model, improves accuracy by about 5 percentage points, a clear improvement. In addition, the multi-threaded design improves processing efficiency, so public opinion search and analysis becomes both faster and more accurate.
Description of drawings
Fig. 1 is a module diagram of the vertical search analysis system of the present invention;
Fig. 2 is a flowchart of the vertical search crawler module of the present invention;
Fig. 3 is a schematic of a web interface containing noise information, before template processing by the template-based information extraction module, in a specific embodiment of the present invention;
Fig. 4 is the formatted result interface after template extraction by the template-based information extraction module in a specific embodiment of the present invention;
Fig. 5 is a flowchart of the text orientation analysis module of the present invention;
Fig. 6 compares the number of texts identified by the phrase-based and vocabulary-statistics-based text orientation analysis algorithms (sample sizes of 1000, 2000, and 3000 texts);
Fig. 7 compares the identification accuracy of the phrase-based and vocabulary-statistics-based text orientation analysis algorithms (same sample sizes).
Embodiment
The present invention is set against the background of public opinion monitoring and mainly performs the following processing:
Step 1: invoke the vertical search engine crawler module, which uses a crawler algorithm that combines filtering based on network topology and web page content keywords with a breadth-first search crawling strategy, to selectively search for and download Internet web pages related to the public opinion topic;
In this step the present invention uses a multi-threaded download design. The crawl operation analyzes downloaded web pages as follows:
After a thread finishes downloading a page, the downloaded page is submitted to the parsing buffer thread pool, and the link addresses extracted from the page join the to-be-downloaded buffer queue. The thread pool invokes the parser to parse the page and extract URLs, and adds the parsed URLs to the URL record. URLs form a tree structure in the network, and URLs at nodes of different levels may be identical, because the same URL can be parsed out of many other pages. The crawler design therefore keeps a record of parsed URLs and checks each URL before recording it: if the URL already exists in the record it is skipped; otherwise it is added to the unprocessed record. When the crawler searches, it first processes the initial URL; after parsing, a new level of URL queue is obtained. The crawler then downloads these URLs in their default queue order, analyzes them, and parses out new URLs; processed URLs are put into the processed record queue. Only after all pages of the current level have been crawled are the URLs of the next level crawled.
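The level-by-level crawl described above can be sketched as below. This is only an illustration, not the patented implementation: `crawl_bfs`, `fetch_links`, `url_filter`, and the toy link graph are hypothetical names, and `fetch_links` stands in for the download-and-parse step so that the sketch runs without network access. A thread pool processes all pages of one level in parallel, and a set of seen URLs plays the role of the URL record used for deduplication.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_bfs(seed_urls, fetch_links, max_depth,
              url_filter=lambda url: True, workers=4):
    """Breadth-first crawl; returns URLs in the order they were processed.

    fetch_links(url) is assumed to download the page and return its links.
    """
    seen = set(seed_urls)          # URL record used for deduplication
    processed = []                 # processed record queue
    frontier = list(seed_urls)     # URLs of the current level
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _depth in range(max_depth + 1):
            if not frontier:
                break
            # Finish every page of the current level before going deeper.
            results = list(pool.map(fetch_links, frontier))
            processed.extend(frontier)
            next_frontier = []
            for links in results:
                for url in links:
                    if url not in seen and url_filter(url):
                        seen.add(url)
                        next_frontier.append(url)
            frontier = next_frontier
    return processed


# Toy link graph standing in for real pages.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
pages = crawl_bfs(["a"], lambda url: graph.get(url, []), max_depth=2)
```

Collecting the results of a whole level before building the next frontier is what enforces the rule that the next level is crawled only after the current one is complete.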
The web page crawling strategies involved in the present invention are the depth-first search strategy, the breadth-first search strategy, and the best-first search strategy.
This public opinion vertical search system needs to crawl, collect, and extract public opinion information from websites that include all kinds of news sites and forums. In link analysis, news sites and forums have different network topologies. The content of a news page is generally shown in full on a single URL page, and the number of reader comments is small; even when there are many comments, they are all shown on the same URL page, which simply grows longer, unlike a forum, which has "next page" link tags. Link analysis for news sites can therefore be defined in terms of "web page depth": all hyperlinks in the source code of the current page URL are the next-depth pages of the current page.
Step 2: use the template-based information extraction module to extract structured data from web page source code and store it in a database in the required fixed format;
The data fields required by the public opinion vertical search engine system of the present invention are: the URL of the opinion article, the source of the opinion article, the opinion topic (title), the opinion author, the opinion text, the publication time, the number of comments, and the number of forwards. The present invention uses the template approach for information extraction, so determining the template format is particularly important. Once the template is determined, the crawler must obtain all the data fields listed above after template matching.
The crawler itself also requires four attributes: the initial crawl URL, the crawl depth, the number of pages to crawl, and the URL filter flag. Combining these with the structured information format, the system defines the template as the data format shown in the table below.
No. | Identifier | Description |
1 | style | Website type, either news or forum |
2 | authorstart | Start tag of the "author" field |
3 | authorend | End tag of the "author" field |
4 | contentstart | Start tag of the "content" field |
5 | contentend | End tag of the "content" field |
6 | source | "Source" tag |
7 | timestart | Start tag of the "time" field |
8 | timeend | End tag of the "time" field |
9 | url | "Initial URL" tag |
10 | ex_url | "URL filter" tag |
11 | count(deep) | Crawl depth or number of pages to crawl |
12 | keyword | Filter keyword |
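As an illustration of how such a template might drive extraction, the sketch below pulls the author, content, and time fields out of page source using the start and end tags from a template. It is a simplification under stated assumptions: plain substring search is used instead of regular expressions, and the helper names, the sample template values, and the sample page are all hypothetical.

```python
def extract_between(html, start_tag, end_tag):
    """Return the text between start_tag and end_tag, or None if not found."""
    start = html.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)
    end = html.find(end_tag, start)
    if end == -1:
        return None
    return html[start:end].strip()


def apply_template(html, template):
    """Extract the author, content, and time fields named in the template."""
    record = {}
    for field in ("author", "content", "time"):
        record[field] = extract_between(
            html, template[field + "start"], template[field + "end"])
    return record


# Hypothetical template for one news site, using identifiers from the table.
template = {
    "style": "news",
    "authorstart": '<span class="author">', "authorend": "</span>",
    "contentstart": '<div id="body">', "contentend": "</div>",
    "timestart": "<time>", "timeend": "</time>",
}
page = ('<time>2011-11-10</time><span class="author">Li</span>'
        '<div id="body"> Hello world </div>')
record = apply_template(page, template)
```

A field whose tags are not found comes back as `None`, which makes it easy to spot pages where the site layout no longer matches the template.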
Step 3: use the text orientation analysis modules to implement two algorithms, one based on the phrase extraction pattern and one based on the vocabulary statistical model, perform orientation analysis on the structured text with each, and obtain the sentiment weight of the text;
The specific algorithm for the orientation of a text Text based on the phrase extraction pattern is as follows, with a phrase written as A+B:
a) Determine whether word A and word B appear in the "degree adverb" and "negative adverb" lists. If neither appears, compute their sentiment tendency weights with the word sentiment computation module, denoted Sensibility(A) and Sensibility(B);
The sentiment weight of the phrase is then
Sensibility(A+B)=Sensibility(A)+Sensibility(B);
b) If word A appears in the "negative adverb" list, the head word of the phrase is word B, whose sentiment weight is Sensibility(B), and the sentiment weight of the phrase is
Sensibility(A+B)=(-1)×Sensibility(B);
Otherwise, if word B appears in the "negative adverb" list, the head word of the phrase is word A, and the sentiment weight of the phrase is
Sensibility(A+B)=(-1)×Sensibility(A);
c) If word A appears in the "degree adverb" list, the head word of the phrase is word B; with level(A) denoting the degree multiplier of the degree adverb A, the sentiment weight of the phrase is
Sensibility(A+B)=level(A)×Sensibility(B);
Otherwise, with level(B) denoting the degree multiplier of the degree adverb B, the sentiment weight of the phrase is
Sensibility(A+B)=level(B)×Sensibility(A);
d) The sums of the weights of all positive-tendency phrases and of all negative-tendency phrases are computed separately and denoted Positive(words) and Negative(words):
1) Negative(words) is the sum of all phrase sentiment weights that are less than 0, that is, the weight of the negative-tendency phrases;
2) Positive(words) is the sum of all phrase sentiment weights that are greater than or equal to 0, that is, the weight of the positive-tendency and neutral phrases;
e) The final tendency degree of the text corpus, Sensibility(Text), is then computed from Positive(words) and Negative(words) [formula not reproduced in the source text];
If Sensibility(Text)<0, the text is a negative-tendency text; if Sensibility(Text)>=0, the text is a positive-tendency or neutral text.
Note: in linguistics, a corpus is a collection of a large number of texts. The texts in it (called language material) are usually organized, with a fixed format and annotations, and the term refers in particular to a digitized corpus stored on a computer.
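The phrase-level rules A) to E) above can be illustrated with a minimal Python sketch. The word lists, weights and degree multipliers are hypothetical stand-ins for the "negative adverb" table, the "degree adverb" table and the vocabulary emotion computing module. The final combination Sensibility(Text) = Positive(words) + Negative(words) is an assumption, since the formula itself is not given in this text.

```python
from typing import Iterable, Tuple

# Hypothetical tables; a real system would load these from dictionaries.
NEGATION = {"not", "never"}                      # "negative adverb" table
DEGREE = {"very": 2.0, "slightly": 0.5}          # "degree adverb" table with level()
LEXICON = {"good": 1.0, "bad": -1.0}             # stand-in for the word emotion module


def word_weight(word: str) -> float:
    """Sensibility(word); unknown words are treated as neutral (assumption)."""
    return LEXICON.get(word, 0.0)


def phrase_weight(a: str, b: str) -> float:
    """Sensibility(A+B) for a two-word phrase, following rules A) to C)."""
    if a in NEGATION:                 # rule B: A negates the head word B
        return -1 * word_weight(b)
    if b in NEGATION:                 # rule B: B negates the head word A
        return -1 * word_weight(a)
    if a in DEGREE:                   # rule C: A scales the head word B
        return DEGREE[a] * word_weight(b)
    if b in DEGREE:                   # rule C: B scales the head word A
        return DEGREE[b] * word_weight(a)
    return word_weight(a) + word_weight(b)   # rule A: plain sum


def text_tendency(phrases: Iterable[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Return (Positive(words), Negative(words), assumed Sensibility(Text))."""
    weights = [phrase_weight(a, b) for a, b in phrases]
    positive = sum(w for w in weights if w >= 0)
    negative = sum(w for w in weights if w < 0)
    # Assumption: the final tendency is the plain sum of both parts.
    return positive, negative, positive + negative
```

A text would then be classified as derogatory when the third returned value is below zero, and as commendatory or neutral otherwise.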
Step 4. Finally, the crawler search algorithm, the information extraction module and the text orientation analysis module described above are combined with the public opinion websites to be analyzed, completing the system's information source and negative-tendency analysis functions. That is, the text orientation analysis processing module based on the vocabulary statistical model is implemented:
In text orientation analysis based on the vocabulary statistical model, the emotion tendency value of a text Text is computed as follows:
A) Read in the text Text and split it into clauses at punctuation marks, labeled S1, S2, ..., Sn;
B) Search S1 for all attitude words with a clear semantic tendency; the parts of speech searched are adjectives, adverbs, nouns, verbs, idioms and so on. Compute the emotion weight of each attitude word with the vocabulary emotion computing module, and add up the weights of all attitude words in S1 to obtain the attitude word weight sum V1 of this clause;
C) Search S1 and count the negative words it contains that appear in the negative adverb table. When the number of negative words is odd, the attitude weight V1 of the clause S1 is converted to -(1-t)×V1, where t is a fuzzy value that depends on the relative order of the negative word and the degree adverb: t is greater than zero when the negative word comes first and less than zero when the negative word comes after, and the fuzzy value can be set between 0.2 and 0.4. When the number of negative words is even, no such conversion is needed.
D) Search S1 for degree words contained in the degree adverb dictionary. When a degree word is present, the attitude weight V1 is multiplied by the degree multiplier level() of that degree adverb in the dictionary, i.e. level()×V1;
E) Once S1 has been processed, move on to the next clause S2 of Text and repeat steps B), C) and D) to compute the attitude word weight sum V2 of that clause;
F) After the attitude word weight sum Vn of the last clause has been computed, compute the sum of the positive Vi, Positive(Sentences), and the sum of the negative Vi, Negative(Sentences).
G) Finally, according to the normalization principle, the final text tendency degree is given by a formula that is not reproduced in this text.
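Steps A) to F) can be sketched in Python as follows. This is only an illustration under several assumptions: tokens are whitespace-separated (Chinese text would first need word segmentation), the tables are hypothetical, the fuzzy value t is fixed at +0.3 or -0.3 depending on word order and at 0 when the clause has no degree adverb, and every degree adverb found multiplies the clause weight. The normalization of step G) is left out because its formula is not available.

```python
import re
from typing import List, Tuple

# Hypothetical tables standing in for the dictionaries named in the text.
NEGATION = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}
LEXICON = {"good": 1.0, "bad": -1.0, "slow": -0.5}
FUZZY = 0.3  # assumed magnitude of t, chosen inside the 0.2 to 0.4 range


def split_clauses(text: str) -> List[str]:
    """Step A: split the text into clauses S1..Sn at punctuation marks."""
    parts = re.split(r"[,.;!?，。；！？]", text)
    return [p.strip() for p in parts if p.strip()]


def clause_weight(clause: str) -> float:
    """Steps B to D: attitude weight Vi of one clause."""
    tokens = clause.lower().split()
    v = sum(LEXICON.get(tok, 0.0) for tok in tokens)          # step B
    neg_pos = [i for i, tok in enumerate(tokens) if tok in NEGATION]
    deg_pos = [i for i, tok in enumerate(tokens) if tok in DEGREE]
    if len(neg_pos) % 2 == 1:                                 # step C: odd count
        if not deg_pos:
            t = 0.0            # assumption: no degree adverb, no fuzziness
        elif neg_pos[0] < deg_pos[0]:
            t = FUZZY          # negative word before the degree adverb
        else:
            t = -FUZZY         # negative word after the degree adverb
        v = -(1 - t) * v
    for i in deg_pos:                                         # step D
        v *= DEGREE[tokens[i]]
    return v


def text_sums(text: str) -> Tuple[float, float]:
    """Steps E and F: return (Positive(Sentences), Negative(Sentences))."""
    weights = [clause_weight(c) for c in split_clauses(text)]
    positive = sum(w for w in weights if w > 0)
    negative = sum(w for w in weights if w < 0)
    return positive, negative
```

Keeping the two sums separate leaves the choice of normalization open to whichever formula the full specification defines.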
The following is a specific embodiment of the present invention, which further illustrates its technical scheme:
1. The preprocessing of the present invention consists of two parts: web crawling and template-based extraction of the crawled information.
The detailed crawler process:
A) Access the URL database, read the entry URLs, and build an in-memory access queue;
B) Find an idle HTTP download module, assign it a URL, and start the download task;
C) The HTTP download module accesses the Internet, obtains the web page content, and puts it into the result queue;
D) Save the content to the web page database in preparation for subsequent indexing and other operations;
E) The link analysis module extracts new URLs from the page and stores them in the URL database to await download;
F) Repeat the above process until all downloads are complete.
Threads are responsible for filtering URLs and web page content and for selecting the required URLs. They analyze the web page source code, parse out new URLs and put them into the record list, and, following the defined data structure, extract the various items of information from the source code and store them in the database.
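The crawl loop described above can be sketched as a level-by-level breadth-first traversal with a thread pool. This is a minimal sketch, not the patented implementation: `fetch`, `extract_links` and `keep` are hypothetical callables supplied by the caller (a downloader, a link parser and a URL filter), and an in-memory dictionary stands in for the URL and web page databases.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def crawl(seed_urls: Iterable[str],
          fetch: Callable[[str], str],
          extract_links: Callable[[str], List[str]],
          keep: Callable[[str], bool] = lambda url: True,
          max_depth: int = 2,
          workers: int = 4) -> Dict[str, str]:
    """Breadth-first crawl; returns a mapping {url: page source}."""
    seen = set()                       # record of URLs already queued
    frontier: List[str] = []           # URLs of the current level
    for url in seed_urls:
        if keep(url) and url not in seen:
            seen.add(url)
            frontier.append(url)

    pages: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _depth in range(max_depth + 1):
            if not frontier:
                break
            # Download the whole level in parallel; map() preserves order.
            sources = list(pool.map(fetch, frontier))
            next_frontier: List[str] = []
            for url, source in zip(frontier, sources):
                pages[url] = source    # stand-in for the web page database
                for link in extract_links(source):
                    # Skip URLs that were already recorded or are filtered out.
                    if keep(link) and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            # The next level starts only after the current one is finished.
            frontier = next_frontier
    return pages
```

Because each level is fully downloaded before its links are followed, the traversal stays breadth-first even though the downloads inside one level run concurrently.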
After a template file is submitted, the information extraction program obtains, in the background, the character string corresponding to each label of the template file. It then filters the source code of the web pages downloaded by the crawler by template matching, obtaining the required formatted data such as "content", "time", "title", "source", "comment count" and "repost count".
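One simple way to realize such template matching is to let each template label map to a pair of delimiter strings and to capture the text between them. The sketch below assumes that representation; the template contents and the name `extract_fields` are hypothetical, and the actual template file format is not described in the text.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical template: label -> (string before the value, string after it).
TEMPLATE: Dict[str, Tuple[str, str]] = {
    "title": ('<h1 class="title">', "</h1>"),
    "time": ('<span class="time">', "</span>"),
    "content": ('<div class="content">', "</div>"),
}


def extract_fields(source: str,
                   template: Dict[str, Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Return one formatted record; a label with no match maps to None."""
    record: Dict[str, Optional[str]] = {}
    for label, (start, end) in template.items():
        # Escape the delimiters so they are matched literally, then capture
        # the shortest text between them (DOTALL lets it span several lines).
        pattern = re.escape(start) + r"(.*?)" + re.escape(end)
        match = re.search(pattern, source, flags=re.DOTALL)
        record[label] = match.group(1).strip() if match else None
    return record
```

The resulting dictionary has a fixed set of keys per template, which makes it straightforward to store each record as one database row.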
2. Comparison of text orientation analysis based on the phrase pattern and on the vocabulary statistical model
The text orientation analysis algorithm based on the vocabulary statistical model remedies the weaknesses of the phrase-pattern algorithm in how emotion-bearing features are extracted from text: it abandons extraction by fixed rules and extracts text emotion features statistically. Using the sentence as the unit of emotion analysis stays closer to the semantics of the text than using the phrase as the unit. The effect of negative adverbs is analyzed and handled better at the sentence level, and the introduction of the fuzzy value makes the analysis result more accurate. The influence of punctuation marks on sentence semantics is also analyzed and summarized in the algorithm.
For the text orientation analysis algorithm based on the vocabulary statistical model, this experiment uses the same experimental data as the phrase-pattern algorithm in the previous section. The results are as follows:
Table 2. Sample sizes of 1000, 2000 and 3000 texts.
Number of texts | 1000 | 2000 | 3000 |
---|---|---|---|
Marked negative texts | 1000 | 2000 | 3000 |
Test negative texts | 833 | 1685 | 2443 |
Accuracy rate | 83.300% | 84.250% | 81.433% |
As Figures 5 and 6 show, on the emotion corpus the text orientation analysis algorithm based on the vocabulary statistical model achieves an accuracy about 5 percentage points higher than the algorithm based on the phrase pattern; by comparison, the improvement is fairly pronounced.
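The accuracy rates reported in Table 2 are consistent with dividing the number of negative texts found in the test by the number of marked texts for each sample size. The short check below assumes that reading of the table.

```python
# Counts taken from Table 2; the ratio interpretation is an assumption.
marked = [1000, 2000, 3000]
detected = [833, 1685, 2443]

# Accuracy in percent, rounded to three decimals as in the table.
accuracy = [round(100 * d / m, 3) for d, m in zip(detected, marked)]
print(accuracy)
```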
Claims (10)
1. A public opinion vertical search analysis system, applied to text-based network public opinion search and analysis, characterized in that the system comprises a vertical search engine crawler module, a template-based information extraction module, a text orientation analysis module based on phrase extraction, and a text orientation analysis module based on a vocabulary statistical model, wherein:
The vertical search engine crawler module crawls web pages using filtering based on network topology and web page content keywords together with the breadth-first search crawler algorithm, selectively searching for and downloading Internet web pages relevant to the public opinion topic;
The template-based information extraction module extracts structured data from the web page source code and stores it in the database in the required fixed format;
The text orientation analysis module based on phrase extraction obtains structured information based on the phrase extraction pattern and performs orientation analysis on the structured text corpus, obtaining the final tendency degree Sensibility(Text) of the text corpus; the processing of this module comprises:
The emotion tendency weights of word A and word B are denoted Sensibility(A) and Sensibility(B);
Determining whether word A and word B appear in the "degree adverb" table or the "negative adverb" table:
If neither word A nor word B appears in them, the emotion tendency weight of the phrase is
Sensibility(A+B)=Sensibility(A)+Sensibility(B);
If word A appears in the "negative adverb" table, the head word of the phrase is word B; with the emotion weight of word B computed as Sensibility(B), the emotion weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(B);
Otherwise, if word B appears in the "negative adverb" table, the head word of the phrase is word A, and the emotion weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(A);
If word A appears in the "degree adverb" table, the head word of the phrase is word B; with level(A) denoting the degree multiplier of word A as a degree adverb, the emotion weight of the phrase is
Sensibility(A+B)=level(A)×Sensibility(B);
Otherwise, with level(B) denoting the degree multiplier of word B as a degree adverb, the emotion weight of the phrase is
Sensibility(A+B)=level(B)×Sensibility(A);
Computing the sums of the phrase weights with commendatory tendency and with derogatory tendency, denoted Positive(words) and Negative(words) respectively:
The emotion weights of all phrases are summed, and results less than 0 are taken as the phrase weight with derogatory tendency;
The emotion weights of all phrases are summed, and results greater than or equal to 0 are taken as the phrase weight with commendatory or neutral tendency;
The final tendency degree of the text corpus is denoted Sensibility(Text) (the defining formula is not reproduced in this text);
If Sensibility(Text)<0, the text has a derogatory tendency; if Sensibility(Text)>=0, the text has a commendatory tendency or is neutral;
The text orientation analysis module based on the vocabulary statistical model completes the system's information source and negative-tendency analysis and obtains the emotion tendency value of the text Text; the specific processing of this module comprises:
Reading in the text Text and splitting it into clauses at punctuation marks, labeled S1, S2, ..., Sn;
Searching S1 for all attitude words with a clear semantic tendency, the parts of speech searched being adjectives, adverbs, nouns, verbs, idioms and so on; computing the emotion weight of each attitude word with the vocabulary emotion computing module, and adding up the weights of all attitude words in S1 to obtain the attitude word weight sum V1 of this clause;
Searching S1 for degree words contained in the degree adverb dictionary; when a degree word is present, the attitude weight V1 is multiplied by the degree multiplier level() of that degree adverb in the dictionary, i.e. level()×V1;
Once S1 has been processed, moving on to the next clause S2 of Text and repeating the preceding three steps to compute the attitude word weight sum V2 of the clause S2;
After the attitude word weight sum Vn of the last clause has been computed, computing the sum of the positive Vi, Positive(Sentences), and the sum of the negative Vi, Negative(Sentences);
Finally computing the final text tendency degree (the formula is not reproduced in this text).
2. The public opinion vertical search analysis system as claimed in claim 1, characterized in that the web page crawling strategy comprises a depth-first search strategy, a breadth-first search strategy and a best-first search strategy.
3. The public opinion vertical search analysis system as claimed in claim 1, characterized in that the web page crawling includes four attributes: the crawl URL, the crawl depth, the number of web pages to crawl, and the URL filter identifier.
4. The public opinion vertical search analysis system as claimed in claim 1, characterized in that the crawling operation of the vertical search engine crawler module uses multithreading and comprises the following processing:
After a thread finishes downloading a page, the downloaded page is submitted to the parsing buffer thread pool, and the link URLs extracted from the page are added to the buffer queue of pages to be downloaded; the thread pool invokes the parser to analyze the page and extract URLs, and the URLs obtained by parsing are added to the URL record. URLs form a tree structure in the network, and in that tree the URLs of nodes at different levels may be identical, because the same URL can be parsed out of many other web pages; the crawler design therefore keeps a record of the URLs already parsed: before a URL is recorded it is checked, and if the parsed URL already exists in the record it is skipped, otherwise it is added to the unprocessed record. When the crawler searches, it first processes the initial URLs; after parsing by the parser, a queue of URLs for the next level is obtained; the crawler then downloads these URLs in their default order in the queue, analyzes them, and parses out new URLs, and the processed URLs are placed in the processed record queue; only after all web pages of the current level have been crawled are the URLs of the next level crawled.
5. The public opinion vertical search analysis system as claimed in claim 1, characterized in that, in the text orientation analysis module based on the vocabulary statistical model, the negative words of the clause S1 that are contained in the negative adverb table are counted; when the number of negative words is odd, the attitude weight V1 of the clause S1 is converted to -(1-t)×V1; where t is a fuzzy value that depends on the relative order of the negative word and the degree adverb: t is greater than zero when the negative word comes first and less than zero when the negative word comes after, and the fuzzy value is set between 0.2 and 0.4.
6. A public opinion vertical search analysis method, characterized in that the method comprises the following steps:
Calling the vertical search engine crawler module to crawl web pages using filtering based on network topology and web page content keywords together with the breadth-first search crawler algorithm, selectively searching for and downloading Internet web pages relevant to the public opinion topic;
Extracting structured data from the web page source code through the template-based information extraction module, and storing it in the database in the required fixed format;
Implementing two algorithms through the text orientation analysis module, one based on the phrase extraction pattern and one based on the vocabulary statistical model, and performing orientation analysis on the structured text with each, obtaining the text emotion tendency weights;
Obtaining structured information based on the phrase extraction pattern and performing orientation analysis on the structured text corpus, obtaining the final tendency degree Sensibility(Text) of the text corpus:
The emotion tendency weights of word A and word B are denoted Sensibility(A) and Sensibility(B);
Determining whether word A and word B appear in the "degree adverb" table or the "negative adverb" table:
If neither word A nor word B appears in them, the emotion tendency weight of the phrase is
Sensibility(A+B)=Sensibility(A)+Sensibility(B);
If word A appears in the "negative adverb" table, the head word of the phrase is word B; with the emotion weight of word B computed as Sensibility(B), the emotion weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(B);
Otherwise, if word B appears in the "negative adverb" table, the head word of the phrase is word A, and the emotion weight of the phrase is Sensibility(A+B)=(-1)×Sensibility(A);
If word A appears in the "degree adverb" table, the head word of the phrase is word B; with level(A) denoting the degree multiplier of word A as a degree adverb, the emotion weight of the phrase is
Sensibility(A+B)=level(A)×Sensibility(B);
Otherwise, with level(B) denoting the degree multiplier of word B as a degree adverb, the emotion weight of the phrase is
Sensibility(A+B)=level(B)×Sensibility(A);
Computing the sums of the phrase weights with commendatory tendency and with derogatory tendency, denoted Positive(words) and Negative(words) respectively:
The emotion weights of all phrases are summed, and results less than 0 are taken as the phrase weight with derogatory tendency;
The emotion weights of all phrases are summed, and results greater than or equal to 0 are taken as the phrase weight with commendatory or neutral tendency;
The final tendency degree of the text corpus is denoted Sensibility(Text) (the defining formula is not reproduced in this text);
If Sensibility(Text)<0, the text has a derogatory tendency; if Sensibility(Text)>=0, the text has a commendatory tendency or is neutral;
Performing text orientation analysis based on the vocabulary statistical model, completing the system's information source and negative-tendency analysis and obtaining the emotion tendency value of the text Text:
Reading in the text Text and splitting it into clauses at punctuation marks, labeled S1, S2, ..., Sn;
Searching S1 for all attitude words with a clear semantic tendency, the parts of speech searched being adjectives, adverbs, nouns, verbs, idioms and so on; computing the emotion weight of each attitude word with the vocabulary emotion computing module, and adding up the weights of all attitude words in S1 to obtain the attitude word weight sum V1 of this clause;
Searching S1 for degree words contained in the degree adverb dictionary; when a degree word is present, the attitude weight V1 is multiplied by the degree multiplier level() of that degree adverb in the dictionary, i.e. level()×V1;
Once S1 has been processed, moving on to the next clause S2 of Text and repeating the preceding three steps to compute the attitude word weight sum V2 of the clause S2;
After the attitude word weight sum Vn of the last clause has been computed, computing the sum of the positive Vi, Positive(Sentences), and the sum of the negative Vi, Negative(Sentences);
Finally computing the final text tendency degree (the formula is not reproduced in this text).
7. The public opinion vertical search analysis method as claimed in claim 6, characterized in that the web page crawling strategy comprises a depth-first search strategy, a breadth-first search strategy and a best-first search strategy.
8. The public opinion vertical search analysis method as claimed in claim 6, characterized in that the web page crawling includes four attributes: the crawl URL, the crawl depth, the number of web pages to crawl, and the URL filter identifier.
9. The public opinion vertical search analysis method as claimed in claim 6, characterized in that the crawling operation of the vertical search engine crawler module uses multithreading and comprises the following processing:
After a thread finishes downloading a page, the downloaded page is submitted to the parsing buffer thread pool, and the link URLs extracted from the page are added to the buffer queue of pages to be downloaded; the thread pool invokes the parser to analyze the page and extract URLs, and the URLs obtained by parsing are added to the URL record. URLs form a tree structure in the network, and in that tree the URLs of nodes at different levels may be identical, because the same URL can be parsed out of many other web pages; the crawler design therefore keeps a record of the URLs already parsed: before a URL is recorded it is checked, and if the parsed URL already exists in the record it is skipped, otherwise it is added to the unprocessed record. When the crawler searches, it first processes the initial URLs; after parsing by the parser, a queue of URLs for the next level is obtained; the crawler then downloads these URLs in their default order in the queue, analyzes them, and parses out new URLs, and the processed URLs are placed in the processed record queue; only after all web pages of the current level have been crawled are the URLs of the next level crawled.
10. The public opinion vertical search analysis method as claimed in claim 6, characterized in that the negative words of a clause Sn that are contained in the negative adverb table are counted; when the number of negative words is odd, the attitude weight Vn of the clause Sn is converted to -(1-t)×Vn; where t is a fuzzy value that depends on the relative order of the negative word and the degree adverb: t is greater than zero when the negative word comes first and less than zero when the negative word comes after, and the fuzzy value is set between 0.2 and 0.4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103549731A CN102609427A (en) | 2011-11-10 | 2011-11-10 | Public opinion vertical search analysis system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102609427A (en) | 2012-07-25 |
Family
ID=46526808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011103549731A Pending CN102609427A (en) | 2011-11-10 | 2011-11-10 | Public opinion vertical search analysis system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102609427A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070294230A1 (en) * | 2006-05-31 | 2007-12-20 | Joshua Sinel | Dynamic content analysis of collected online discussions |
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method |
CN101894102A (en) * | 2010-07-16 | 2010-11-24 | 浙江工商大学 | Method and device for analyzing emotion tendentiousness of subjective text |
Non-Patent Citations (3)
Title |
---|
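YANG SHEN et al.: "Emotion Mining Research on Micro-blog", 《2009. SWS '09. 1ST IEEE SYMPOSIUM ON WEB SOCIETY》 * |
张旭 et al.: "Research on the Crawler Module of a BBS Public Opinion System", 《Railway Computer Application》 * |
李钝 et al.: "Research on Text Sentiment Classification Based on Phrase Patterns", 《Computer Science》 * |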
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968452A (en) * | 2012-10-25 | 2013-03-13 | 北京腾逸科技发展有限公司 | Network public opinion information statistical method and system |
CN103309948A (en) * | 2013-05-20 | 2013-09-18 | 携程计算机技术(上海)有限公司 | System and method for public opinion monitoring analysis and intelligent distribution processing of coordination center |
CN103309948B (en) * | 2013-05-20 | 2016-11-30 | 上海携程商务有限公司 | Liaison centre's public sentiment monitoring analysis and smart allocation processing system and method |
CN104346328A (en) * | 2013-07-23 | 2015-02-11 | 同程网络科技股份有限公司 | Vertical intelligent crawler data collecting method based on webpage data capture |
CN105528370A (en) * | 2014-09-30 | 2016-04-27 | 北京奇虎科技有限公司 | Page detection method and client |
CN105528370B (en) * | 2014-09-30 | 2020-04-07 | 奇安信科技集团股份有限公司 | Page detection method and client |
CN104268283A (en) * | 2014-10-21 | 2015-01-07 | 浪潮集团有限公司 | Method for automatically analyzing Internet web page |
CN104504081A (en) * | 2014-12-25 | 2015-04-08 | 北京东方剪报国际信息咨询有限公司 | Intelligent analysis system for all-media detection and monitoring big data behaviors |
CN105022805B (en) * | 2015-07-02 | 2018-05-04 | 四川大学 | A kind of sentiment analysis method based on SO-PMI information on commodity comment |
CN105022805A (en) * | 2015-07-02 | 2015-11-04 | 四川大学 | Emotional analysis method based on SO-PMI (Semantic Orientation-Pointwise Mutual Information) commodity evaluation information |
CN105574092B (en) * | 2015-12-10 | 2019-08-23 | 百度在线网络技术(北京)有限公司 | Information mining method and device |
CN105574092A (en) * | 2015-12-10 | 2016-05-11 | 百度在线网络技术(北京)有限公司 | Information mining method and device |
CN107767195A (en) * | 2016-08-16 | 2018-03-06 | 阿里巴巴集团控股有限公司 | The display systems and displaying of description information, generation method and electronic equipment |
CN106503213A (en) * | 2016-10-27 | 2017-03-15 | 星云纵横(北京)大数据信息技术有限公司 | A kind of network data information shows management method and system |
CN108241682A (en) * | 2016-12-26 | 2018-07-03 | 北京国双科技有限公司 | Determine the method and device of text emotion |
CN108241682B (en) * | 2016-12-26 | 2021-03-30 | 北京国双科技有限公司 | Method and device for determining text emotion |
CN108153817A (en) * | 2017-11-29 | 2018-06-12 | 成都东方盛行电子有限责任公司 | A kind of intelligent web page collecting method |
CN108153817B (en) * | 2017-11-29 | 2021-08-10 | 成都东方盛行电子有限责任公司 | Intelligent web page data acquisition method |
CN108319587B (en) * | 2018-02-05 | 2021-11-19 | 中译语通科技股份有限公司 | Multi-weight public opinion value calculation method and system and computer |
CN108319587A (en) * | 2018-02-05 | 2018-07-24 | 中译语通科技股份有限公司 | A kind of public sentiment value calculation method and system of more weights, computer |
CN109101636A (en) * | 2018-08-16 | 2018-12-28 | 成都市映潮科技股份有限公司 | A kind of method, apparatus and system carrying out data acquisition in cloud by visual configuration |
CN109388642A (en) * | 2018-10-23 | 2019-02-26 | 北京计算机技术及应用研究所 | Sensitive data based on label tracks source tracing method |
CN109388642B (en) * | 2018-10-23 | 2021-08-27 | 北京计算机技术及应用研究所 | Sensitive data tracing and tracing method based on label |
CN109783815A (en) * | 2018-12-28 | 2019-05-21 | 华南理工大学 | A kind of various dimensions network public-opinion big data comparative analysis method |
CN114491207A (en) * | 2022-01-18 | 2022-05-13 | 平安普惠企业管理有限公司 | Public opinion analysis method and related product |
CN116069899A (en) * | 2022-09-08 | 2023-05-05 | 重庆思达普规划设计咨询服务有限公司 | Text analysis method and system |
CN116069899B (en) * | 2022-09-08 | 2023-06-30 | 重庆思达普规划设计咨询服务有限公司 | Text analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20120725 |