CN104636425A - Method for predicting and visualizing emotion cognitive ability of network individual or group - Google Patents

Method for predicting and visualizing emotion cognitive ability of network individual or group

Info

Publication number
CN104636425A
Authority
CN
China
Prior art keywords
emotion
network
word
individual
emotion recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410795679.8A
Other languages
Chinese (zh)
Other versions
CN104636425B (en)
Inventor
周建栋
赵燕平
张华平
李想
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Beijing Institute of Technology BIT
Original Assignee
Beijing University of Chemical Technology
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology and Beijing Institute of Technology BIT
Priority to CN201410795679.8A
Publication of CN104636425A
Application granted
Publication of CN104636425B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking

Abstract

The invention relates to a method for predicting and visualizing the emotion cognitive ability of a network individual or group, and belongs to the field of Internet emotion information mining and analysis. The method integrates the common emotion words recorded in existing sentiment dictionaries, also takes into account the new network emotion words and emoticon characters with emotional tendency that occur in the network environment, covers the emotion elements found on social media platforms as fully as possible, and builds an emotion word ontology library on this basis. It determines the position of the emotion bifurcation point of a network individual, describes the level of the individual's emotion cognitive ability through an emotion cognitive ability index, and displays the differences between the emotion bifurcation points of multiple network individuals visually. With this method, the evolution law of the emotion cognitive ability of a network individual or group can be revealed; in particular, the dynamic emotional change process of typical network individuals or groups and the critical point of sudden emotional change can be predicted, helping the relevant managers guide online public opinion reasonably and create a harmonious network environment.

Description

A method for predicting and visualizing the emotion cognitive ability of a network individual or group
Technical field
The present invention relates to a method for predicting and visualizing the emotion cognitive ability of a network individual or group, and belongs to the field of Internet public-opinion information mining and analysis.
Background
With the rapid development of social network technology and applications, people are increasingly used to sharing their emotions, attitudes, opinions and views over the Internet. Of these, emotion is the decisive force that drives the other three: emotion is a person's inner psychological reaction and feeling, such as joy, anger, sorrow and happiness, and it significantly affects people's decision-making behavior. At the same time, network individuals with a certain capacity for guiding opinion and responding to topics exert particular influence, through online platforms, on the views and attitudes of their followers. In film marketing, for example, well-known artists often strongly influence the emotions, views and even the expressed opinions of their fans through online platforms (including microblogs, blogs and so on). Likewise, microblog "Big V" accounts and celebrity blogs, as typical network individuals, often have millions of followers; in online events they have a strong voice, influence the direction of public opinion and decision-making behavior to a considerable degree, and play the role of "opinion leaders".
Developing tools that predict and visualize the dynamic emotional change process of network individuals or groups, together with its emotion bifurcation points, is therefore of great significance. From a business perspective, effective promotion plans can be formulated by monitoring the regularities in the emotional dynamics of network individuals, and after a product is sold it can be improved promptly and its reputation maintained according to the emotional dynamics of network groups. From a governance perspective, analyzing the dynamic emotional change process of network individuals or groups, especially typical ones, and predicting the bifurcation points of sudden emotional change helps managers manage network users effectively, guide online public opinion correctly, and build a harmonious Internet environment.
Li Y et al. proposed complexity theory and a dynamic modeling method for the change of emotion structure in organizational settings (Li Y, Ashkanasy N M, Ahlstrom D. Complexity theory and affect structure: A dynamic approach to modeling emotional changes in organizations [J]. Research on emotion in organizations, 2010, 6: 139-165.). Based on cases of emotional events inside business organizations, the method proposes a qualitative emotion-structure bifurcation model that reveals the nature and regularity of the dynamic change of employees' emotions. As shown in Fig. 1, the model divides the range of emotion cognitive ability into (0, 1], (1, 3], (3, 3.57] and (3.57, 4), corresponding respectively to the "missing state", "balanced state", "approximately balanced state" and "disordered state" of individual emotion. The bifurcation theory referred to is the mathematical model for describing complex dynamic system phenomena that May proposed in 1976 on the basis of chaos theory (May, R. M. Simple mathematical models with very complicated dynamics. Nature, 1976, 261(5560): 459-467.). As shown in Fig. 2, it describes the sudden bifurcation of the states of general natural systems, that is, the jumps in the state of a complex system under real environmental conditions. Weiss H M et al. proposed affective events theory (Weiss H M, Cropanzano R. Affective events theory: A theoretical discussion of the structure, causes and consequences of affective experiences at work [J]. Research in organizational behavior: An annual series of analytical essays and critical reviews, 1996, 18: 1-74.), a systematic study of the causal structure of employees' emotional experiences in the work environment. This theory reveals the relationships among emotional events in the internal environment of a business organization, people's cognitive appraisal of those events, their emotional reactions, and their attitudes and behavior.
Li Y et al. and Weiss H M et al. analyzed the dynamic model of emotion structure only qualitatively, without sufficient experimental or survey data as a basis. They did not analyze people's emotional change process and emotion cognitive ability from network text by means of natural language processing; they only analyzed qualitatively, from the perspective of the psychology of emotion cognition and in combination with chaos bifurcation theory, the dynamic mechanism of microscopic human affective states and the causal relationship between external emotional events and emotional states. Neither these studies nor later related work proposed a model for sentiment analysis, prediction and visualization of network individuals or groups on Internet social media, such as the method proposed here for calculating the emotion-structure bifurcation point and the emotion cognitive ability index from network text. The present invention differs in that it introduces affective events theory and the emotion-structure bifurcation model into the field of public-opinion analysis and network data mining, applies text sentiment analysis techniques, and provides a method for predicting and visualizing the emotion cognitive ability of network individuals or groups in a social media environment and the bifurcation point that precedes emotional disorder.
Summary of the invention
The object of the present invention is to provide an effective and intuitive method for predicting and visualizing the emotion bifurcation points and the emotion cognitive ability index level of network individuals. It helps users understand and monitor the bifurcation points of sudden change in the emotional dynamics of network individuals or groups, and thereby predict their emotional evolution state ("missing state", "balanced state", "approximately balanced state", "disordered state") and development trend. It can be used for dynamic analysis and early warning of public opinion in Internet mass incidents and in the many related fields concerned with the evolution of online sentiment.
The idea of the invention is to collect and analyze the network text posted by network individuals or organizations in a social network environment, to propose a method for calculating the emotion bifurcation point of a network individual, to establish a network emotion-structure bifurcation model, and to describe, predict and visualize the position of the bifurcation point of sudden emotional change and the level of emotion cognitive ability.
The object of the invention is achieved through the following technical solution:
A method for predicting and visualizing the emotion cognitive ability of a network individual or group, comprising the steps of:
Step 1) Build the emotion word ontology library
To calculate the position of a network individual's emotion bifurcation point, a fairly comprehensive emotion word ontology library must be built. The specific steps are:
1-1) Integrate existing Chinese sentiment dictionaries so as to cover common emotion words more comprehensively.
1-2) From a large-scale corpus, learn the network neologisms that netizens use frequently, and discard those without obvious emotional color.
1-3) From a large-scale corpus, learn the emoticon characters that netizens use most frequently.
1-4) Common emotion words, network emotion neologisms and emoticons together form the emotion element set for sentiment analysis of network text.
Based on this emotion element set, build the emotion word ontology library E, which records each emotion word itself, its polarity and its emotion intensity value. E can be expressed as:
$$E = \langle (W_1, P_1, I_1), (W_2, P_2, I_2), \ldots, (W_i, P_i, I_i), \ldots, (W_n, P_n, I_n) \rangle$$
where $W_i$ is an emotion word; $P_i$ is the polarity of $W_i$ ($P_i > 0$ marks a positive emotion word, $P_i < 0$ a negative one); $I_i$ is the emotion intensity value of $W_i$, a larger absolute value meaning a higher emotion intensity; $1 \le i \le n$, and $n$ is the number of emotion words in $E$.
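The triple structure of E can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `EmotionEntry`, the helper `build_library` and the two sample entries with their intensity values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionEntry:
    """One triple (W_i, P_i, I_i) of the library E."""
    word: str         # W_i: emotion word, network neologism or emoticon
    polarity: int     # P_i: +1 for positive, -1 for negative
    intensity: float  # I_i: emotion intensity value


def build_library(entries):
    """Index the triples by word so a token can be looked up directly."""
    return {entry.word: entry for entry in entries}


# Hypothetical sample entries; the intensity values are made up.
library = build_library([
    EmotionEntry("happy", +1, 0.8),
    EmotionEntry("angry", -1, 0.9),
])
n = len(library)  # n, the number of emotion words in E
```

Indexing by word is a design choice made here so that the later matching step can look a token up in constant time; the text itself only specifies the ordered list of triples.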
A. Polarity integration. The polarity $P_i$ of a common emotion word follows the polarity given in the sentiment dictionaries; where different dictionaries label the same emotion word inconsistently, the label is corrected by a vote among several annotators. Because network emotion neologisms and emoticons are limited in number, their polarity is determined entirely by such voting.
B. Determining emotion intensity. First obtain a large-scale social network text set $U$ and compute, within this set, the distribution of each character that occurs in emotion words. Then compute the emotion weight of each candidate emotion word from these distributions; candidates exceeding a threshold are taken as emotion words. Finally compute the emotion intensity values of the common emotion words, as described below. Because network emotion neologisms and emoticons are limited in number, their emotion intensity is determined by a vote among several annotators, with the intensities of common emotion words as a reference.
Below, $U$ denotes the social network text set; $S_{\text{pos}}$ and $S_{\text{neg}}$ denote the sets of positive and negative emotion words in $U$, and $S^*$ stands for either emotion word set. Suppose an emotion word $w$ in $S^*$ can be written as the character string $C_1 C_2 \ldots C_i \ldots C_k$, where $C_i$ is one character of the word. The characters of emotion words are likewise divided into positive and negative, each taking the polarity of the emotion word that contains it.
Compute the distribution of characters in the text set. Let $P(C_i \mid S^*)$ denote the probability, in the network text set $U$, of a character $C_i$ from the emotion word set $S^*$:
$$P(C_i \mid S^*) = \frac{P(S^*, C_i)}{P(S^*)} = \frac{\mathrm{Freq}(S^*, C_i) + \delta}{\mathrm{Freq}(S^*) + 1} \qquad (1)$$
where $P(S^*, C_i)$ is the probability that the character $C_i$, a component of words in $S^*$, occurs in $U$; $P(S^*)$ is the probability that all component characters of the words in $S^*$ occur in $U$; $\mathrm{Freq}(S^*, C_i)$ is the frequency with which the component character $C_i$ of words in $S^*$ occurs in $U$; and $\mathrm{Freq}(S^*)$ is the sum of the frequencies with which all component characters of words in $S^*$ occur in $U$. $\delta$ is a small value, taken here as the reciprocal of the total number of words in $S^*$.
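Formula (1) can be illustrated with a short Python sketch. It assumes that the text set U is available as one plain string and that δ is the reciprocal of the number of seed words; the function name `char_probabilities` and the toy inputs are hypothetical.

```python
from collections import Counter


def char_probabilities(seed_words, corpus_text):
    """Estimate P(C_i | S*) for every character of the seed set S*."""
    seed_chars = set("".join(seed_words))
    # Freq(S*, C_i): occurrences in U of each character that appears in S*.
    counts = Counter(ch for ch in corpus_text if ch in seed_chars)
    total = sum(counts.values())   # Freq(S*)
    delta = 1.0 / len(seed_words)  # assumed reading of delta
    return {ch: (counts[ch] + delta) / (total + 1) for ch in seed_chars}


# Toy data: two seed "words" and a five-character corpus.
probs = char_probabilities(["ab", "bc"], "abbcx")
```

The additive δ keeps a seed character that never occurs in U from receiving probability zero, which matters once the logarithm of these values is taken.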
Compute the distribution of a candidate emotion word (candidate word for short) $w^*$ in the text set $U$:
$$P(w^*) = \frac{\mathrm{Freq}(w^*) + \delta}{\sum_{i=1}^{|U|} \mathrm{Freq}(w_i) + 1} \qquad (2)$$
that is, the probability that the candidate word $w^*$ appears in $U$, where $|U|$ is the number of words in the text set; $w_i$ is any word in $U$; $\mathrm{Freq}(w^*)$ is the frequency with which $w^*$ occurs in $U$; $\sum_{i=1}^{|U|} \mathrm{Freq}(w_i)$ is the sum of the frequencies with which all $w_i$ occur in $U$; and $\delta$ has the same meaning as above.
Compute the emotion weight of the candidate word $w^*$. Every candidate word comes from $U$, and it is not known in advance whether it is an emotion word. Its emotion weight is computed in order to decide whether it is one, and to determine its polarity, its intensity and the emotion word set it fits best. The emotion weight of $w^*$ is computed as follows:
$$r(w^* \mid S^*) = \alpha \sum_{i=1}^{k} \log P(C_i \mid S^*) + \beta \log P(w^*)$$
where $\alpha, \beta \in [0, 1]$ are combination parameters, $C_i$ is the $i$-th character of $w^*$, $w^*$ has $k$ characters in total, and $P(C_i \mid S^*)$ and $P(w^*)$ are computed by formulas (1) and (2). The weight also reflects the emotional tendency of the word.
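Formula (2) and the emotion weight can be sketched together in Python. The sketch assumes that word frequencies in U are held in a `Counter` and that the character probabilities of formula (1) are supplied as a dictionary. The names `word_probability` and `emotion_weight`, the default values of α and β, and the fallback `unseen_char_prob` for characters absent from the dictionary are illustrative assumptions, not part of the patent.

```python
import math
from collections import Counter


def word_probability(word, token_counts, delta):
    """Formula (2): smoothed probability of a candidate word in U."""
    total = sum(token_counts.values())
    return (token_counts.get(word, 0) + delta) / (total + 1)


def emotion_weight(word, char_probs, token_counts, delta,
                   alpha=0.5, beta=0.5, unseen_char_prob=1e-6):
    """r(w* | S*) = alpha * sum(log P(C_i | S*)) + beta * log P(w*)."""
    # unseen_char_prob is a hypothetical fallback for missing characters.
    char_term = sum(math.log(char_probs.get(ch, unseen_char_prob))
                    for ch in word)
    word_term = math.log(word_probability(word, token_counts, delta))
    return alpha * char_term + beta * word_term


# Toy data: word frequencies in U and character probabilities for one S*.
token_counts = Counter({"ab": 3, "cd": 1})
weight = emotion_weight("ab", {"a": 0.5, "b": 0.5}, token_counts,
                        delta=0.5, alpha=1.0, beta=1.0)
```

Calling `emotion_weight` once with the character probabilities of the positive set and once with those of the negative set gives the two weights that the next step combines.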
Compute the emotion membership and emotion intensity of the candidate word. As stated above, every candidate word has a positive emotion weight $r(w^* \mid S_{\text{pos}})$ and a negative emotion weight $r(w^* \mid S_{\text{neg}})$, so its emotion membership $I$ can be expressed as a combination of the two tendencies:
$$I(w^*) = r(w^* \mid S_{\text{neg}}) - r(w^* \mid S_{\text{pos}})$$
where $I$ is the emotion membership of the candidate word; its magnitude expresses the degree of membership and is also the emotion intensity value of the word. $P$ is set to +1 or -1 according to the sign of $I$: the absolute value of $I$ decides whether $w^*$ is an emotion word, and its sign decides whether the word belongs to $S_{\text{pos}}$ or $S_{\text{neg}}$.
These calculations give the emotion membership of every candidate word. The candidates are then sorted by this value: a candidate with a larger membership value has a stronger emotional tendency and a higher emotion intensity. Candidates whose membership falls within a certain range (threshold $\varepsilon$) are accepted as new emotion words; the absolute value of the membership is taken as the emotion intensity value, its sign marks positive or negative emotion, and the word is inserted into the ontology library.
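The membership calculation and the threshold selection can be sketched as follows. The text does not spell out the exact acceptance rule, so the sketch assumes that a candidate is kept when the absolute value of I is at least ε, and that P simply copies the sign of I. The function names and the toy weights are hypothetical.

```python
def membership(r_pos, r_neg):
    """I(w*) = r(w* | S_neg) - r(w* | S_pos), in the order given in the text."""
    return r_neg - r_pos


def select_new_emotion_words(weights, epsilon):
    """Return (word, P, intensity) triples with |I| >= epsilon, strongest first.

    weights maps each candidate word to its pair (r_pos, r_neg).
    The acceptance rule |I| >= epsilon is an assumption of this sketch.
    """
    selected = []
    for word, (r_pos, r_neg) in weights.items():
        i_value = membership(r_pos, r_neg)
        if abs(i_value) >= epsilon:
            polarity = 1 if i_value > 0 else -1  # P copies the sign of I
            selected.append((word, polarity, abs(i_value)))
    return sorted(selected, key=lambda entry: entry[2], reverse=True)


# Toy log-weights for three candidate words.
candidates = {"w1": (-5.0, -2.0), "w2": (-3.0, -3.1), "w3": (-1.0, -4.0)}
new_words = select_new_emotion_words(candidates, epsilon=1.0)
```

In this toy run `w2` is dropped because its two weights are nearly equal, which is the case the threshold is meant to filter out.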
C. Emotion intensity of emoticons. The META value of the symbol can be extracted and the intensity determined in a similar way.
The emotion ontology library is completed through the series of steps above. Because the library is learned from a large-scale network text set, it has the statistical soundness of large data.
Step 2) Determine the emotion bifurcation point position of the network individual and calculate its emotion cognitive ability index value
Collect, in time order, the text messages posted by the network individual, and calculate the position of the individual's emotion bifurcation point and how it changes over time, comprising:
Step 2-1) The time-ordered set of text messages posted by the network individual can likewise be denoted $U$.
Here $T$ is the time series and $S$ is the set of microblog message vectors corresponding to $T$; the microblog message posted at time $t$ is $S_t$.
Step 2-2) Preprocess the text message set. Use the ICTCLAS word segmentation tool to perform Chinese word segmentation, part-of-speech tagging and so on over the message set $U$ in the order of the time series $T$, obtaining the word list $W_t$ of the microblog message $S_t$ posted at time $t$:
$$W_t = \langle w_1, w_2, \ldots, w_j, \ldots, w_J \rangle$$
where $w_j$ is a word in $W_t$ (whether it is an emotion word still has to be determined) and $J$ is the number of words.
Step 2-3) Extract emotion words using the emotion word ontology library $E$. Specifically, match the word list $W_t$ of the microblog message $S_t$ at time $t$ against the emotion word ontology library $E$ built in step 1), extract all emotion elements in $W_t$, and look up their emotion intensity values and polarities in the library.
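The matching in step 2-3) amounts to a dictionary lookup over the segmented tokens. The sketch below assumes the message has already been segmented (for example by ICTCLAS, which is not called here) and that the library is held as a dictionary from word to a (polarity, intensity) pair; the function name and the sample data are hypothetical.

```python
def extract_emotion_elements(tokens, library):
    """Keep the tokens of W_t found in E, with their polarity and intensity."""
    return [(token, *library[token]) for token in tokens if token in library]


# Hypothetical library fragment and one pre-segmented message.
library = {"happy": (1, 0.8), "angry": (-1, 0.9)}
tokens = ["today", "happy", "rain", "angry", "happy"]
elements = extract_emotion_elements(tokens, library)
```

Repeated emotion words are kept as separate elements here, so that the later counts reflect how often each intensity level occurs in the message.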
Step 2-4) Build the calculation model for the emotion bifurcation point position of the network individual and calculate the individual's emotion cognitive ability index values over time, comprising the following steps:
Step 2-4-1) Calculate the proportion of emotion elements falling in each of the four emotional states: "disordered state", "approximately balanced state", "balanced state" and "missing state".
Let $\mathrm{Num}_t$ be the number of emotion elements extracted from $W_t$, and let $\mathrm{dead}_t$, $\mathrm{low}_t$, $\mathrm{med}_t$ and $\mathrm{high}_t$ be the numbers of those $\mathrm{Num}_t$ elements whose emotion intensity magnitudes fall in $(0, a)$, $[a, b)$, $[b, c)$ and $[c, d)$ respectively, where $d = \max(|I(w)|)$ is the largest absolute emotion intensity in the ontology library. The cut points $a$, $b$, $c$, $d$ are the parameter values corresponding to the emotion bifurcation points: $a = 0.25d$, $b = 0.75d$, $c = 0.8925d$, that is, the model's cut points 1, 3 and 3.57 rescaled from the range $(0, 4)$ to $(0, d)$. These four intervals agree with the emotional states of the emotion-structure bifurcation model (Fig. 1), corresponding respectively to the "missing state" $(0, 1]$, the "balanced state" $(1, 3]$, the "approximately balanced state" $(3, 3.57]$ and the "disordered state" $(3.57, 4)$. The following proportions can then be calculated:
$$P(\mathrm{dead})_t = \frac{\mathrm{dead}_t}{\mathrm{Num}_t}; \quad P(\mathrm{low})_t = \frac{\mathrm{low}_t}{\mathrm{Num}_t}; \quad P(\mathrm{med})_t = \frac{\mathrm{med}_t}{\mathrm{Num}_t}; \quad P(\mathrm{high})_t = \frac{\mathrm{high}_t}{\mathrm{Num}_t}$$
where $P(\mathrm{high})_t$, $P(\mathrm{med})_t$, $P(\mathrm{low})_t$ and $P(\mathrm{dead})_t$ are the proportions of emotion elements in the microblog message $S_t$ at time $t$ that belong to the "disordered state", "approximately balanced state", "balanced state" and "missing state" respectively, in descending order of emotion intensity.
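The four proportions can be computed with a few lines of Python. The sketch takes the cut points a = 0.25d, b = 0.75d and c = 0.8925d from the text; placing a value equal to d in the top interval and returning zeros when no emotion element was found are assumptions made here, and the function name is hypothetical.

```python
def state_proportions(intensities, d):
    """Share of emotion elements of one message in each intensity interval."""
    a, b, c = 0.25 * d, 0.75 * d, 0.8925 * d
    counts = {"dead": 0, "low": 0, "med": 0, "high": 0}
    for value in intensities:
        magnitude = abs(value)
        if magnitude < a:
            counts["dead"] += 1
        elif magnitude < b:
            counts["low"] += 1
        elif magnitude < c:
            counts["med"] += 1
        else:
            counts["high"] += 1  # also holds magnitude == d (assumption)
    num = len(intensities)       # Num_t
    if num == 0:
        return {state: 0.0 for state in counts}
    return {state: count / num for state, count in counts.items()}


# Toy intensities with d = 4, so the cut points are 1, 3 and 3.57.
shares = state_proportions([0.5, -2.0, 2.5, 3.2, 3.9], d=4.0)
```

With d = 4 the cut points coincide with the bifurcation values 1, 3 and 3.57 of the model, which makes the toy result easy to check by hand.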
Step 2-4-2) Define and calculate the emotion cognitive ability index $R_t$ of the network individual according to the emotion-structure bifurcation point positions. The higher the emotion cognitive ability of a network individual, the more easily its emotional state falls into the "disordered state"; the lower the ability, the more easily its state falls into the "balanced state"; with moderate ability, its state tends to fall into the "approximately balanced state". Take
$$\max\{P(\mathrm{high})_t, P(\mathrm{med})_t, P(\mathrm{low})_t, P(\mathrm{dead})_t\}$$
to represent the most significant emotional state in the microblog message $S_t$ at time $t$. For example, if $P(\mathrm{high})_t$ is the largest, the network individual is in the "disordered state" at time $t$, and the corresponding emotion cognitive ability index lies in $(3.57, 4]$.
The magnitude of the emotion cognitive ability index of the network individual is defined as $R_t$, called the emotion cognitive ability. Its formula depends on the emotional intensity shown at the corresponding level of emotion cognitive ability, as follows.
If $P(\mathrm{high})_t$ is the largest, then $R_t = 3.57 + 0.43 \cdot P(\mathrm{high})_t$;
If $P(\mathrm{med})_t$ is the largest, then $R_t = 3 + 0.57 \cdot P(\mathrm{med})_t$;
If $P(\mathrm{low})_t$ is the largest, then $R_t = 1 + 2 \cdot P(\mathrm{low})_t$;
If $P(\mathrm{dead})_t$ is the largest, then $R_t = P(\mathrm{dead})_t$.
Thus, over the time series $T = \langle 1, 2, 3, \ldots, t, \ldots, T \rangle$, the change sequence of the network individual's emotion cognitive ability index (delimited by the emotion bifurcation point values) is $R = \langle R_1, R_2, \ldots, R_t, \ldots, R_T \rangle$. Each microblog message $S_t$ yields its corresponding sequence element $R_t$, and the sequence shows the level of emotion cognitive ability that the network individual displays toward different emotional events.
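The piecewise definition of the index can be sketched as follows. The text does not say how ties between proportions are resolved or what happens when a message contains no emotion element, so the sketch assumes that ties go to the higher state and that an empty message yields `None`; both choices and the function names are hypothetical.

```python
def cognition_index(shares):
    """Map the four proportions of one message to the index R_t."""
    if max(shares.values()) == 0:
        return None  # no emotion element in the message (assumption)
    # On ties, max() keeps the first key listed, i.e. the higher state.
    state = max(("high", "med", "low", "dead"), key=lambda key: shares[key])
    if state == "high":
        return 3.57 + 0.43 * shares["high"]
    if state == "med":
        return 3.0 + 0.57 * shares["med"]
    if state == "low":
        return 1.0 + 2.0 * shares["low"]
    return shares["dead"]


def index_sequence(shares_over_time):
    """R = <R_1, ..., R_T> for the time-ordered messages of one individual."""
    return [cognition_index(shares) for shares in shares_over_time]


r_high = cognition_index({"high": 0.5, "med": 0.2, "low": 0.2, "dead": 0.1})
r_low = cognition_index({"high": 0.1, "med": 0.1, "low": 0.6, "dead": 0.2})
```

Each branch maps a proportion in (0, 1] linearly onto the interval of its state, so the interval an index value falls in identifies the dominant state directly.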
Step 3) Build the visual layout of the network individual's emotion bifurcation point positions.
With time as the horizontal axis and the emotion bifurcation point position as the vertical axis, display the individual's emotion bifurcation point positions and emotion cognitive ability index sequence visually, which may further comprise:
Step 3-1) Using the change sequence $R = \langle R_1, R_2, \ldots, R_t, \ldots, R_T \rangle$ of the emotion cognitive ability index over the time series $T = \langle 1, 2, 3, \ldots, t, \ldots, T \rangle$ obtained in step 2-4), construct the figure in a two-dimensional rectangular coordinate system with the time series $T$ as the horizontal axis and $R$ as the vertical axis. Because the emotion-structure bifurcation dynamic model divides the range of the emotion cognitive ability index into $(0, 1]$, $(1, 3]$, $(3, 3.57]$ and $(3.57, 4)$, corresponding respectively to the "missing state", "balanced state", "approximately balanced state" and "disordered state" of individual emotion, the vertical axis is divided into four regions accordingly.
Step 3-2) Display the emotion cognitive ability index of the network individual visually. When drawing the layout, label each point in the coordinate system, so that the attributes of a point can be written $\langle t, R_t, F \rangle$, where $t$ is a moment of the time series $T$, $R_t$ is the emotion cognitive ability index at moment $t$, and $F$ is the label symbol chosen when plotting the point.
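The data behind such a layout can be prepared without committing to a plotting library. The sketch below builds the point attributes and assigns each point to one of the four regions of the vertical axis; treating the value 4 as part of the top region, the English region names and the default marker are assumptions of this sketch.

```python
# Upper bound and name of each region of the vertical axis.
BANDS = [
    (1.0, "missing"),
    (3.0, "balanced"),
    (3.57, "approximately balanced"),
    (4.0, "disordered"),
]


def band_of(r_value):
    """Return the name of the region an index value falls in."""
    for upper, name in BANDS:
        if r_value <= upper:
            return name
    raise ValueError(f"index value {r_value} lies outside (0, 4]")


def layout_points(index_series, marker="o"):
    """Build the point attributes <t, R_t, F>, plus the region, for plotting."""
    return [
        {"t": t, "R": r_value, "F": marker, "band": band_of(r_value)}
        for t, r_value in enumerate(index_series, start=1)
    ]


points = layout_points([0.4, 2.2, 3.3, 3.8])
```

The resulting list can be handed to any charting tool, with `t` on the horizontal axis, `R` on the vertical axis and horizontal guide lines at 1, 3 and 3.57.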
Step 4) Comparative analysis of multiple network individuals. The differences in the emotion cognitive ability index positions of several individuals reveal their differences in emotion cognitive ability level and can be displayed visually, further comprising:
Step 4-1) Calculate each network individual's emotion cognitive ability index level from its index sequence, and then determine the index-level sequence of the network group. Following step 2), determine the time-series emotion cognitive ability index sequence $\langle T_\lambda, R_\lambda \rangle$ of each network individual (assume there are $H$ individuals), where $\lambda$ is the number of the individual. For individual $\lambda$, count the numbers $C_1, C_2, C_3, C_4$ of index values $R_\lambda$ over the period $T_\lambda$ that fall in $(0, 1]$, $(1, 3]$, $(3, 3.57]$ and $(3.57, 4)$ respectively. Take the mean or median of all $R$ values in the interval whose count $C_k$ ($k = 1, 2, 3, 4$) is the largest, and define it as the interval center value $R_{\text{center}}^{\lambda}$.
This center value is defined as the emotion cognitive ability index level of network individual $\lambda$. It represents the regularity with which the individual cognizes emotional events over the long term, and is the individual's most significant emotion cognitive ability. For a group composed of multiple network individuals, the emotion cognitive ability index-level sequence of the group is obtained:
$$R_{\text{center}} = \langle R_{\text{center}}^{1}, R_{\text{center}}^{2}, \ldots, R_{\text{center}}^{H} \rangle$$
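The center value can be sketched with the standard `statistics` module. The sketch assumes that when two intervals hold equally many values the lower interval wins, and that the value 4 counts toward the top interval; the function names and the toy series are hypothetical.

```python
from statistics import mean, median

EDGES = (1.0, 3.0, 3.57, 4.0)  # upper bounds of the four intervals


def interval_number(r_value):
    """Return k - 1 for the interval (0-based) that holds the index value."""
    for k, upper in enumerate(EDGES):
        if r_value <= upper:
            return k
    raise ValueError(f"index value {r_value} lies outside (0, 4]")


def center_value(index_series, use_median=False):
    """Mean or median of the R values in the most populated interval."""
    buckets = [[], [], [], []]
    for r_value in index_series:
        buckets[interval_number(r_value)].append(r_value)
    fullest = max(buckets, key=len)  # on ties the lower interval wins
    return median(fullest) if use_median else mean(fullest)


def group_levels(series_by_individual):
    """R_center = <R_center^1, ..., R_center^H> for H individuals."""
    return [center_value(series) for series in series_by_individual]


series = [2.0, 2.4, 3.8, 0.5, 2.6]
level_mean = center_value(series)
level_median = center_value(series, use_median=True)
```

In the toy series three of the five values fall in (1, 3], so the center value is taken from that interval alone and the outliers 0.5 and 3.8 do not affect it.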
Step 4-2) With the horizontal axis representing the number $\lambda$ of the network individual and the vertical axis representing that individual's emotion cognitive ability index level $R_{\text{center}}^{\lambda}$, build the layout that compares the index levels of multiple network individuals. The rest of the geometric construction is the same as in step 3-1); the point giving the index level of each individual $\lambda$ is drawn with a specially chosen symbol or the individual's avatar icon as its label symbol, so that differences in emotion cognitive ability between individuals are shown more intuitively. The vertical axis is divided in the same way as in step 3-2), into four regions according to the emotion bifurcation points. This completes the visual layout of the differences in emotion cognitive ability index level among multiple network individuals.
Beneficial effects
The present invention integrates the common emotion words recorded in existing sentiment dictionaries and also takes into account the network emotion neologisms and emoticon characters with emotional tendency found in the network environment, covering the emotion elements on social media platforms as fully as possible, and builds the emotion word ontology library on this basis. It determines the emotion bifurcation point position of a network individual, and describes and predicts that position and its changes in the form of a visual geometric layout. It describes the level of a network individual's emotion cognitive ability with the emotion cognitive ability index, and displays the differences between the emotion bifurcation points of multiple network individuals visually.
With the present invention, the emotion bifurcation point positions of network individuals or groups and their change process can be described visually, and the evolution of their emotion cognitive ability level can be revealed through their emotional fluctuations. This helps the relevant users understand more fully and intuitively the nature of the cognitive attitudes and affective states that network individuals or groups form toward sensitive events, so that their future cognitive attitudes and likely affective states can be predicted and warned of in advance. The invention can be applied to online public-opinion monitoring, microblog sentiment analysis, customer evaluation, company and product image, stock-market and financial-crisis outbreaks, risk analysis and similar applications.
Brief description of the drawings
Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the emotion-structure bifurcation model;
Fig. 2 is a schematic diagram of bifurcation theory;
Fig. 3 is a flow chart of the method of the invention;
Fig. 4 shows how the accuracy of the emotion word ontology library construction algorithm changes as the number of test words is gradually increased;
Fig. 5 shows the change of the emotion bifurcation point position of the network individual "Cui Yongyuan";
Fig. 6 shows the change of the emotion bifurcation point position of the network individual "Liu Yifei";
Fig. 7 is a visual comparison of the emotion cognitive ability index levels of multiple network individuals.
Embodiment
To make the object, technical solution and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
An embodiment of the present invention provides a method for predicting and visualizing the emotion bifurcation point position of a network individual, which can describe intuitively how that position changes and predict the trend of the individual's emotion cognitive ability index. The network individual is a network user who continually expresses emotions and views on various topics and events on Internet social platforms, such as a microblog "Big V". The emotion bifurcation point dynamically describes the individual's emotional evolution, that is, whether its emotional state regarding a given event is the "missing state", "balanced state", "approximately balanced state" or "disordered state".
Fig. 3 shows the flow of the method of the invention, which mainly comprises the following steps:
Step 1) Build an ontology library that can integrate emotion words from multiple sources;
Step 2) For the collected message set of a network individual, determine the emotion bifurcation point position;
Step 3) Build the visual figure of its dynamic change, and color and label it;
Step 4) For multiple network individuals, calculate their emotion bifurcation point positions and the differences in their changes, and display them visually;
Step 5) Compare the change processes of the emotion bifurcation point positions and the emotion cognitive ability indices of multiple network individuals, and predict their future cognitive process and emotional state regarding emotional events.
More specifically, step 1) constructs the emotion word ontology library. This example uses the manually annotated Chinese emotion vocabulary ontology of the Information Retrieval Laboratory of Dalian University of Technology, the sentiment dictionary of National Taiwan University and the emoticon lexicon of Tsinghua University, which together provide more than 20,000 common emotion entries and basic emoticons. The emotion word ontology library is constructed according to the method of the invention: potential emotion words and new emoticons are mined automatically from the corpus and annotated with a unified polarity and emotion intensity value, so the ontology library can be updated dynamically.
To verify the validity of the algorithm in this step, experiments were run on three corpora from different Internet domains. None of the three corpora carries any manual emotion annotation. Their basic information is as follows:
A) Review data for THINKPAD products from the JD.com store, 16 MB in size, containing 4000 positive reviews and 4000 negative reviews, stored as text.
B) Review data for 700 TV series from Douban, 65 MB in size; each series has a number of reviews, stored as text.
C) Catering-industry review data from Dianping, 407 MB in size, containing user ID, shop ID, review content, time and other information.
Only the review content is analyzed in the experiments, in order to mine potential unknown emotion words from the review data. The positive or negative labels of the corpus therefore have no effect on the experiment, and the remaining fields are not processed.
In addition, the conventional sentiment dictionary selected in this step contains 25651 emotion words, all annotated with polarity: 12745 positive emotion words and 12907 negative emotion words. To check the validity of the algorithm described in this step, 450 positive and 450 negative emotion words were extracted from this dictionary as a test word set and annotated with emotion intensity values. Because an emotion word may be an adjective, a noun or a verb, the 900 words were chosen so that each part of speech contributes 300 words.
For convenience, in all steps below the corpus is denoted U, the sentiment dictionary D, and the emotion word ontology library E.
The construction of the emotion word ontology library E and the related experimental results are described in detail below. The construction may comprise the following steps:
Step 1-1) Integrate existing Chinese sentiment dictionaries so that conventional emotion words are covered as comprehensively as possible;
The integration covers both merging the emotion words and unifying their polarity. When the same emotion word has inconsistent polarity in different Chinese sentiment dictionaries, the polarity is corrected by a vote among several annotators.
Step 1-2) On the basis of an existing large-scale social network corpus, extract the network neologisms that netizens use frequently and filter out those that carry emotional tendency (network sentiment neologisms). This involves basic text preprocessing: word segmentation, part-of-speech tagging, word frequency statistics and stop-word removal. The ICTCLAS word segmentation tool is used for this work. ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) is a Chinese lexical analysis system whose main functions include Chinese word segmentation, part-of-speech tagging, new word identification and word frequency statistics; its segmentation precision reaches 98.45%. The present invention performs the text preprocessing by calling the Java interface of ICTCLAS. The concrete steps comprise:
Step 1-2-1) Chinese word segmentation
For the corpora $U_1$ (JD.com), $U_2$ (Douban) and $U_3$ (Dianping), Chinese word segmentation is performed on each corpus by calling the ICTCLAS segmentation interface. For simplicity, each result is written uniformly as $U = \langle w_1, w_2, \ldots, w_M \rangle$, where $M$ is the total number of words in the segmentation result and $w_k$ is the $k$-th word ($k = 1, 2, \ldots, M$).
Step 1-2-2) word frequency statistics
For each word $w_i$ in the segmentation result, compute its frequency of occurrence $f_i = m_i / M$, where $m_i$ is the number of times $w_i$ occurs. The whole corpus is then represented by a bag-of-words model whose elements are (word, word frequency) pairs:
$$U = \langle (w_1, f_1), (w_2, f_2), \ldots, (w_M, f_M) \rangle$$
Step 1-2-3) remove stop words
Stop words are words that occur very frequently in text but carry little meaning of their own; they play a semantic role only when placed in a sentence. Common examples are function words such as "is" and "with". Stop words contribute nothing useful to the construction of the emotion word ontology library, and removing them does not affect the experimental results. After stop-word removal the corpus U can be expressed as:
$$U = \langle (w_1, f_1), (w_2, f_2), \ldots, (w_N, f_N) \rangle$$
where $N$ is the total number of words remaining in the corpus after stop-word removal, so the number of stop words removed is $M - N$.
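The word frequency and stop-word steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the ICTCLAS-based implementation used in the invention: the token list, the stop-word set and the function names are hypothetical, and frequencies are kept once per distinct word.

```python
from collections import Counter


def word_frequencies(tokens):
    """Return {word: f} with f = m / M (m = count of the word, M = total tokens)."""
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}


def remove_stop_words(frequencies, stop_words):
    """Drop every (word, frequency) entry whose word is a stop word."""
    return {w: f for w, f in frequencies.items() if w not in stop_words}


# Hypothetical segmented review; a real run would take tokens from the segmenter.
tokens = ["screen", "is", "very", "good", "screen", "is"]
stop_words = {"is"}  # hypothetical stop-word list

frequencies = word_frequencies(tokens)
filtered = remove_stop_words(frequencies, stop_words)
print(filtered)
```

Keeping the frequencies computed before removal means each $f_i$ is still relative to the original total $M$, as in the definition above.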
On the basis of the above social network corpus, different emoticon sets are also collected and annotated with polarity. Conventional emotion words, network sentiment neologisms and emoticons thus together form the emotion element set used for sentiment analysis of network text.
Step 1-3) Build the emotion word ontology library according to the emotion word ontology construction method. The emotion word ontology library E can be expressed as:
$$E = \langle (W_1, P_1, I_1), (W_2, P_2, I_2), \ldots, (W_n, P_n, I_n) \rangle$$
where $W_i$ is a conventional emotion word, a network sentiment neologism or an emoticon; $P_i$ is the polarity of $W_i$ ($P_i > 0$ means a positive emotion word, $P_i < 0$ a negative emotion word); and $I_i$ is the emotion intensity value of $W_i$.
The polarity $P_i$ of a conventional emotion word is taken from the sentiment dictionary. The emotion intensity value $I_i$ and polarity $P_i$ of network sentiment neologisms and emoticons are computed as described below; where the computed polarity disagrees with the dictionary, it is corrected by manual annotation.
Emotion words are screened using formulas (1), (2) and (3) given in the Summary of the Invention above. Because the emotion word ontology library E is learned from a large-scale collection of network text, it has the statistical soundness of large data.
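The triple structure of E can be illustrated with a minimal Python sketch. It assumes that each candidate word already has a signed score (sign for polarity, magnitude for intensity) and that screening keeps words whose magnitude exceeds a threshold; the scores, the threshold value and the names `Entry` and `build_library` are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    word: str         # W_i
    polarity: int     # P_i: +1 positive, -1 negative
    intensity: float  # I_i: magnitude of the emotion intensity


def build_library(scored_words, threshold):
    """Keep words whose |score| exceeds the threshold, as (W, P, I) triples."""
    library = {}
    for word, score in scored_words:
        if abs(score) > threshold:
            library[word] = Entry(word, 1 if score > 0 else -1, abs(score))
    return library


# Hypothetical signed scores for three candidate words.
scores = [("delighted", 3.2), ("fraud", -3.8), ("table", 0.05)]
E = build_library(scores, threshold=0.1)  # threshold value is an assumption
print(E["fraud"])
```

Indexing the library by word makes the later matching of a post's word set against E a simple dictionary lookup.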
Step 1-3) Experimental analysis of the emotion word ontology library construction algorithm, which may comprise:
Step 1-3-1) Experimental results
This experiment builds the emotion word ontology library from a raw corpus, that is, a corpus without any manual emotion annotation. To verify the accuracy of the algorithm, the following measures are taken. The 900 emotion words of sentiment dictionary D that were annotated with emotion intensity values are used as the test word set $TestWord = \langle Word_i, IA_i \rangle$, where $Word_i$ is an emotion word, $IA_i$ is its manually annotated emotion intensity value and $i \in \{1, 2, \ldots, 900\}$. The emotion intensity of each test word is then computed with the emotion word membership (that is, emotion intensity) calculation method described under steps 1) and 2) and is denoted $Intensity_i$. Finally, the accuracy is computed. The 900 words selected from sentiment dictionary D are annotated manually with intensity values on a positive and negative 7-point scale.
In addition, the emotion membership computed by the algorithm is signed: the sign gives the polarity of the emotion word and the magnitude gives its intensity. To check the accuracy and validity of the algorithm, the computed polarity is compared with the polarity of the emotion words in the test word set TestWord.
The adjectives, nouns and verbs in the test word set TestWord are used as test emotion words, and the three corpora $\langle U_1, U_2, U_3 \rangle$ are used as the test corpus set. The experimental results are shown in Table 1:
Table 1. Experimental results of the emotion word ontology library construction algorithm
The computed emotion intensity values are then sorted. Positive emotion words have positive intensity values and are sorted in descending order; negative emotion words have negative intensity values and are sorted in descending order of absolute value. The accuracy over the top 10, 50, 150, 250, 350 and 450 words is computed, and the results are shown in Table 2:
Table 2. Accuracy of the emotion word ontology library construction algorithm as the number of test words increases
Fig. 4 shows the corresponding line chart drawn from the statistics in the table. For both positive and negative emotion words, the accuracy of the algorithm converges to about 91% as the number of emotion words increases. This indicates that the accuracy of the emotion word ontology library construction algorithm of the invention is approximately 91% and that the method is effective.
Part of the resulting emotion word ontology library is given in Table 3:
Table 3. Part of the emotion word ontology library
Step 1-4) Supplementing the emotion word ontology library
In addition, the invention takes into account that the emoticons and emotionally colored network neologisms in the text a network individual publishes also reflect, to a considerable degree, the individual's emotional state toward the event at that time. Therefore, to make the emotion word ontology library as complete as possible and thereby compute the emotion recognition capability of a network individual accurately, the invention builds an emoticon ontology library and a network sentiment neologism ontology library as follows:
The invention uses 17 million microblog posts as the corpus. The new word discovery and word frequency statistics functions of the ICTCLAS2014 software yield 2182 network neologisms that netizens use frequently. Words without obvious emotional color are rejected and the top 569 are extracted, from which 458 are finally selected as network sentiment neologisms. The emotional color of most of these words is assigned by netizens through usage: for some words it differs greatly from the original meaning, and some words, such as rare characters given new uses, originally had no emotional color at all. Therefore, with reference to the emotion word ontology library construction method described above, the network sentiment neologism ontology library is built by manual annotation. Its entries have the form $(W_i, P_i, I_i)$, consistent with the emotion word ontology library built above.
Similarly, 268 emoticons that netizens use frequently are extracted from the microblog corpus, and the emoticon ontology library is built by manual annotation.
This completes the construction, experimental analysis and supplementation of the emotion word ontology library.
In the present invention, different corpus sources yield different mined emotion words and emotion intensity values. The emotion word ontology library construction algorithm is therefore independent of the application domain. It can effectively extend sentiment dictionaries and refine their use, and it can be applied to mining the value of multi-domain text data in the big data era.
Step 2) Compute the mood bifurcation point position of a network individual
In this step, the Sina Weibo user "Cui Yongyuan" is taken as the network individual under study. The 20 microblog posts this user published from May 2013 to June 2014 are collected as an embodiment, and the step is discussed in detail.
For the collected information set of the network individual, the mood bifurcation point position is determined and the geometric layout of its dynamic-change visualization is built. Let the collected text information set be U; then U can be expressed as:
$$U = \langle S_1, S_2, \ldots, S_T \rangle$$
where $T = 20$ is the length of the time series and $S_t$ is the microblog post published at time $t$.
This step is carried out as follows:
Step 2-1) Determine the emotion category of the emotion words contained in each text message and the emotion intensity value in that category dimension
The ICTCLAS2014 tool performs word segmentation and part-of-speech tagging on the information set U in time order, giving the word set $W_t$ of the post $S_t$ published at time $t$:
$$W_t = \langle W_1, W_2, \ldots, W_J \rangle$$
where $W_j$ is a word in $W_t$ (whether it is an emotion word must still be determined) and $J$ is the number of words.
The word set $W_t$ of post $S_t$ is matched against the emotion word ontology library built in step 1), and all emotion words in $W_t$ and their emotion intensity values are extracted.
Let $Num_t$ be the number of emotion words extracted from $W_t$, and let $dead_t$, $low_t$, $med_t$ and $high_t$ be the numbers of those emotion words whose intensity magnitude falls in $(0, a)$, $[a, b)$, $[b, c)$ and $[c, d]$ respectively. Here $d = \max(|I(w)|)$ is the maximum absolute emotion intensity in the emotion ontology library, and the cut points $a$, $b$, $c$, $d$ are the parameter values of the corresponding mood bifurcation points: $a = 0.25d$, $b = 0.75d$, $c = 0.8925d$. These four intervals are consistent with the emotional state and mood structure bifurcation model (Fig. 1 of the specification) and correspond respectively to the "disappearance state [0, 1]", "balanced state (1, 3]", "approximate equalization state (3, 3.57]" and "disorderly state (3.57, 4)". The following proportions can then be computed:
$$P(dead)_t = \frac{dead_t}{Num_t}; \quad P(low)_t = \frac{low_t}{Num_t}; \quad P(med)_t = \frac{med_t}{Num_t}; \quad P(high)_t = \frac{high_t}{Num_t}$$
where $P(high)_t$, $P(med)_t$, $P(low)_t$ and $P(dead)_t$ are the proportions of emotion words in post $S_t$ whose intensity values correspond to the "disorderly state", "approximate equalization state", "balanced state" and "disappearance state" respectively.
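The binning and the four proportions can be sketched as follows. The sketch assumes that dead, low, med and high are counted in the intervals $(0, a)$, $[a, b)$, $[b, c)$ and $[c, d]$ in that order, which is the reading consistent with the $R_t$ formulas of this step; the sample intensities, $d = 4$ and the function name are hypothetical.

```python
def state_proportions(intensities, d):
    """Bin |intensity| into (0,a), [a,b), [b,c), [c,d] and return each bin's share."""
    a, b, c = 0.25 * d, 0.75 * d, 0.8925 * d
    counts = {"dead": 0, "low": 0, "med": 0, "high": 0}
    for value in intensities:
        x = abs(value)  # the sign only carries polarity
        if x < a:
            counts["dead"] += 1
        elif x < b:
            counts["low"] += 1
        elif x < c:
            counts["med"] += 1
        else:
            counts["high"] += 1
    total = len(intensities)
    return {name: n / total for name, n in counts.items()}


# Hypothetical post with 19 matched emotion words, with d = 4.
intensities = [3.2] * 12 + [-2.0] * 6 + [0.5]
P = state_proportions(intensities, d=4.0)
print(P)
```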
According to the mood structure bifurcation point positions, the network emotion recognition capability index $R_t$ is defined and computed. The higher the emotion recognition capability of a network individual, the more easily its emotional state falls in the "disorderly state"; the lower the capability, the more easily it falls in the "balanced state"; with moderate capability, it tends toward the "approximate equalization state". In this example:
$$\max\{P(high)_t, P(med)_t, P(low)_t, P(dead)_t\} = \max\{0, 12/19, 6/19, 1/19\}$$
identifies the most significant emotional state in post $S_t$. For example, if $P(med)_t$ is the largest, the network individual is in the "approximate equalization state" at time $t$, and the corresponding emotion recognition capability lies in (3, 3.57].
The magnitude of the emotion recognition capability is denoted $R_t$ and called the emotion recognition capability index. Its formula depends on the emotion expression intensity within the corresponding emotional state, as follows.
If $P(high)_t$ is the largest, then $R_t = 3.57 + 0.43 \cdot P(high)_t$;
If $P(med)_t$ is the largest, then $R_t = 3 + 0.57 \cdot P(med)_t$;
If $P(low)_t$ is the largest, then $R_t = 1 + 2 \cdot P(low)_t$;
If $P(dead)_t$ is the largest, then $R_t = P(dead)_t$;
In this example $P(med)_t$ is the largest, so $R_t = 3 + 0.57 \cdot P(med)_t = 3 + 0.57 \cdot \frac{12}{19} = 3.36$;
Thus, over the time series $T = \langle 1, 2, 3, \ldots, t, \ldots, 20 \rangle$, the emotion recognition capability index of the network individual (delimited by the mood bifurcation point values) forms the sequence $R = \langle R_1, R_2, \ldots, R_t, \ldots, R_{20} \rangle$. Each post $S_t$ yields its corresponding element $R_t$, and the sequence expresses the emotion cognition capability of the network individual.
For the network individual "Cui Yongyuan", this sequence $R = \langle R_1, R_2, \ldots, R_t, \ldots, R_{20} \rangle$ is the mood bifurcation value change sequence over the same time series.
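The piecewise rule for $R_t$ can be written as a short Python function. This is a sketch under stated assumptions: the proportions are passed as a dictionary with the hypothetical keys `high`, `med`, `low` and `dead`, and ties between equal proportions are broken in that order, which the text does not specify.

```python
def recognition_index(P):
    """Apply the piecewise rule for R_t to the dominant proportion in P."""
    rules = {
        "high": lambda p: 3.57 + 0.43 * p,  # disorderly state, (3.57, 4)
        "med": lambda p: 3.0 + 0.57 * p,    # approximate equalization state, (3, 3.57]
        "low": lambda p: 1.0 + 2.0 * p,     # balanced state, (1, 3]
        "dead": lambda p: p,                # disappearance state, (0, 1]
    }
    # max() returns the first maximal key, so ties resolve in the order
    # high, med, low, dead (an assumption, not stated in the source).
    dominant = max(rules, key=lambda name: P[name])
    return rules[dominant](P[dominant])


# Proportions of the worked example in this step.
P_t = {"high": 0.0, "med": 12 / 19, "low": 6 / 19, "dead": 1 / 19}
R_t = recognition_index(P_t)
print(round(R_t, 4))
```

Applying the function to the proportions of every post $S_t$ in time order gives the sequence $R$.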
Step 3) Build the geometric layout, color it and add labels
An individual's emotion recognition capability, that is, the position of its mood bifurcation point, is relatively stable over a period of time unless the individual experiences a major emotional event. In other words, the emotional tendency shown in the network text an individual publishes remains relatively constant over a long period. The invention therefore builds a geometric layout for the time-varying emotion recognition capability sequence $\langle T, R \rangle$ of the network individual "Cui Yongyuan" obtained in step 2), so that the individual's mood bifurcation point can be visualized more intuitively. The concrete steps may comprise:
Step 3-1) Build the visual layout of the network individual's emotion recognition capability index position, with time on the horizontal axis and the mood bifurcation point value on the vertical axis. Using the sequence $R = \langle R_1, R_2, \ldots, R_t, \ldots, R_T \rangle$ of emotion recognition capability index values over the time series $T = \langle 1, 2, 3, \ldots, t, \ldots, T \rangle$ obtained in step 2-4), a geometric figure is drawn in a two-dimensional rectangular coordinate system with the time series T on the horizontal axis and the sequence R on the vertical axis. In the mood structure bifurcation dynamic model, the range of the emotion recognition capability is divided into (0, 1], (1, 3], (3, 3.57] and (3.57, 4), which correspond respectively to the "disappearance state", "balanced state", "approximate equalization state" and "disorderly state" of individual mood.
Accordingly, the first quadrant of the coordinate system is divided into four regions by $R \in \{(0, 1], (1, 3], (3, 3.57], (3.57, 4)\}$.
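Assigning a value of R to one of the four regions amounts to a lookup against the three inner boundaries. The sketch below uses Python's standard `bisect` module; the names `BOUNDS`, `STATES` and `region` are hypothetical.

```python
from bisect import bisect_left

BOUNDS = [1.0, 3.0, 3.57]  # upper ends of (0,1], (1,3], (3,3.57]
STATES = [
    "disappearance state",
    "balanced state",
    "approximate equalization state",
    "disorderly state",
]


def region(r):
    """Return the state whose interval (0,1], (1,3], (3,3.57], (3.57,4) contains r."""
    # bisect_left keeps a boundary value in the lower interval, matching
    # the right-closed intervals of the model.
    return STATES[bisect_left(BOUNDS, r)]


print(region(3.88))
print(region(2.55))
```

The same lookup can select the color or label of each plotted point when the layout is drawn.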
Step 3-2) Visualize the network individual's emotion recognition capability index position. To give the change of the individual's mood bifurcation point position more contrast in the geometric layout, each point in the coordinate system is labeled when the layout is drawn. The attributes of a point can thus be expressed as $\langle t, R_t, F \rangle$, where $t$ is the time index, $R_t$ is the emotion recognition capability at that time and $F$ is the label symbol chosen when the point is drawn.
The resulting geometric layout of the mood bifurcation point position changes of the network individual "Cui Yongyuan" is shown in Fig. 5:
As noted above, a network individual expresses the result of its cognition of emotional events in the form of network text.
Fig. 5 summarizes how the mood bifurcation point position of the network individual "Cui Yongyuan" changes over 19 instances of cognition of emotional events. The corresponding emotion recognition capability falls in (0, 1), (1, 3], (3, 3.57] and (3.57, 4) a total of 0, 12, 6 and 1 times respectively. The figure shows directly that the emotion recognition capability R is close to 3, at the boundary between the "balanced state" and the "approximate equalization state". This indicates that this network individual's cognitive capability for emotional events is relatively strong and that it readily produces intense, rich and complex emotional states. Fig. 5 also shows that the highest emotion recognition capability is R = 3.88. Inspection of the collected text shows that the corresponding post $S_{15}$ was published when this user's dispute with Fang Zhouzi on Weibo was at its fiercest; it contains words of high emotion intensity such as "notorious", "deception", "fraud" and "rogue".
When a network individual with a high emotion recognition capability cognizes an emotional event after it occurs, the result is an emotional state in the "approximate equalization state" or even the "disorderly state". However, most emotion words in the text published by most network individuals are not of very high intensity, so their normal mood is the "balanced state", that is, their emotion recognition capability lies in (1, 3]. On the other hand, the emotion recognition capability differs between network individuals. To better reflect this difference, the microblog set of the network individual "Liu Yifei" is taken as another embodiment, and its mood bifurcation point variation diagram is drawn according to the steps above, as shown in Fig. 6. The emotion recognition capability of this individual falls in (0, 1), (1, 3], (3, 3.57] and (3.57, 4) a total of 0, 13, 3 and 0 times respectively, and the average emotion recognition capability is close to 2.5, in the "balanced state" interval. The situation in Fig. 6 reflects that the published text often uses emotion words of moderate intensity and seldom uses emotion words of high intensity. Inspection of the collected microblog text confirms that it is consistent with Fig. 6. It should be noted that understanding the change in emotion recognition capability shown in Fig. 6, that is, the change of the mood bifurcation point position, requires considering emotional event theory together with the mood structure bifurcation model. The mood bifurcation point visualization according to the embodiment of the invention thus helps users better understand why a network individual shows particular emotional states when cognizing different emotional events at different times.
Step 4) Analyze multiple network individuals: compare the changes in their emotion recognition capability index positions and the differences between their indices, obtain their emotion cognition intensity index and present it visually. This further comprises:
Step 4-1) Determine the position of each network individual's emotion recognition capability index relative to the mood bifurcation points, then determine the emotion recognition capability index sequence of the network group.
Individuals differ in their cognitive capability for emotions. To compare the emotion recognition capabilities of different network individuals and show the differences in their mood bifurcation point positions, the text sets of four network individuals, "Cui Yongyuan", "Fang Zhouzi", "Sima Nan" and "Liu Yifei", are chosen as the analysis corpus. The calculation steps above give the time-varying emotion recognition capability sequence $\langle T_\lambda, R_\lambda \rangle$ of each, where $\lambda$ is the number of the network individual. Step 3) can produce the mood bifurcation point layout of each individual, but the layouts alone do not allow the individuals' emotion recognition capabilities to be compared with one another.
We therefore introduce the concept of emotion cognition intensity, by which the emotion recognition capabilities of different network individuals are compared. For network individual $\lambda$ over time period $T_\lambda$, the numbers $C_1$, $C_2$, $C_3$, $C_4$ of emotion recognition capability values $R_\lambda$ falling in (0, 1), (1, 3], (3, 3.57] and (3.57, 4) are counted. Mean clustering ($k = 1, 2, 3, 4$) is applied to all R values in the interval with the largest count, and the resulting center value $R_{center}^{\lambda}$ is taken as the emotion recognition capability index level of network individual $\lambda$, representing its salient cognitive capability for emotional events. By computing this level for each network individual and numbering the individuals, the emotion recognition capability index level sequence of the multiple network individuals is obtained, where $\lambda = \langle 1, 2, \ldots, H \rangle$ is the numbering sequence of the network individuals and
$$R_{center}^{\lambda} = \langle R_{center}^{1}, R_{center}^{2}, \ldots, R_{center}^{H} \rangle$$
gives the emotion recognition capability index level of each network individual $\lambda$. The emotion recognition capability index level has the same meaning as the average emotion recognition capability described above and describes the long-term salient emotion recognition capability of a network individual.
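One simple reading of this computation is sketched below. It assumes that the "mean clustering" center of the most populated interval reduces to the arithmetic mean of the R values in that interval; this is a simplification, and the sample sequence and the name `capability_level` are hypothetical.

```python
from bisect import bisect_left
from statistics import mean

BOUNDS = [1.0, 3.0, 3.57]  # inner boundaries of the four intervals


def capability_level(r_values):
    """Mean of the R values that fall in the most populated of the four intervals."""
    groups = [[], [], [], []]
    for r in r_values:
        groups[bisect_left(BOUNDS, r)].append(r)
    largest = max(groups, key=len)  # interval with the largest count C_k
    return mean(largest)


# Hypothetical R sequence of one network individual.
r_sequence = [2.4, 2.6, 2.8, 3.2, 3.4, 2.2]
level = capability_level(r_sequence)
print(round(level, 2))
```

Computing this level for every numbered individual yields the values plotted against $\lambda$ in the comparison layout.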
Step 4-2) Build the comparison visual layout of the emotion recognition capability levels of multiple network individuals;
In this step, the horizontal axis represents the number $\lambda$ of the network individual and the vertical axis represents the emotion recognition capability index level $R_{center}^{\lambda}$ of network individual $\lambda$; the rest of the geometric panel is constructed as in step 3-1) above. When the layout is drawn, the point for each network individual $\lambda$ uses the avatar icon of that individual as its label symbol, so that the differences in emotion recognition capability are shown more intuitively. In addition, as in step 3-2), the vertical axis is divided into four regions by the mood bifurcation points. The resulting comparison visualization of the emotion recognition capability index levels is shown in Fig. 7.
Fig. 7 compares the emotion recognition capability index levels of the network individuals "Cui Yongyuan", "Liu Yifei", "Sima Nan" and "Fang Zhouzi", which are 3.12, 2.55, 2.89 and 3.59 respectively. "Fang Zhouzi" has the highest level, which indicates that words of high emotion intensity occur more often in the text this individual publishes and that the individual is more sensitive to emotional events; this is consistent with the mood structure bifurcation model. The comparison visualization according to the embodiment of the invention thus helps users understand more intuitively how the emotion recognition capabilities of different network individuals differ. It should also be understood that step 4-2) is optional: labeling the emotion visualization serves to show more comparative information and to help users analyze the differences between network individuals more vividly.
Step 5) Compare the mood bifurcation point position change processes and emotion recognition capability index levels of multiple network individuals, and make reasonable predictions of their cognitive process and result for future emotional events.
The literature shows that an individual's level of emotion recognition capability for emotional events follows certain regularities. Therefore, based on a sufficiently large number of mood bifurcation values of a network individual and the emotion recognition capability index level determined from them, the invention can predict the emotional state the individual is likely to show when cognizing emotional events it encounters in the future. The higher the emotion recognition capability level of a network individual, the more sensitive it is to emotional events and the more easily its emotional state falls in the "disorderly state". Similarly, the lower the level, the more easily the emotional state falls in the "balanced state"; with a moderate level, the emotional state falls in the "approximate equalization state". Reasonable predictions can thus be made according to the invention.
In summary, the invention lets a user determine the mood bifurcation point position of a particular network individual, and further classify the individuals of a network group or organization and analyze their differences.
To illustrate the content and implementation of the invention, this specification gives two specific embodiments. The details in the embodiments are introduced not to limit the scope of the claims but to aid understanding of the method of the invention. Those skilled in the art should appreciate that various modifications, changes or substitutions to the steps of the preferred embodiment are possible without departing from the spirit and scope of the invention and its claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawings.

Claims (10)

1. A method for predicting and visualizing the emotion recognition capability of a network individual or group, characterized in that it comprises the following steps:
Step 1) build an ontology library that integrates emotion words from multiple sources;
Step 2) determine the mood bifurcation point position of a network individual, and calculate its emotion recognition capability index sequence from the set of text messages published by the network individual and collected in time order;
Step 3) visualize the emotion recognition capability index sequence obtained in step 2);
Step 4) analyze the emotion recognition capability index levels of multiple network individuals.
2. The method for predicting and visualizing the emotion recognition capability of a network individual or group according to claim 1, characterized in that step 1), building the emotion word ontology library, further comprises the following steps:
Step 1-1) merge the conventional emotion words in existing Chinese sentiment dictionaries with the network sentiment neologisms and emoticons filtered from a corpus to obtain an emotion element set;
Step 1-2) for each word $W_i$ in the emotion element set, determine the emotion intensity $I_i$ and annotate the emotion polarity $P_i$;
Step 1-3) select the words $W_i$ whose emotion intensity $I_i$ exceeds a threshold, and add each word $W_i$ with its emotion polarity $P_i$ and emotion intensity $I_i$ to the emotion word ontology library E as a triple, obtaining:
$$E = \langle (W_1, P_1, I_1), (W_2, P_2, I_2), \ldots, (W_i, P_i, I_i), \ldots, (W_n, P_n, I_n) \rangle.$$
3. The method for predicting and visualizing the emotion recognition capability of a network individual or group according to claim 2, characterized in that step 1-2) further comprises the following steps:
Step 1-2-1) emotion polarity annotation: the polarity $P_i$ of a conventional emotion word is taken from the sentiment dictionary; where the same emotion word is annotated inconsistently in different sentiment dictionaries, the polarity is corrected by a vote among several annotators; because network sentiment neologisms and emoticons are limited in number, their polarity is always determined by a vote among several annotators;
Step 1-2-2) emotion intensity determination:
(1) determining the emotion intensity of a conventional emotion word: first obtain a large-scale social network text set U, then compute the emotion intensity of the conventional emotion word $w^*$ by the following formula:
$$I(w^*) = r(w^* \mid S_{neg}) - r(w^* \mid S_{pos})$$
where $S_{pos}$ and $S_{neg}$ are respectively the sets of positive and negative emotion words in the social network text set U, $r(w^* \mid S_{pos})$ is the positive emotion weight of $w^*$ and $r(w^* \mid S_{neg})$ is the negative emotion weight of $w^*$; the emotion weight is computed by the following formula:
r ( w * | S * ) = &alpha; &Sigma; i = 1 k log P ( C i | S * ) + &beta; log P ( w * )
Wherein S *represent S justor S negative, α, β ∈ [0,1] is combination adjustment parameter, C iw *i-th word, w *in total k word, P (C i| S *) and P (w *) then calculate by following formula:
P ( C i | S * ) = P ( S * , C i ) P ( S * ) = Freq ( S * , C i ) + &delta; Freq ( S * ) + 1
Wherein Freq (S *, C i) represent belong to S *the composition word C of word ithe frequency occurred in U, Freq (S *) represent belong to S *the frequency sum that occurs in U of all composition words; δ is a less numerical value;
P ( w * ) = Freq ( w * ) + &delta; &Sigma; i = 1 | U | Freq ( w i ) + 1
Wherein Freq (w *) represent w *the frequency occurred in U, | U| represents the number of word in U, represent word w all in U ithe frequency sum occurred in U;
(2) the feeling polarities correction of conventional emotion word:
When emotion intensity I is greater than 0, represent positive emotion, feeling polarities P=+1;
When emotion intensity I is less than 0, represent negative affect, feeling polarities P=-1;
(3) network sentiment neologisms and emoticon are due to limited amount, and its emotion intensity is all adopting many people's modes of voting to determine with reference on conventional emotion word intensity basis.
4. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 3, characterized in that $\delta$ is the reciprocal of the total number of words in $S^*$.
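The intensity calculation of claims 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the inventors' code: the function names are hypothetical, `word_freq` stands in for word frequencies counted over the text set U, the character statistic Freq(S*, C_i) is interpreted as the corpus frequency of every word in S* that contains the character, and the default values of alpha and beta are arbitrary. The sign convention of I(w*) follows the formula exactly as the claim states it (negative-set weight minus positive-set weight).

```python
import math
from collections import Counter


def char_freq(word_freq, seed_words):
    """Freq(S*, C): frequency in U of each character C inside words of S*."""
    counts = Counter()
    for word in seed_words:
        for ch in word:
            counts[ch] += word_freq.get(word, 0)
    return counts


def weight(word, seed_words, word_freq, alpha=0.5, beta=0.5):
    """r(w*|S*) = alpha * sum_i log P(C_i|S*) + beta * log P(w*)."""
    delta = 1.0 / len(seed_words)          # claim 4: reciprocal of |S*|
    cf = char_freq(word_freq, seed_words)
    total_chars = sum(cf.values())         # Freq(S*)
    total_words = sum(word_freq.values())  # sum of Freq(w_i) over U
    char_term = sum(
        math.log((cf[ch] + delta) / (total_chars + 1)) for ch in word
    )
    word_term = math.log((word_freq.get(word, 0) + delta) / (total_words + 1))
    return alpha * char_term + beta * word_term


def intensity(word, s_pos, s_neg, word_freq, alpha=0.5, beta=0.5):
    """I(w*) = r(w*|S_neg) - r(w*|S_pos), as written in the claim."""
    return (weight(word, s_neg, word_freq, alpha, beta)
            - weight(word, s_pos, word_freq, alpha, beta))


def corrected_polarity(value, dictionary_polarity):
    """Polarity correction: +1 if I > 0, -1 if I < 0, else keep the dictionary value."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return dictionary_polarity


# Toy data: Latin letters stand in for the constituent characters of Chinese words.
word_freq = {"good": 10, "glad": 5, "bad": 8, "sad": 4}
s_pos, s_neg = {"good", "glad"}, {"bad", "sad"}
i_good = intensity("good", s_pos, s_neg, word_freq)
```

The additive delta keeps every logarithm finite even for characters or words that never occur in the seed set, which is the smoothing role the claim assigns to it.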
5. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that step 2) further comprises the following steps:
Step 2-1) obtaining, in chronological order, the set U of text messages published by the network individual,
where T is the time series, S is the set of text message vectors corresponding to T, and the microblog message published at time t is $S_t$;
Step 2-2) performing word segmentation and part-of-speech tagging on the text message set U, to obtain the vocabularies $W_1$ to $W_T$ of all microblog messages $S_1$ to $S_T$ published at times 1 to T, where the vocabulary of the microblog message $S_t$ published at time t is denoted $W_t$;
Step 2-3) matching the words in the vocabulary $W_t$ ($1 \le t \le T$) of each microblog message one by one against the emotion word ontology library E, and extracting the emotion words therein together with their emotional polarities and emotion intensity values, where $w_{ti}$ denotes the i-th emotion word contained in $S_t$ and $Num_t$ denotes the number of emotion words contained in $W_t$;
Step 2-4) building a position calculation model of the network individual's emotion bifurcation points, and calculating the network individual's emotion recognition capability index values as they change over time, specifically as follows:
Step 2-4-1) calculating by the following formulas the proportions $P(high)_t$, $P(med)_t$, $P(low)_t$ and $P(dead)_t$ of the emotion words contained in $S_t$ that correspond to the four emotional states "balanced state", "approximately balanced state", "disordered state" and "disappearance state", respectively:
$$P(high)_t = \frac{high_t}{Num_t};\quad P(med)_t = \frac{med_t}{Num_t};\quad P(low)_t = \frac{low_t}{Num_t};\quad P(dead)_t = \frac{dead_t}{Num_t}$$
where $dead_t$, $low_t$, $med_t$ and $high_t$ denote the numbers of emotion words, among the $Num_t$ emotion words, whose emotion intensity magnitudes lie in $(0, a)$, $[a, b)$, $[b, c)$ and $[c, d)$, respectively; $d = \max(|I(w)|)$ is the maximum of the absolute values of all emotion intensities in the emotion word ontology library E; and the boundary points a, b, c, d are the parameter values of the corresponding emotion bifurcation points, with $a = 0.25d$, $b = 0.75d$, $c = 0.8925d$;
Step 2-4-2) defining and calculating the network individual's emotion recognition capability index $R_t$ according to the positions of the emotion-structure bifurcation points, the calculation proceeding as follows:
finding the maximum:
$$\max\{P(high)_t,\ P(med)_t,\ P(low)_t,\ P(dead)_t\}$$
if $P(high)_t$ is the maximum, then $R_t = 3.57 + 0.43 \cdot P(high)_t$;
if $P(med)_t$ is the maximum, then $R_t = 3 + 0.57 \cdot P(med)_t$;
if $P(low)_t$ is the maximum, then $R_t = 1 + 2 \cdot P(low)_t$;
if $P(dead)_t$ is the maximum, then $R_t = P(dead)_t$;
Step 2-4-3) for all microblog messages $S_1$ to $S_T$ published over the time series $T = \langle 1, 2, 3, \ldots, t, \ldots, T \rangle$, calculating the emotion recognition capability index sequence by step 2-4-1) and step 2-4-2):
$$R = \langle R_1, R_2, \ldots, R_t, \ldots, R_T \rangle.$$
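Steps 2-3) to 2-4-3) can be sketched as follows, assuming the messages have already been segmented into word lists. The function names and the dictionary layout of the ontology are hypothetical. Three choices are assumptions not stated in the claim: a magnitude equal to d is counted in the top band, ties between proportions are resolved in the order high, med, low, dead, and a message containing no emotion word yields `None`.

```python
def extract_intensities(words, ontology):
    """Step 2-3: intensities of the words of one message that occur in E.

    `ontology` maps word -> (polarity, intensity); this layout is an assumption.
    """
    return [ontology[w][1] for w in words if w in ontology]


def state_ratios(intensities, d):
    """Step 2-4-1: share of emotion words in each of the four intensity bands."""
    a, b, c = 0.25 * d, 0.75 * d, 0.8925 * d
    counts = {"dead": 0, "low": 0, "med": 0, "high": 0}
    for value in intensities:
        m = abs(value)
        if m < a:
            counts["dead"] += 1
        elif m < b:
            counts["low"] += 1
        elif m < c:
            counts["med"] += 1
        else:  # assumption: |I| == d is counted as "high"
            counts["high"] += 1
    num = len(intensities)  # Num_t
    return {state: n / num for state, n in counts.items()}


def recognition_index(ratios):
    """Step 2-4-2: map the dominant proportion to R_t."""
    top = max(ratios.values())
    if ratios["high"] == top:
        return 3.57 + 0.43 * ratios["high"]
    if ratios["med"] == top:
        return 3 + 0.57 * ratios["med"]
    if ratios["low"] == top:
        return 1 + 2 * ratios["low"]
    return ratios["dead"]


def index_sequence(posts, ontology):
    """Step 2-4-3: R = <R_1, ..., R_T> for time-ordered, segmented messages."""
    d = max(abs(intensity) for _, intensity in ontology.values())
    sequence = []
    for words in posts:
        found = extract_intensities(words, ontology)
        sequence.append(recognition_index(state_ratios(found, d)) if found else None)
    return sequence


# Hypothetical ontology and three segmented messages.
ontology = {"joy": (1, 1.0), "great": (1, 0.9), "ok": (1, 0.8),
            "fine": (1, 0.5), "meh": (-1, -0.1)}
posts = [["joy", "great", "fine", "meh", "the"],
         ["fine", "fine", "fine", "ok"],
         ["the"]]
R = index_sequence(posts, ontology)
```

In the first message half of the emotion words fall in the top band, so the index is 3.57 + 0.43 * 0.5; in the second, three of four words fall in the low band, so the index is 1 + 2 * 0.75.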
6. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that the visualization of step 3) is completed by the following steps:
Step 3-1) building a figure in a two-dimensional rectangular coordinate system with the time series T as the horizontal axis and R as the vertical axis, and dividing the coordinate system into four regions according to $R \in \{(0,1],\ (1,3],\ (3,3.57],\ (3.57,4]\}$;
Step 3-2) marking the points in the coordinate system according to the network individual's emotion recognition capability index sequence, each point consisting of three attributes $\langle t, R_t, F \rangle$, where t denotes time t of the time series T, $R_t$ denotes the emotion recognition capability index at time t, and F denotes the marker symbol selected when plotting the point.
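The data preparation behind steps 3-1) and 3-2) can be sketched as follows. The names `region_of` and `build_points` are hypothetical, and the sketch stops short of drawing: it only assigns each index value to one of the four horizontal regions and assembles the point triples, which any plotting library could then render.

```python
# Upper bounds of the four regions (0,1], (1,3], (3,3.57], (3.57,4].
REGION_UPPER_BOUNDS = (1.0, 3.0, 3.57, 4.0)


def region_of(r):
    """Return the 0-based region of the coordinate system that contains R."""
    if not 0 < r <= 4:
        raise ValueError("R must lie in (0, 4]")
    for idx, upper in enumerate(REGION_UPPER_BOUNDS):
        if r <= upper:
            return idx


def build_points(index_sequence, marker="o"):
    """Step 3-2: one triple <t, R_t, F> per time step, with t starting at 1.

    `marker` plays the role of F; the default "o" is an arbitrary choice.
    """
    return [(t, r, marker) for t, r in enumerate(index_sequence, start=1)]


points = build_points([2.5, 3.785], marker="o")
regions = [region_of(r) for _, r, _ in points]
```

Because each branch of the index formula maps into exactly one of these intervals, the region of a point also identifies which of the four proportions was dominant at that time.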
7. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that the analysis of the emotion recognition capability index levels of multiple network individuals in step 4) is completed by the following steps:
Step 4-1) calculating the emotion recognition capability index level of each network individual from that individual's emotion recognition capability index sequence, and thereby determining the emotion recognition capability index level sequence of the network group:
$$R_{center} = \langle R_{center}^{1},\ R_{center}^{2},\ \ldots,\ R_{center}^{N} \rangle$$
where N denotes the number of network individuals in the network group and $R_{center}^{\lambda}$ denotes the emotion recognition capability index level of the λ-th network individual;
Step 4-2) building a comparative visualization layout of the emotion recognition capability index levels of multiple network individuals, with the horizontal axis representing the number λ of the network individual and the vertical axis representing that individual's emotion recognition capability index level $R_{center}^{\lambda}$.
8. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that $R_{center}^{\lambda}$ is obtained by the following process: counting the numbers $C_1, C_2, C_3, C_4$ of values of the emotion recognition capability index $R_{\lambda}$ of network individual λ over the time period $T_{\lambda}$ that fall in the intervals (0,1), (1,3], (3,3.57] and (3.57,4), respectively, and taking the mean or the median of all R values in the interval whose count $C_k$ (k = 1, 2, 3, 4) is the largest.
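The index level of one individual, and the level sequence of a group, can be sketched as follows. The function names are hypothetical. Two choices are assumptions: a value of exactly 1 or 4 is assigned to the nearest interval (the claim writes those endpoints as open), and when two intervals have the same count the lower interval wins.

```python
from statistics import mean, median


def band(r):
    """0-based index of the interval containing R (endpoints 1 and 4 included)."""
    if r <= 1:
        return 0
    if r <= 3:
        return 1
    if r <= 3.57:
        return 2
    return 3


def index_level(r_values, use_median=False):
    """Mean or median of the R values in the most frequently visited interval."""
    if not r_values:
        raise ValueError("index sequence is empty")
    bands = [[], [], [], []]
    for r in r_values:
        bands[band(r)].append(r)
    modal = max(bands, key=len)  # ties: the lower interval is kept
    return median(modal) if use_median else mean(modal)


def group_levels(sequences, use_median=False):
    """Level sequence of a group: one index level per individual."""
    return [index_level(seq, use_median) for seq in sequences]


# Hypothetical index sequences of two individuals.
levels = group_levels([[2.0, 2.5, 3.8, 0.5, 3.0], [3.8, 3.9, 1.5]])
```

For the first individual three of the five values lie in (1,3], so the level is their mean; for the second, two of three values lie in the top interval.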
9. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that the visualization layout is completed by the following steps:
Step 3-1) building a figure in a two-dimensional rectangular coordinate system with the network individual λ as the horizontal axis and $R_{center}$ as the vertical axis, and dividing the coordinate system into four regions according to $R_{center} \in \{(0,1],\ (1,3],\ (3,3.57],\ (3.57,4]\}$;
Step 3-2) marking the points in the coordinate system according to the network individuals' emotion recognition capability index level sequence, each point consisting of three attributes, where F denotes the marker symbol selected when plotting the point.
10. The method for predicting and visualizing the emotion recognition ability of a network individual or group according to claim 1, characterized in that F is the avatar icon of the network individual, or a schematic cartoon icon or symbol.
CN201410795679.8A 2014-12-18 2014-12-18 A kind of network individual or colony's Emotion recognition ability prediction and method for visualizing Expired - Fee Related CN104636425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410795679.8A CN104636425B (en) 2014-12-18 2014-12-18 A kind of network individual or colony's Emotion recognition ability prediction and method for visualizing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410795679.8A CN104636425B (en) 2014-12-18 2014-12-18 A kind of network individual or colony's Emotion recognition ability prediction and method for visualizing

Publications (2)

Publication Number Publication Date
CN104636425A true CN104636425A (en) 2015-05-20
CN104636425B CN104636425B (en) 2018-02-13

Family

ID=53215171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410795679.8A Expired - Fee Related CN104636425B (en) 2014-12-18 2014-12-18 A kind of network individual or colony's Emotion recognition ability prediction and method for visualizing

Country Status (1)

Country Link
CN (1) CN104636425B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559176A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Microblog emotional evolution analysis method and system
CN103324662A (en) * 2013-04-18 2013-09-25 中国科学院计算技术研究所 Visual method and equipment for dynamic view evolution of social media event
CN104216873A (en) * 2014-08-27 2014-12-17 华中师范大学 Method for analyzing network left word emotion fluctuation characteristics of emotional handicap sufferer

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022725B (en) * 2015-07-10 2018-04-20 河海大学 A kind of text emotion trend analysis method applied to finance Web fields
CN105022725A (en) * 2015-07-10 2015-11-04 河海大学 Text emotional tendency analysis method applied to field of financial Web
CN105138570A (en) * 2015-07-26 2015-12-09 吉林大学 Calculation method of crime degree of speech data
CN105138570B (en) * 2015-07-26 2019-02-05 吉林大学 The doubtful crime degree calculation method of network speech data
CN105159879A (en) * 2015-08-26 2015-12-16 北京理工大学 Automatic determination method for network individual or group values
CN105843792A (en) * 2015-10-26 2016-08-10 北京宏博知微科技有限公司 Comprehensive emotion measuring method of internet events
CN105843792B (en) * 2015-10-26 2018-12-21 北京宏博知微科技有限公司 A kind of synthesis emotion measure of network event
CN105740565A (en) * 2016-02-16 2016-07-06 合肥学院 Natural language processing based automobile styling derivation method
CN105786991A (en) * 2016-02-18 2016-07-20 中国科学院自动化研究所 Chinese emotion new word recognition method and system in combination with user emotion expression ways
CN105786991B (en) * 2016-02-18 2019-03-15 中国科学院自动化研究所 In conjunction with the Chinese emotion new word identification method and system of user feeling expression way
CN110222026B (en) * 2016-05-24 2021-03-02 甘肃百合物联科技信息有限公司 Method for constructing memory network and predicting emotion by using memory network
CN110222026A (en) * 2016-05-24 2019-09-10 甘肃百合物联科技信息有限公司 A kind of constructive memory network and the method for being used for prediction mood
CN106095777A (en) * 2016-05-26 2016-11-09 优品财富管理有限公司 The many empty sentiment indicator methods of prediction securities markets based on big data
CN106202047A (en) * 2016-07-15 2016-12-07 国家计算机网络与信息安全管理中心 A kind of character personality depicting method based on microblogging text
CN106775665A (en) * 2016-11-29 2017-05-31 竹间智能科技(上海)有限公司 The acquisition methods and device of the emotional state change information based on sentiment indicator
CN109218512A (en) * 2017-07-06 2019-01-15 新华网股份有限公司 Mobile terminal user emotion detection method, system and mobile terminal
CN107862087A (en) * 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning
CN107862087B (en) * 2017-12-01 2022-02-18 深圳爱数云科技有限公司 Emotion analysis method and device based on big data and deep learning and storage medium
CN108400810A (en) * 2018-01-31 2018-08-14 中国人民解放军陆军工程大学 Telecommunication satellite frequency resource visual management method based on temporal frequency
CN108388608B (en) * 2018-02-06 2020-08-04 金蝶软件(中国)有限公司 Emotion feedback method and device based on text perception, computer equipment and storage medium
CN108388608A (en) * 2018-02-06 2018-08-10 金蝶软件(中国)有限公司 Emotion feedback method, device, computer equipment and storage medium based on text perception
CN108549630A (en) * 2018-03-29 2018-09-18 西安影视数据评估中心有限公司 A kind of recognition methods of video display drama story overturning point
CN108549630B (en) * 2018-03-29 2021-07-30 西安影视数据评估中心有限公司 Method for identifying turning points of film and television script stories
CN109857852A (en) * 2019-01-24 2019-06-07 安徽商贸职业技术学院 A kind of the screening judgment method and system of electric business online comment training set feature
CN110083726A (en) * 2019-03-11 2019-08-02 北京比速信息科技有限公司 A kind of destination image cognitive method based on UGC image data
CN110083726B (en) * 2019-03-11 2021-10-22 北京比速信息科技有限公司 Destination image perception method based on UGC picture data
CN112232197A (en) * 2020-10-15 2021-01-15 武汉微派网络科技有限公司 Juvenile identification method, device and equipment based on user behavior characteristics

Also Published As

Publication number Publication date
CN104636425B (en) 2018-02-13

Similar Documents

Publication Publication Date Title
CN104636425A (en) Method for predicting and visualizing emotion cognitive ability of network individual or group
Chen et al. What about mood swings: Identifying depression on twitter with temporal measures of emotions
CN106598944B (en) A kind of civil aviaton&#39;s security public sentiment sentiment analysis method
CN108038725A (en) A kind of electric business Customer Satisfaction for Product analysis method based on machine learning
CN106503049A (en) A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM
CN112199608B (en) Social media rumor detection method based on network information propagation graph modeling
CN109829166B (en) People and host customer opinion mining method based on character-level convolutional neural network
CN105512687A (en) Emotion classification model training and textual emotion polarity analysis method and system
CN103559262A (en) Community-based author and academic paper recommending system and recommending method
CN105138577B (en) Big data based event evolution analysis method
CN110941953B (en) Automatic identification method and system for network false comments considering interpretability
CN102880631A (en) Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method
Bosco et al. Detecting happiness in Italian tweets: Towards an evaluation dataset for sentiment analysis in Felicitta
Sadr et al. Unified topic-based semantic models: A study in computing the semantic relatedness of geographic terms
CN104965930A (en) Big data based emergency evolution analysis method
You et al. Exploring public sentiments for livable places based on a crowd-calibrated sentiment analysis mechanism
Wegrzyn-Wolska et al. Tweets mining for French presidential election
Mello et al. Towards automatic content analysis of rhetorical structure in brazilian college entrance essays
Ge et al. A Novel Chinese Domain Ontology Construction Method for Petroleum Exploration Information.
Midhunchakkaravarthy et al. A novel approach for feature fatigue analysis using HMM stemming and adaptive invasive weed optimisation with hybrid firework optimisation method
CN110110013A (en) A kind of entity competitive relation data digging method based on time-space attribute
Kumari et al. OSEMN approach for real time data analysis
Vora et al. Investigating people’s sentiment from twitter data for smart cities: A survey
Azman et al. Towards an enhanced aspect-based contradiction detection approach for online review content
Setyawan et al. Sentiment Analysis of Public Responses on Indonesia Government Using Naïve Bayes and Support Vector Machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180213

Termination date: 20181218

CF01 Termination of patent right due to non-payment of annual fee