CN102411611A - Instant interactive text oriented event identifying and tracking method - Google Patents

Instant interactive text oriented event identifying and tracking method

Info

Publication number
CN102411611A
Authority
CN
China
Prior art keywords
wheel
stamp
words
speech
words wheel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110312540XA
Other languages
Chinese (zh)
Other versions
CN102411611B (en)
Inventor
田锋
郑庆华
张惠三
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN 201110312540 priority Critical patent/CN102411611B/en
Publication of CN102411611A publication Critical patent/CN102411611A/en
Application granted granted Critical
Publication of CN102411611B publication Critical patent/CN102411611B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an event identification and tracking method for instant interactive text, which comprises two stages. I. In the turn-level topic-category classification stage, the content of each turn is represented by an adaptive language-feature aggregation vector, and the topic category of the turn is classified with a supervised hierarchical probabilistic latent semantic analysis (PLSA) model obtained by training. II. In the turn-level event identification and tracking stage, the start, continuation and end of an event are judged from the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level. In particular: (1) the invention proposes adaptively adjusting the turn time-closeness threshold Th according to the fluctuation of the time-series data around the current turn, after which the adaptive language-feature aggregation is computed; (2) the supervised hierarchical PLSA model is updated periodically during operation. The proposed method is an online identification and tracking algorithm.

Description

An event identification and tracking method for instant interactive text
Technical field
The present invention relates to information retrieval, extraction and management and to natural language processing, and in particular to an event identification and tracking method for online instant interactive text.
Background technology
With the increasingly widespread use of Internet technology, network applications based on interactive text keep developing and have become one of the main means by which people obtain and publish information; typical interactive-text applications include Internet chatrooms and microblogs. These texts contain a wealth of information resources. How to identify, organize and exploit events by topic category in these interactive-text applications has become a pressing task, for example automatically recognizing the emotion-change events of online learners so as to regulate their learning efficiency, or identifying socially sensitive incidents, new events, and the like. A novelty search by the applicant retrieved no patent directly related to the present invention, but did find several similar articles:
1) Message-text clustering research based on frequent patterns. Hu Jixiang, Graduate University of the Chinese Academy of Sciences (Institute of Computing Technology).
2) CDTF_IDF: a weight computation method for chat vocabulary. Gao Peng, Cao Xianbin, Computer Simulation, 2007.12.
The author of article 1) found that frequent patterns (called key frequent patterns) contain additional semantic information, such as word order and adjacent context, that is key to feature extraction from interactive text, and proposed an unsupervised, frequent-pattern-based feature-selection algorithm applied to text classification and clustering.
Article 2) is mainly aimed at content supervision of chatrooms; it identifies chatroom topics by computing chat-vocabulary weights offline, combining per-data-source word weights with aggregated and emphasis-adjusted weights.
According to the above novelty search, existing similar techniques differ from the method of the present invention mainly in the following respects:
1. The research object of the prior art is a whole news story (event) or paragraph, whereas this method works at the turn level.
2. The prior art uses offline topic-identification methods, whereas this method is an online event-identification method.
3. The prior art only recognizes which topic a whole news story (event) or paragraph belongs to and whether a related news story (event) has occurred, i.e. identification and tracking at the topic level; this method mainly determines whether the events discussed by the online interacting parties are the same, whether an event is complete (has started and ended), and who the participants are, i.e. identification and tracking of single, concrete events.
4. In the feature representation of interactive text, the prior art computes offline-collected word-frequency features only for the current news story (event); this method exploits the time-dependence characteristic and aggregates the features of all turns within a time threshold before topic classification.
5. Existing methods mainly use unsupervised probabilistic latent semantic analysis (PLSA); this method proposes a supervised, hierarchical PLSA training method for a hierarchical topic model and updates the topic model periodically.
Summary of the invention
In view of the problems of the aforementioned related art in comparison with the present invention, the invention provides an event identification and tracking method for online instant interactive text, comprising the following steps:
Step one: turn-level topic-category classification stage:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn.
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group.
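For illustration only (not part of the claimed method), the five-tuple can be rendered as a minimal Python structure; the field names simply mirror the definition above:

    from dataclasses import dataclass

    @dataclass
    class Turn:
        """One turn T_i = (i, id, role, stamp, content)."""
        i: int        # turn index, i >= 1
        id: str       # unique identifier of the speaker
        role: str     # "speaker" or "recipient"
        stamp: float  # timestamp (seconds) at which the turn occurs
        content: str  # all text spoken in the turn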
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical probabilistic latent semantic analysis (PLSA) model.
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm. The periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained.
The computation of the adaptive language-feature aggregation vector described in step (4) of step one is:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold. At initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j.
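As a non-limiting illustration of Steps 1 to 3, the following Python sketch computes V(T_i), updates Th and forms the aggregation vector. It assumes the hypothetical Turn structure shown earlier, a list `turns` sorted by stamp, and per-turn feature vectors `feats[j]` held as NumPy arrays; the constants follow the initialization stated above:

    import numpy as np

    DELTA_T = 3600.0   # ΔT = 1 hour, in seconds
    DELTA_V = 0.3      # Δv

    def turn_rate(turns, i):
        """V(T_i): number of turns per ΔT in the window ending at T_i.stamp."""
        end = turns[i].stamp
        if end - turns[0].stamp < DELTA_T:
            count = i + 1  # C(T_1.stamp, T_i.stamp): all turns so far
        else:
            count = sum(1 for t in turns[:i + 1] if t.stamp >= end - DELTA_T)
        return count / DELTA_T

    def update_threshold(th, v_cur, v_prev):
        """Adapt Th to the fluctuation of the turn rate."""
        if v_cur >= (1 + DELTA_V) * v_prev:
            return (1 - DELTA_V) * th  # traffic rose: shrink the window
        if v_cur <= (1 - DELTA_V) * v_prev:
            return (1 + DELTA_V) * th  # traffic fell: widen the window
        return th                      # rate roughly stable: keep Th

    def aggregation_vector(turns, feats, i, th):
        """F_i: sum of W_j over all turns T_j in [T_i.stamp - Th, T_i.stamp]."""
        end = turns[i].stamp
        window = [j for j in range(i + 1) if turns[j].stamp >= end - th]
        return np.sum([feats[j] for j in window], axis=0)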
The training process of the supervised hierarchical PLSA model described in step (5) of step one and step (3) of step two is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level. If the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics. Here mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes. When level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words. The stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
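A small Python sketch of Steps 1.1 and 1.2 under stated assumptions: documents arrive as pre-tokenized word lists, `stop_words` is a given set, and a simple tf × idf weighting stands in for "the TFIDF algorithm", whose exact variant (and hence the practical meaning of the 0.1 cutoff) the text does not pin down:

    import math
    from collections import Counter

    def build_feature_words(docs, stop_words, min_weight=0.1):
        """Step 1.1: weight candidate words by TFIDF, drop weight < min_weight."""
        docs = [[w for w in d if w not in stop_words] for d in docs]
        df = Counter(w for d in docs for w in set(d))  # document frequency
        tf = Counter(w for d in docs for w in d)       # corpus term frequency
        m = len(docs)
        weight = {w: tf[w] * math.log(m / df[w]) for w in tf}
        return sorted((w for w in weight if weight[w] >= min_weight),
                      key=lambda w: -weight[w])

    def cooccurrence_matrix(docs, feature_words):
        """Step 1.2: N[r][s] = count of feature word r in document s."""
        index = {w: r for r, w in enumerate(feature_words)}
        N = [[0] * len(docs) for _ in feature_words]
        for s, d in enumerate(docs):
            for w in d:
                if w in index:
                    N[index[w]][s] += 1
        return N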
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class support vector machine (SVM) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0.
In the above method, the process in step (5) of step one of classifying the turn-level topic category with the supervised hierarchical PLSA model is:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i. Using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
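The top-down descent of Steps 2 to 4 can be sketched as follows (illustrative only; `classifiers` maps each tree node to its trained classifier and `children` encodes the topic tree, both hypothetical names; a scikit-learn-style predict interface is assumed):

    def classify_turn(z_i, root, classifiers, children):
        """Descend the topic tree from M~_0 until a leaf topic is reached.

        z_i         -- latent-space representation Z_i of the turn
        root        -- the top-level node (level = 0)
        classifiers -- node -> trained multi-class classifier
        children    -- node -> list of sub-topic nodes (empty for leaves)
        """
        node = root
        while True:
            topic = classifiers[node].predict([z_i])[0]
            if not children.get(topic):  # leaf topic: done (Step 3, else branch)
                return topic
            node = topic                 # mother node: one level deeper (Step 4)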
The detailed process of step (1) of step two is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
The computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id. The in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
Description of drawings
Fig. 1 is the event identification and tracking flowchart of the present invention.
Fig. 2 is the incremental-training flowchart.
Fig. 3 is the turn topic-category classification flowchart.
Fig. 4 is the turn-level event identification and tracking principle diagram.
Fig. 5 is an example of the social-network closeness computation, where Fig. 5a is the raw-data graph and Fig. 5b is the directed graph after transformation.
Embodiment
To understand the present invention more clearly, it is described in further detail below with reference to the accompanying drawings.
1. The present invention adopts the mechanism of first identifying the turn topic category and then performing turn-level event identification and tracking, so as to identify and track events in the speech that users input in instant interactive text; its flowchart is shown in Fig. 1.
Research goal: attach each turn input by a user to the corresponding event.
Research background: compared with single documents such as blogs, comments and novels, instant interactive text not only inherits characteristics of natural-language text such as ambiguity and irregularity, but also has its own unique language characteristics:
(1) interactivity;
(2) time-series characteristics: most interaction is close to real time, and the topics of turns are time-dependent, i.e. the closer in time two utterances are, the more likely they are related;
(3) each turn has little content and short sentences, which inevitably makes features sparse;
(4) complex interaction modes, e.g. one-to-one, one-to-many and many-to-many;
(5) varied and concise forms of language expression, with many misspellings, non-standard terms and much noise.
All of these pose considerable challenges to the processing of interactive text.
In view of these particular characteristics of interactive text, a "two-step" strategy is proposed. The concrete working mechanism is as follows:
In the first step, topic-category classification is carried out; in this step the topic category of each turn is identified, e.g. politics, economy, culture, education, science and technology. The process is as follows:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn.
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group;
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical PLSA model.
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm. The periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained; see Fig. 2.
2. The topic-category classification mechanism for turns
Research goal: gather together the turns related to the current turn and use the language-feature aggregation vector of the turn as the language feature vector of the current turn.
Research background: in interactive text each turn has little content and short sentences, which inevitably makes features sparse; computing the language-feature aggregation vector of the current turn therefore overcomes, to a certain extent, the shortcomings of little content and short sentences.
For the topic-category classification of turns, the present invention adopts an adaptive topic-category classification mechanism; its flowchart is shown in Fig. 3 and the concrete working mechanism is as follows:
(1) Compute the language-feature aggregation vector of the current turn T_i; the process is as follows:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold. At initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j.
(2) Training the supervised hierarchical PLSA model
Research goal: organize and train the training data hierarchically according to the topic hierarchy.
Research background: topics have a hierarchical nature, and such hierarchies abound in real applications, for example book classification according to the Ministry of Education's discipline taxonomy. Organizing and training the training data according to this model, from abstract topics down to fine-grained core events, can effectively alleviate the imbalance of the data.
The detailed process is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level. If the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics. Here mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes. When level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words. The stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
The probabilistic latent semantic analysis model is a topic model whose principle is as follows:
Let D = {d_1, d_2, …, d_m} denote the document collection and W = {w_1, w_2, …, w_n} the feature word set, where m is the number of documents and n the number of feature words. Ignoring the order in which words occur in the documents, a co-occurrence matrix N = (c(w_r, d_s))_rs of size n × m can be generated, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document. The joint density model is defined as:
p(d, w) = p(d) p(w|d), p(w|d) = Σ_{z∈Z} p(w|z) p(z|d),
where z ∈ Z = (z_1, z_2, …, z_Q) is the latent semantic space and Q is the size of the latent semantic space.
The interpretation of the model is as follows: p(d) is the probability that document d occurs in the data set; p(w|z) is the probability that each related word w occurs once the latent semantic z has been determined; p(z|d) is the distribution of latent semantics in the document. With these definitions a generative model is formed, which can be used to produce new data:
(1) first select a document d by random sampling according to the distribution p(d);
(2) once the document is selected, sample the latent semantic z expressed by the document according to p(z|d);
(3) once the semantic is selected, select the words of the document according to p(w|z).
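For reference, the standard EM procedure for fitting this model can be sketched as follows; this is the textbook PLSA recipe under the stated factorization, not an algorithm spelled out in the patent. N_tilde is the weighted n × m word-document matrix and Q the number of latent semantics:

    import numpy as np

    def plsa(N_tilde, Q, iters=100, seed=0):
        """EM for PLSA; returns p(w|z), shape (n, Q), and p(z|d), shape (Q, m)."""
        rng = np.random.default_rng(seed)
        n, m = N_tilde.shape
        p_w_z = rng.random((n, Q)); p_w_z /= p_w_z.sum(axis=0)
        p_z_d = rng.random((Q, m)); p_z_d /= p_z_d.sum(axis=0)
        for _ in range(iters):
            # E-step: responsibilities p(z|d,w), proportional to p(w|z) p(z|d)
            joint = p_w_z[:, :, None] * p_z_d[None, :, :]   # shape (n, Q, m)
            joint /= joint.sum(axis=1, keepdims=True) + 1e-12
            # M-step: re-estimate the two factors from expected counts
            counts = N_tilde[:, None, :] * joint
            p_w_z = counts.sum(axis=2)
            p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
            p_z_d = counts.sum(axis=0)
            p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
        return p_w_z, p_z_d

(The dense (n, Q, m) tensor keeps the sketch readable; a practical implementation would loop over the nonzero entries of N_tilde.)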
According to the above theory, the process of training the PLSA topic model is:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class SVM (Support Vector Machine) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0. The experiments use the LIBSVM multi-class classifier written by Professor Chih-Jen Lin of National Taiwan University.
(3) Hierarchical classification mechanism
Research goal: classify the turns with the supervised hierarchical PLSA model.
Research background: the turns are classified hierarchically; if the identified category of a turn belongs to a mother-node topic, classification continues; otherwise classification stops and the topic category of the turn is marked. The hierarchical classification process is as follows:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i. Using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
3. Identification and splicing mechanism for turn-level events
Research goal: attach the turns to the corresponding events.
Research background: for the identification and tracking of concrete events, the present invention adopts a mechanism that combines the topic category of the turn, the time difference between successive turns and the closeness of the speakers of successive turns at the social-network level to decide the start, continuation and end of events; its principle is shown in Fig. 4 and the concrete working mechanism is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
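The loop structure of Steps 1 to 9 can be summarized in Python as follows. This is a sketch under assumptions: `window` holds the indices of the unfinished turns in [T_i.stamp − Th, T_i.stamp] in ascending order, so the bound g is read here as the earliest index in that window; `topic_of`, `event_of`, `id_of`, `closeness` and `new_event` are hypothetical lookups standing in for data structures the patent leaves unspecified:

    def attach_turn(i, window, topic_of, event_of, id_of, closeness, new_event):
        """Assign turn T_i to an existing event or start a new one."""
        earlier = [j for j in window if j < i]
        if not earlier:                      # Step 2: T_i alone in the window
            return new_event(i)
        for l in reversed(earlier):          # Steps 3-5: same topic category?
            if topic_of(l) == topic_of(i):
                return event_of(l)           # Step 4: join T_l's event
        for l in reversed(earlier):          # Steps 7-9: social-network closeness
            if closeness(id_of(i), id_of(l)) > 0.5:
                return event_of(l)           # Step 8: join T_l's event
        return new_event(i)                  # Step 9: T_i starts a new event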
The computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id. The in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
To facilitate understanding of the computation of the social-network closeness, an example is given here; see Fig. 5. The raw data are converted into a directed graph, and the closeness of the social network of A and B is then:
d(A, B) = 5 / (5 + 5 + 3 + 4) = 0.294
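Transcribing the formula directly into Python and replaying the example (the edge counts below form one directed graph consistent with the degree totals 5, 5, 3, 4 and IO(A, B) = 5 used above; the actual Fig. 5 graph is not reproduced here):

    def closeness(a, b, edges):
        """d(a, b) = IO(a, b) / (I(a) + O(a) + I(b) + O(b))."""
        out_deg = lambda x: sum(c for (s, _), c in edges.items() if s == x)
        in_deg = lambda x: sum(c for (_, r), c in edges.items() if r == x)
        io = edges.get((a, b), 0) + edges.get((b, a), 0)
        return io / (in_deg(a) + out_deg(a) + in_deg(b) + out_deg(b))

    edges = {("A", "B"): 3, ("B", "A"): 2,
             ("A", "C"): 2, ("C", "A"): 3,
             ("B", "C"): 2}
    print(round(closeness("A", "B", edges), 3))  # prints 0.294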

Claims (4)

1. An event identification and tracking method for instant interactive text, characterized by comprising the following steps:
Step one: turn-level topic-category classification stage:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn;
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group;
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical probabilistic latent semantic analysis (PLSA) model;
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm; the periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained;
The computation of the adaptive language-feature aggregation vector described in step (4) of step one is:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold; at initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j;
The training process of the supervised hierarchical PLSA model described in step (5) of step one and step (3) of step two is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level; if the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics; mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes; when level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words; the stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class support vector machine (SVM) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0.
2. The event identification and tracking method for instant interactive text according to claim 1, characterized in that the process in step (5) of step one of classifying the turn-level topic category with the supervised hierarchical PLSA model is:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i; using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
3. The event identification and tracking method for instant interactive text according to claim 1, characterized in that the detailed process of step (1) of step two is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
4. The event identification and tracking method for instant interactive text according to claim 3, characterized in that the computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id; the in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
CN 201110312540 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method Expired - Fee Related CN102411611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110312540 CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110312540 CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Publications (2)

Publication Number Publication Date
CN102411611A true CN102411611A (en) 2012-04-11
CN102411611B CN102411611B (en) 2013-01-02

Family

ID=45913682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110312540 Expired - Fee Related CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Country Status (1)

Country Link
CN (1) CN102411611B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156228A (en) * 2014-04-01 2014-11-19 兰州工业学院 Client-side short message filtration embedded feature library generating and updating method
CN104881399A (en) * 2015-05-15 2015-09-02 中国科学院自动化研究所 Event identification method and system based on probability soft logic PSL
CN106021508A (en) * 2016-05-23 2016-10-12 武汉大学 Sudden event emergency information mining method based on social media
CN106663426A (en) * 2014-07-03 2017-05-10 微软技术许可有限责任公司 Generating computer responses to social conversational inputs
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN107145516A (en) * 2017-04-07 2017-09-08 北京捷通华声科技股份有限公司 A kind of Text Clustering Method and system
CN107862081A (en) * 2017-11-29 2018-03-30 四川无声信息技术有限公司 Network Information Sources lookup method, device and server
CN108427752A (en) * 2018-03-13 2018-08-21 浙江大学城市学院 A kind of article meaning of one's words mask method using event based on isomery article
CN110246049A (en) * 2018-03-09 2019-09-17 北大方正集团有限公司 Topic detecting method, device, equipment and readable storage medium storing program for executing
US10909969B2 (en) 2015-01-03 2021-02-02 Microsoft Technology Licensing, Llc Generation of language understanding systems and methods
CN113626573A (en) * 2021-08-11 2021-11-09 北京深维智信科技有限公司 Sales session objection and response extraction method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
CN1403959A (en) * 2001-09-07 2003-03-19 联想(北京)有限公司 Content filter based on text content characteristic similarity and theme correlation degree comparison
CN1535433A (en) * 2001-07-04 2004-10-06 库吉萨姆媒介公司 Category based, extensible and interactive system for document retrieval
JP2009146397A (en) * 2007-11-19 2009-07-02 Omron Corp Important sentence extraction method, important sentence extraction device, important sentence extraction program and recording medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
CN1535433A (en) * 2001-07-04 2004-10-06 库吉萨姆媒介公司 Category based, extensible and interactive system for document retrieval
CN1403959A (en) * 2001-09-07 2003-03-19 联想(北京)有限公司 Content filter based on text content characteristic similarity and theme correlation degree comparison
JP2009146397A (en) * 2007-11-19 2009-07-02 Omron Corp Important sentence extraction method, important sentence extraction device, important sentence extraction program and recording medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156228A (en) * 2014-04-01 2014-11-19 兰州工业学院 Client-side short message filtration embedded feature library generating and updating method
CN104156228B (en) * 2014-04-01 2017-11-10 兰州工业学院 A kind of embedded feature database of client filtering short message and update method
CN106663426A (en) * 2014-07-03 2017-05-10 微软技术许可有限责任公司 Generating computer responses to social conversational inputs
US10909969B2 (en) 2015-01-03 2021-02-02 Microsoft Technology Licensing, Llc Generation of language understanding systems and methods
CN104881399B (en) * 2015-05-15 2017-10-27 中国科学院自动化研究所 Event recognition method and system based on probability soft logic PSL
CN104881399A (en) * 2015-05-15 2015-09-02 中国科学院自动化研究所 Event identification method and system based on probability soft logic PSL
CN106021508A (en) * 2016-05-23 2016-10-12 武汉大学 Sudden event emergency information mining method based on social media
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN106844765B (en) * 2017-02-22 2019-12-20 中国科学院自动化研究所 Significant information detection method and device based on convolutional neural network
CN107145516A (en) * 2017-04-07 2017-09-08 北京捷通华声科技股份有限公司 A kind of Text Clustering Method and system
CN107145516B (en) * 2017-04-07 2021-03-19 北京捷通华声科技股份有限公司 Text clustering method and system
CN107862081A (en) * 2017-11-29 2018-03-30 四川无声信息技术有限公司 Network Information Sources lookup method, device and server
CN110246049A (en) * 2018-03-09 2019-09-17 北大方正集团有限公司 Topic detecting method, device, equipment and readable storage medium storing program for executing
CN108427752A (en) * 2018-03-13 2018-08-21 浙江大学城市学院 A kind of article meaning of one's words mask method using event based on isomery article
CN113626573A (en) * 2021-08-11 2021-11-09 北京深维智信科技有限公司 Sales session objection and response extraction method and system

Also Published As

Publication number Publication date
CN102411611B (en) 2013-01-02

Similar Documents

Publication Publication Date Title
CN102411611B (en) Instant interactive text oriented event identifying and tracking method
Aguilar et al. A multi-task approach for named entity recognition in social media data
CN110866117B (en) Short text classification method based on semantic enhancement and multi-level label embedding
CN105677873B (en) Text Intelligence association cluster based on model of the domain knowledge collects processing method
CN103984681B (en) News event evolution analysis method based on time sequence distribution information and topic model
CN104679738B (en) Internet hot words mining method and device
Huang et al. A topic BiLSTM model for sentiment classification
CN105760499A (en) Method for analyzing and predicting network public sentiment based on LDA topic model
Kandhro et al. Performance analysis of hyperparameters on a sentiment analysis model
CN104881399A (en) Event identification method and system based on probability soft logic PSL
Huang et al. Multi-granular document-level sentiment topic analysis for online reviews
Zheng et al. Chinese sentiment analysis of online education and internet buzzwords based on BERT
Du et al. A deceptive detection model based on topic, sentiment, and sentence structure information
CN116578705A (en) Microblog emotion classification method based on pre-training language model and integrated neural network
Guan et al. Hierarchical neural network for online news popularity prediction
Bölücü et al. Hate Speech and Offensive Content Identification with Graph Convolutional Networks.
Balouchzahi et al. LA-SACo: A study of learning approaches for sentiments analysis inCode-mixing texts
Lee et al. Sentiment analysis on online social network using probability Model
Pathuri et al. Feature based sentimental analysis for prediction of mobile reviews using hybrid bag-boost algorithm
Kaur et al. Sentiment analysis of twitter data using hybrid method of support vector machine and ant colony optimization
Jiang et al. A hierarchical bidirectional LSTM sequence model for extractive text summarization in electric power systems
Habbat et al. Using AraGPT and ensemble deep learning model for sentiment analysis on Arabic imbalanced dataset
Hu et al. A study on discovery method of hot topics based on smart campus big data platform
Fan et al. Multi-label Chinese question classification based on word2vec
CN113901172A (en) Case-related microblog evaluation object extraction method based on keyword structure codes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20151015

EXPY Termination of patent right or utility model