CN102411611A - Instant interactive text oriented event identifying and tracking method - Google Patents

Instant interactive text oriented event identifying and tracking method

Info

Publication number
CN102411611A
Authority
CN
China
Prior art keywords
wheel
stamp
words
speech
words wheel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201110312540XA
Other languages
Chinese (zh)
Other versions
CN102411611B (en)
Inventor
田锋
郑庆华
张惠三
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN 201110312540 priority Critical patent/CN102411611B/en
Publication of CN102411611A publication Critical patent/CN102411611A/en
Application granted granted Critical
Publication of CN102411611B publication Critical patent/CN102411611B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an event identification and tracking method for instant interactive text, which comprises two stages. I. In the turn-level topic-category classification stage, the content of each turn is represented by an adaptive language-feature aggregation vector, and the topic category of the turn is classified with a supervised hierarchical probabilistic latent semantic analysis (PLSA) model obtained by training. II. In the turn-level event identification and tracking stage, the start, continuation and end of an event are judged from the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level. In particular: (1) the invention proposes adaptively adjusting the turn time-closeness threshold Th according to the fluctuation of the time-series data around the current turn, after which the adaptive language-feature aggregation is computed; (2) the supervised hierarchical PLSA model is updated periodically during operation. The proposed method is an online identification and tracking algorithm.

Description

An event identification and tracking method for instant interactive text
Technical field
The present invention relates to information retrieval, extraction and management and to natural language processing, and in particular to an event identification and tracking method for online instant interactive text.
Background technology
With the increasingly widespread use of Internet technology, network applications based on interactive text keep developing and have become one of the main means by which people obtain and publish information; typical interactive-text applications include Internet chatrooms and microblogs. These texts contain a wealth of information resources. How to identify, organize and exploit events by topic category in these interactive-text applications has become a pressing task, for example automatically recognizing the emotion-change events of online learners so as to regulate their learning efficiency, or identifying socially sensitive incidents, new events, and the like. A novelty search by the applicant retrieved no patent directly related to the present invention, but did find several similar articles:
1) Message-text clustering research based on frequent patterns. Hu Jixiang, Graduate University of the Chinese Academy of Sciences (Institute of Computing Technology).
2) CDTF_IDF: a weight computation method for chat vocabulary. Gao Peng, Cao Xianbin, Computer Simulation, 2007.12.
The author of article 1) found that frequent patterns (called key frequent patterns) contain additional semantic information, such as word order and adjacent context, that is key to feature extraction from interactive text, and proposed an unsupervised, frequent-pattern-based feature-selection algorithm applied to text classification and clustering.
Article 2) is mainly aimed at content supervision of chatrooms; it identifies chatroom topics by computing chat-vocabulary weights offline, combining per-data-source word weights with aggregated and emphasis-adjusted weights.
According to the above novelty search, existing similar techniques differ from the method of the present invention mainly in the following respects:
1. The research object of the prior art is a whole news story (event) or paragraph, whereas this method works at the turn level.
2. The prior art uses offline topic-identification methods, whereas this method is an online event-identification method.
3. The prior art only recognizes which topic a whole news story (event) or paragraph belongs to and whether a related news story (event) has occurred, i.e. identification and tracking at the topic level; this method mainly determines whether the events discussed by the online interacting parties are the same, whether an event is complete (has started and ended), and who the participants are, i.e. identification and tracking of single, concrete events.
4. In the feature representation of interactive text, the prior art computes offline-collected word-frequency features only for the current news story (event); this method exploits the time-dependence characteristic and aggregates the features of all turns within a time threshold before topic classification.
5. Existing methods mainly use unsupervised probabilistic latent semantic analysis (PLSA); this method proposes a supervised, hierarchical PLSA training method for a hierarchical topic model and updates the topic model periodically.
Summary of the invention
In view of the problems of the aforementioned related art in comparison with the present invention, the invention provides an event identification and tracking method for online instant interactive text, comprising the following steps:
Step one: turn-level topic-category classification stage:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn.
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group.
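For illustration only (not part of the claimed method), the five-tuple can be rendered as a minimal Python structure; the field names simply mirror the definition above:

    from dataclasses import dataclass

    @dataclass
    class Turn:
        """One turn T_i = (i, id, role, stamp, content)."""
        i: int        # turn index, i >= 1
        id: str       # unique identifier of the speaker
        role: str     # "speaker" or "recipient"
        stamp: float  # timestamp (seconds) at which the turn occurs
        content: str  # all text spoken in the turn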
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical probabilistic latent semantic analysis (PLSA) model.
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm. The periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained.
The computation of the adaptive language-feature aggregation vector described in step (4) of step one is:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold. At initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j.
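As a non-limiting illustration of Steps 1 to 3, the following Python sketch computes V(T_i), updates Th and forms the aggregation vector. It assumes the hypothetical Turn structure shown earlier, a list `turns` sorted by stamp, and per-turn feature vectors `feats[j]` held as NumPy arrays; the constants follow the initialization stated above:

    import numpy as np

    DELTA_T = 3600.0   # ΔT = 1 hour, in seconds
    DELTA_V = 0.3      # Δv

    def turn_rate(turns, i):
        """V(T_i): number of turns per ΔT in the window ending at T_i.stamp."""
        end = turns[i].stamp
        if end - turns[0].stamp < DELTA_T:
            count = i + 1  # C(T_1.stamp, T_i.stamp): all turns so far
        else:
            count = sum(1 for t in turns[:i + 1] if t.stamp >= end - DELTA_T)
        return count / DELTA_T

    def update_threshold(th, v_cur, v_prev):
        """Adapt Th to the fluctuation of the turn rate."""
        if v_cur >= (1 + DELTA_V) * v_prev:
            return (1 - DELTA_V) * th  # traffic rose: shrink the window
        if v_cur <= (1 - DELTA_V) * v_prev:
            return (1 + DELTA_V) * th  # traffic fell: widen the window
        return th                      # rate roughly stable: keep Th

    def aggregation_vector(turns, feats, i, th):
        """F_i: sum of W_j over all turns T_j in [T_i.stamp - Th, T_i.stamp]."""
        end = turns[i].stamp
        window = [j for j in range(i + 1) if turns[j].stamp >= end - th]
        return np.sum([feats[j] for j in window], axis=0)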
The training process of the supervised hierarchical PLSA model described in step (5) of step one and step (3) of step two is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level. If the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics. Here mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes. When level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words. The stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
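A small Python sketch of Steps 1.1 and 1.2 under stated assumptions: documents arrive as pre-tokenized word lists, `stop_words` is a given set, and a simple tf × idf weighting stands in for "the TFIDF algorithm", whose exact variant (and hence the practical meaning of the 0.1 cutoff) the text does not pin down:

    import math
    from collections import Counter

    def build_feature_words(docs, stop_words, min_weight=0.1):
        """Step 1.1: weight candidate words by TFIDF, drop weight < min_weight."""
        docs = [[w for w in d if w not in stop_words] for d in docs]
        df = Counter(w for d in docs for w in set(d))  # document frequency
        tf = Counter(w for d in docs for w in d)       # corpus term frequency
        m = len(docs)
        weight = {w: tf[w] * math.log(m / df[w]) for w in tf}
        return sorted((w for w in weight if weight[w] >= min_weight),
                      key=lambda w: -weight[w])

    def cooccurrence_matrix(docs, feature_words):
        """Step 1.2: N[r][s] = count of feature word r in document s."""
        index = {w: r for r, w in enumerate(feature_words)}
        N = [[0] * len(docs) for _ in feature_words]
        for s, d in enumerate(docs):
            for w in d:
                if w in index:
                    N[index[w]][s] += 1
        return N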
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class support vector machine (SVM) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0.
In the above method, the process in step (5) of step one of classifying the turn-level topic category with the supervised hierarchical PLSA model is:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i. Using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
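The top-down descent of Steps 2 to 4 can be sketched as follows (illustrative only; `classifiers` maps each tree node to its trained classifier and `children` encodes the topic tree, both hypothetical names; a scikit-learn-style predict interface is assumed):

    def classify_turn(z_i, root, classifiers, children):
        """Descend the topic tree from M~_0 until a leaf topic is reached.

        z_i         -- latent-space representation Z_i of the turn
        root        -- the top-level node (level = 0)
        classifiers -- node -> trained multi-class classifier
        children    -- node -> list of sub-topic nodes (empty for leaves)
        """
        node = root
        while True:
            topic = classifiers[node].predict([z_i])[0]
            if not children.get(topic):  # leaf topic: done (Step 3, else branch)
                return topic
            node = topic                 # mother node: one level deeper (Step 4)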
The detailed process of step (1) of step two is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
The computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id. The in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
Description of drawings
Fig. 1 is the event identification and tracking flowchart of the present invention.
Fig. 2 is the incremental-training flowchart.
Fig. 3 is the turn topic-category classification flowchart.
Fig. 4 is the turn-level event identification and tracking principle diagram.
Fig. 5 is an example of the social-network closeness computation, where Fig. 5a is the raw-data graph and Fig. 5b is the directed graph after transformation.
Embodiment
To understand the present invention more clearly, it is described in further detail below with reference to the accompanying drawings.
1. The present invention adopts the mechanism of first identifying the turn topic category and then performing turn-level event identification and tracking, so as to identify and track events in the speech that users input in instant interactive text; its flowchart is shown in Fig. 1.
Research goal: attach each turn input by a user to the corresponding event.
Research background: compared with single documents such as blogs, comments and novels, instant interactive text not only inherits characteristics of natural-language text such as ambiguity and irregularity, but also has its own unique language characteristics:
(1) interactivity;
(2) time-series characteristics: most interaction is close to real time, and the topics of turns are time-dependent, i.e. the closer in time two utterances are, the more likely they are related;
(3) each turn has little content and short sentences, which inevitably makes features sparse;
(4) complex interaction modes, e.g. one-to-one, one-to-many and many-to-many;
(5) varied and concise forms of language expression, with many misspellings, non-standard terms and much noise.
All of these pose considerable challenges to the processing of interactive text.
In view of these particular characteristics of interactive text, a "two-step" strategy is proposed. The concrete working mechanism is as follows:
In the first step, topic-category classification is carried out; in this step the topic category of each turn is identified, e.g. politics, economy, culture, education, science and technology. The process is as follows:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn.
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group;
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical PLSA model.
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm. The periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained; see Fig. 2.
2. The topic-category classification mechanism for turns
Research goal: gather together the turns related to the current turn and use the language-feature aggregation vector of the turn as the language feature vector of the current turn.
Research background: in interactive text each turn has little content and short sentences, which inevitably makes features sparse; computing the language-feature aggregation vector of the current turn therefore overcomes, to a certain extent, the shortcomings of little content and short sentences.
For the topic-category classification of turns, the present invention adopts an adaptive topic-category classification mechanism; its flowchart is shown in Fig. 3 and the concrete working mechanism is as follows:
(1) Compute the language-feature aggregation vector of the current turn T_i; the process is as follows:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold. At initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j.
(2) Training the supervised hierarchical PLSA model
Research goal: organize and train the training data hierarchically according to the topic hierarchy.
Research background: topics have a hierarchical nature, and such hierarchies abound in real applications, for example book classification according to the Ministry of Education's discipline taxonomy. Organizing and training the training data according to this model, from abstract topics down to fine-grained core events, can effectively alleviate the imbalance of the data.
The detailed process is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level. If the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics. Here mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes. When level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words. The stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
The probabilistic latent semantic analysis model is a topic model whose principle is as follows:
Let D = {d_1, d_2, …, d_m} denote the document collection and W = {w_1, w_2, …, w_n} the feature word set, where m is the number of documents and n the number of feature words. Ignoring the order in which words occur in the documents, a co-occurrence matrix N = (c(w_r, d_s))_rs of size n × m can be generated, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document. The joint density model is defined as:
p(d, w) = p(d) p(w|d), p(w|d) = Σ_{z∈Z} p(w|z) p(z|d),
where z ∈ Z = (z_1, z_2, …, z_Q) is the latent semantic space and Q is the size of the latent semantic space.
The interpretation of the model is as follows: p(d) is the probability that document d occurs in the data set; p(w|z) is the probability that each related word w occurs once the latent semantic z has been determined; p(z|d) is the distribution of latent semantics in the document. With these definitions a generative model is formed, which can be used to produce new data:
(1) first select a document d by random sampling according to the distribution p(d);
(2) once the document is selected, sample the latent semantic z expressed by the document according to p(z|d);
(3) once the semantic is selected, select the words of the document according to p(w|z).
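For reference, the standard EM procedure for fitting this model can be sketched as follows; this is the textbook PLSA recipe under the stated factorization, not an algorithm spelled out in the patent. N_tilde is the weighted n × m word-document matrix and Q the number of latent semantics:

    import numpy as np

    def plsa(N_tilde, Q, iters=100, seed=0):
        """EM for PLSA; returns p(w|z), shape (n, Q), and p(z|d), shape (Q, m)."""
        rng = np.random.default_rng(seed)
        n, m = N_tilde.shape
        p_w_z = rng.random((n, Q)); p_w_z /= p_w_z.sum(axis=0)
        p_z_d = rng.random((Q, m)); p_z_d /= p_z_d.sum(axis=0)
        for _ in range(iters):
            # E-step: responsibilities p(z|d,w), proportional to p(w|z) p(z|d)
            joint = p_w_z[:, :, None] * p_z_d[None, :, :]   # shape (n, Q, m)
            joint /= joint.sum(axis=1, keepdims=True) + 1e-12
            # M-step: re-estimate the two factors from expected counts
            counts = N_tilde[:, None, :] * joint
            p_w_z = counts.sum(axis=2)
            p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
            p_z_d = counts.sum(axis=0)
            p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
        return p_w_z, p_z_d

(The dense (n, Q, m) tensor keeps the sketch readable; a practical implementation would loop over the nonzero entries of N_tilde.)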
According to the above theory, the process of training the PLSA topic model is:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class SVM (Support Vector Machine) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0. The experiments use the LIBSVM multi-class classifier written by Professor Chih-Jen Lin of National Taiwan University.
(3) Hierarchical classification mechanism
Research goal: classify the turns with the supervised hierarchical PLSA model.
Research background: the turns are classified hierarchically; if the identified category of a turn belongs to a mother-node topic, classification continues; otherwise classification stops and the topic category of the turn is marked. The hierarchical classification process is as follows:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i. Using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
3. Identification and splicing mechanism for turn-level events
Research goal: attach the turns to the corresponding events.
Research background: for the identification and tracking of concrete events, the present invention adopts a mechanism that combines the topic category of the turn, the time difference between successive turns and the closeness of the speakers of successive turns at the social-network level to decide the start, continuation and end of events; its principle is shown in Fig. 4 and the concrete working mechanism is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
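The loop structure of Steps 1 to 9 can be summarized in Python as follows. This is a sketch under assumptions: `window` holds the indices of the unfinished turns in [T_i.stamp − Th, T_i.stamp] in ascending order, so the bound g is read here as the earliest index in that window; `topic_of`, `event_of`, `id_of`, `closeness` and `new_event` are hypothetical lookups standing in for data structures the patent leaves unspecified:

    def attach_turn(i, window, topic_of, event_of, id_of, closeness, new_event):
        """Assign turn T_i to an existing event or start a new one."""
        earlier = [j for j in window if j < i]
        if not earlier:                      # Step 2: T_i alone in the window
            return new_event(i)
        for l in reversed(earlier):          # Steps 3-5: same topic category?
            if topic_of(l) == topic_of(i):
                return event_of(l)           # Step 4: join T_l's event
        for l in reversed(earlier):          # Steps 7-9: social-network closeness
            if closeness(id_of(i), id_of(l)) > 0.5:
                return event_of(l)           # Step 8: join T_l's event
        return new_event(i)                  # Step 9: T_i starts a new event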
The computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id. The in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
To facilitate understanding of the computation of the social-network closeness, an example is given here; see Fig. 5. The raw data are converted into a directed graph, and the closeness of the social network of A and B is then:
d(A, B) = 5 / (5 + 5 + 3 + 4) = 0.294
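Transcribing the formula directly into Python and replaying the example (the edge counts below form one directed graph consistent with the degree totals 5, 5, 3, 4 and IO(A, B) = 5 used above; the actual Fig. 5 graph is not reproduced here):

    def closeness(a, b, edges):
        """d(a, b) = IO(a, b) / (I(a) + O(a) + I(b) + O(b))."""
        out_deg = lambda x: sum(c for (s, _), c in edges.items() if s == x)
        in_deg = lambda x: sum(c for (_, r), c in edges.items() if r == x)
        io = edges.get((a, b), 0) + edges.get((b, a), 0)
        return io / (in_deg(a) + out_deg(a) + in_deg(b) + out_deg(b))

    edges = {("A", "B"): 3, ("B", "A"): 2,
             ("A", "C"): 2, ("C", "A"): 3,
             ("B", "C"): 2}
    print(round(closeness("A", "B", edges), 3))  # prints 0.294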

Claims (4)

1. An event identification and tracking method for instant interactive text, characterized by comprising the following steps:
Step one: turn-level topic-category classification stage:
(1) In the instant interactive text, each speech input by a user is taken as one turn, represented as a five-tuple:
T_i = (i, id, role, stamp, content)
where T_i denotes the i-th turn, i ∈ Z and Z is the set of positive integers; id is a unique identifier distinguishing the speaker; role is the speaker's role, with two categories, speaker and recipient; stamp is the timestamp at which the turn occurs; content is all the text spoken in the turn;
Thus T_i.stamp is the time at which the i-th turn occurs and T_i.content is the content of the i-th turn; the interactive texts in question are the turns coming from one and the same chatroom or discussion group;
(2) Pre-process the content T_i.content of the current turn T_i, extract the feature words in it according to the feature lexicon, and compute the language feature vector
W_i = (w_i1, w_i2, …, w_in),
where w_ih (0 < h ≤ n) is the number of times the h-th feature word occurs in T_i.content and n is the number of feature words; the feature lexicon is extracted from the training data;
(3) If turn T_i is the first turn to occur in the system, i.e. T_1, go to (5); otherwise go to (4);
(4) Compute the adaptive language-feature aggregation vector of turn T_i:
F_i = (f_i1, f_i2, …, f_in),
where f_ih′ (0 < h′ ≤ n) is the number of times the h′-th feature word occurs in this language-feature aggregation and n is the number of feature words;
(5) Classify the turn-level topic category with the supervised hierarchical probabilistic latent semantic analysis (PLSA) model;
Step two: turn-level event identification and tracking stage:
(1) Judge whether the current turn T_i is the start, continuation or end of an event, according to the topic category of the turn, the time difference between successive turns, and the closeness of the speakers of successive turns at the social-network level;
(2) If turn T_i is an event-ending statement, a complete event has been formed, so mark T_i as a turn that ends an event; otherwise mark it as a turn that does not end an event;
(3) Judge whether the periodic-update time has arrived; if so, update the supervised hierarchical PLSA model; otherwise end the algorithm; the periodic update means that the complete events newly identified at the end of each month are added to the training set and the model is retrained;
The computation of the adaptive language-feature aggregation vector described in step (4) of step one is:
Step 1: After the current turn T_i occurs, compute the frequency V(T_i) of turns occurring in the time interval [T_i.stamp − ΔT, T_i.stamp]:
V(T_i) = C(T_1.stamp, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp < ΔT;
V(T_i) = C(T_i.stamp − ΔT, T_i.stamp) / ΔT, if T_i.stamp − T_1.stamp ≥ ΔT,
where C(T_1.stamp, T_i.stamp) is the total number of turns occurring in the interval [T_1.stamp, T_i.stamp], C(T_i.stamp − ΔT, T_i.stamp) is the total number of turns occurring in the interval [T_i.stamp − ΔT, T_i.stamp], and ΔT is a fixed time interval, initialized to ΔT = 1 hour;
Step 2: Adaptively determine the size of the time-closeness threshold Th: first compute Th′, i.e.
Th′ = (1 − Δv) × Th, if V(T_i) ≥ (1 + Δv) × V(T_{i−1});
Th′ = Th, if (1 − Δv) × V(T_{i−1}) < V(T_i) < (1 + Δv) × V(T_{i−1});
Th′ = (1 + Δv) × Th, if V(T_i) ≤ (1 − Δv) × V(T_{i−1}),
then set Th = Th′, i.e. update the time threshold; at initialization Δv is set to 0.3 and the threshold Th = 6 hours; this rule achieves adaptive adjustment of the time threshold Th;
Step 3: Let S_i denote the set of turns occurring in the time interval [T_i.stamp − Th, T_i.stamp]; then the language-feature aggregation vector of T_i is the sum of the language feature vectors of all turns in S_i, i.e.
F_i = Σ_{T_j ∈ S_i} W_j;
The training process of the supervised hierarchical PLSA model described in step (5) of step one and step (3) of step two is as follows:
Step 1: Organize the training data set into a hierarchical classification according to the hierarchical nature of topics; after the topic layering, a tree structure is formed, denoted:
M_k^level = {M_1^{level+1}, M_2^{level+1}, …, M_{a_k}^{level+1}}, a_k ∈ Z,
where level is the level at which the topic category resides, k indicates that the current topic M_k^level is the k-th sub-topic category of the topic one level above, and a_k is the number of sub-topics contained in the current topic M_k^level; if the number of sub-topics contained is a_k = 0, then M_k^level is a leaf node of the topic tree, belonging to leaf_topics; otherwise M_k^level is a mother node containing sub-topics, belonging to mon_topics; mon_topics denotes the set of nodes that contain sub-topics and leaf_topics denotes the set of leaf nodes; when level = 0, M^0 = {M_1^0, M_2^0, …, M_{a_0}^0} denotes the top-level topic categories, where a_0 is the number of top-level topic categories;
The organization of the training data then proceeds as follows:
Step 1.1: Generate the feature word vector W as follows:
Step 1.1.1: Count the total number of distinct words occurring in the training data set after deleting stop words, forming a candidate feature word vector W̃ = (w̃_1, w̃_2, …, w̃_ñ), where w̃_f is the number of times the f-th feature word occurs in the training data and ñ is the number of candidate feature words; the stop words comprise: symbols, auxiliary words, prepositions, conjunctions, interjections, onomatopoeia and numerals;
Step 1.1.2: Use the TFIDF algorithm to compute a weight for each word of W̃, sort the words by weight in descending order, and delete the words whose weight is less than 0.1, obtaining the feature word vector W = {w_1, w_2, …, w_{f′}, …, w_n}, where w_{f′} is the feature word with the f′-th largest weight in the training data and n is the number of feature words;
Step 1.2: Generate the co-occurrence matrix N as follows:
Step 1.2.1: Form a document collection D = {d_1, d_2, …, d_{m_k}} from all documents in the training data that belong to topic M_k^level, where m_k is the number of documents;
Step 1.2.2: The word-document co-occurrence matrix N, of dimension n × m_k, is then N = (c(w_r, d_s))_rs, where c(w_r, d_s) is the number of times the r-th feature word occurs in the s-th document;
Step 2: On this training data set, train the corresponding PLSA models layer by layer in a top-down manner, as follows:
Step 2.1: Use the TFIDF algorithm to weight the elements of matrix N, generating a new co-occurrence matrix Ñ;
Step 2.2: Apply the probabilistic latent semantic analysis algorithm to the co-occurrence matrix Ñ, generating two matrices: WZ = (p(w_r, z_q))_rq of size n × Q and DZ = (p(z_q, d_s))_qs of size Q × m_k, where z_q ∈ Z = (z_1, z_2, …, z_Q), Z is the latent semantic space and Q is its size; p(w_r, z_q) is the probability of the r-th feature word on the latent semantic z_q; p(z_q, d_s) is the probability of the s-th document on the latent semantic z_q;
Step 3: Use a multi-class support vector machine (SVM) classifier to train on the DZ corresponding to the PLSA model of each layer, generating the supervised PLSA model classifier M̃_k^level corresponding to each layer; when level = 0, the classifier is M̃_0.
2. The event identification and tracking method for instant interactive text according to claim 1, characterized in that the process in step (5) of step one of classifying the turn-level topic category with the supervised hierarchical PLSA model is:
Step 1: Compute the language-feature aggregation vector F_i of the current turn T_i; using the matrix WZ obtained by the supervised hierarchical PLSA learning, map F_i onto the latent semantic space Z, i.e. represent the aggregated language-feature content of T_i in the latent semantic space:
Z_i = F_i × WZ,
where Z_i represents the probability distribution of the aggregated language features of the current turn T_i over the latent semantic space Z; in other words, the feature word vector W undergoes feature dimensionality reduction;
Step 2: Use the trained supervised hierarchical PLSA model classifier M̃_0 to classify the topic category of T_i;
Step 3: If the topic category of T_i belongs to mon_topics, increase level by 1 and go to Step 4; otherwise mark the topic category of T_i as the identified topic category and finish;
Step 4: Use the corresponding classifier M̃_k^level to classify the topic category of T_i, then go to Step 3.
3. The event identification and tracking method for instant interactive text according to claim 1, characterized in that the detailed process of step (1) of step two is as follows:
Step 1: Search for and obtain the set U_i of turns that occurred in the time interval [T_i.stamp − Th, T_i.stamp] and do not end an event;
Step 2: If U_i contains only the element T_i, mark T_i as the starting sentence of a new event and the algorithm ends; otherwise set l = i − 1 and go to Step 3;
Step 3: Judge whether T_i and T_l have the same topic category;
Step 4: If T_i and T_l have the same topic category, attach T_i to the event to which T_l belongs and the algorithm ends; otherwise set l = l − 1 and go to Step 5;
Step 5: If l ≥ g, go to Step 3; otherwise go to Step 6;
Step 6: If the event of T_i is still empty, set l′ = i − 1 and go to Step 7; otherwise end the algorithm;
Step 7: Compute the closeness d of T_i.id and T_{l′}.id at the social-network level;
Step 8: If d > 0.5, attach T_i to the event to which T_{l′} belongs and the algorithm ends; otherwise set l′ = l′ − 1 and go to Step 9;
Step 9: If l′ ≥ g, go to Step 7; otherwise mark T_i as the starting sentence of a new event and end the algorithm.
4. The event identification and tracking method for instant interactive text according to claim 3, characterized in that the computation of the social-network closeness is:
d(T_i.id, T_{i−1}.id) = IO(T_i.id, T_{i−1}.id) / (I(T_i.id) + O(T_i.id) + I(T_{i−1}.id) + O(T_{i−1}.id)),
where I(T_i.id) is the total in-degree of T_i.id and O(T_i.id) is the total out-degree of T_i.id, and similarly for T_{i−1}.id; IO(T_i.id, T_{i−1}.id) is the number of times T_i.id spoke to T_{i−1}.id plus the number of times T_{i−1}.id spoke to T_i.id; the in-degree and out-degree statistics are totals over the historical data, and the social-network closeness is updated once a month.
CN 201110312540 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method Expired - Fee Related CN102411611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110312540 CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110312540 CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Publications (2)

Publication Number Publication Date
CN102411611A true CN102411611A (en) 2012-04-11
CN102411611B CN102411611B (en) 2013-01-02

Family

ID=45913682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110312540 Expired - Fee Related CN102411611B (en) 2011-10-15 2011-10-15 Instant interactive text oriented event identifying and tracking method

Country Status (1)

Country Link
CN (1) CN102411611B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156228A (en) * 2014-04-01 2014-11-19 兰州工业学院 Client-side short message filtration embedded feature library generating and updating method
CN104881399A (en) * 2015-05-15 2015-09-02 中国科学院自动化研究所 Event identification method and system based on probability soft logic PSL
CN106021508A (en) * 2016-05-23 2016-10-12 武汉大学 Sudden event emergency information mining method based on social media
CN106663426A (en) * 2014-07-03 2017-05-10 微软技术许可有限责任公司 Generating computer responses to social conversational inputs
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN107145516A (en) * 2017-04-07 2017-09-08 北京捷通华声科技股份有限公司 A kind of Text Clustering Method and system
CN107862081A (en) * 2017-11-29 2018-03-30 四川无声信息技术有限公司 Network Information Sources lookup method, device and server
CN108427752A (en) * 2018-03-13 2018-08-21 浙江大学城市学院 A kind of article meaning of one's words mask method using event based on isomery article
CN110246049A (en) * 2018-03-09 2019-09-17 北大方正集团有限公司 Topic detecting method, device, equipment and readable storage medium storing program for executing
US10909969B2 (en) 2015-01-03 2021-02-02 Microsoft Technology Licensing, Llc Generation of language understanding systems and methods
CN113626573A (en) * 2021-08-11 2021-11-09 北京深维智信科技有限公司 Sales session objection and response extraction method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
CN1403959A (en) * 2001-09-07 2003-03-19 联想(北京)有限公司 Content filter based on text content characteristic similarity and theme correlation degree comparison
CN1535433A (en) * 2001-07-04 2004-10-06 库吉萨姆媒介公司 Category based, extensible and interactive system for document retrieval
JP2009146397A (en) * 2007-11-19 2009-07-02 Omron Corp Important sentence extraction method, important sentence extraction device, important sentence extraction program and recording medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424971B1 (en) * 1999-10-29 2002-07-23 International Business Machines Corporation System and method for interactive classification and analysis of data
CN1535433A (en) * 2001-07-04 2004-10-06 库吉萨姆媒介公司 Category based, extensible and interactive system for document retrieval
CN1403959A (en) * 2001-09-07 2003-03-19 联想(北京)有限公司 Content filter based on text content characteristic similarity and theme correlation degree comparison
JP2009146397A (en) * 2007-11-19 2009-07-02 Omron Corp Important sentence extraction method, important sentence extraction device, important sentence extraction program and recording medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156228A (en) * 2014-04-01 2014-11-19 兰州工业学院 Client-side short message filtration embedded feature library generating and updating method
CN104156228B (en) * 2014-04-01 2017-11-10 兰州工业学院 A kind of embedded feature database of client filtering short message and update method
CN106663426A (en) * 2014-07-03 2017-05-10 微软技术许可有限责任公司 Generating computer responses to social conversational inputs
US10909969B2 (en) 2015-01-03 2021-02-02 Microsoft Technology Licensing, Llc Generation of language understanding systems and methods
CN104881399B (en) * 2015-05-15 2017-10-27 中国科学院自动化研究所 Event recognition method and system based on probability soft logic PSL
CN104881399A (en) * 2015-05-15 2015-09-02 中国科学院自动化研究所 Event identification method and system based on probability soft logic PSL
CN106021508A (en) * 2016-05-23 2016-10-12 武汉大学 Sudden event emergency information mining method based on social media
CN106844765A (en) * 2017-02-22 2017-06-13 中国科学院自动化研究所 Notable information detecting method and device based on convolutional neural networks
CN106844765B (en) * 2017-02-22 2019-12-20 中国科学院自动化研究所 Significant information detection method and device based on convolutional neural network
CN107145516A (en) * 2017-04-07 2017-09-08 北京捷通华声科技股份有限公司 A kind of Text Clustering Method and system
CN107145516B (en) * 2017-04-07 2021-03-19 北京捷通华声科技股份有限公司 Text clustering method and system
CN107862081A (en) * 2017-11-29 2018-03-30 四川无声信息技术有限公司 Network Information Sources lookup method, device and server
CN110246049A (en) * 2018-03-09 2019-09-17 北大方正集团有限公司 Topic detecting method, device, equipment and readable storage medium storing program for executing
CN108427752A (en) * 2018-03-13 2018-08-21 浙江大学城市学院 A kind of article meaning of one's words mask method using event based on isomery article
CN113626573A (en) * 2021-08-11 2021-11-09 北京深维智信科技有限公司 Sales session objection and response extraction method and system

Also Published As

Publication number Publication date
CN102411611B (en) 2013-01-02

Similar Documents

Publication Publication Date Title
CN102411611B (en) Instant interactive text oriented event identifying and tracking method
Aguilar et al. A multi-task approach for named entity recognition in social media data
CN110866117B (en) Short text classification method based on semantic enhancement and multi-level label embedding
CN105677873B (en) Text Intelligence association cluster based on model of the domain knowledge collects processing method
CN103984681B (en) News event evolution analysis method based on time sequence distribution information and topic model
CN104679738B (en) Internet hot words mining method and device
Huang et al. A topic BiLSTM model for sentiment classification
CN105760499A (en) Method for analyzing and predicting network public sentiment based on LDA topic model
Kandhro et al. Performance analysis of hyperparameters on a sentiment analysis model
CN104881399A (en) Event identification method and system based on probability soft logic PSL
Huang et al. Multi-granular document-level sentiment topic analysis for online reviews
Zheng et al. Chinese sentiment analysis of online education and internet buzzwords based on BERT
Du et al. A deceptive detection model based on topic, sentiment, and sentence structure information
CN116578705A (en) Microblog emotion classification method based on pre-training language model and integrated neural network
Guan et al. Hierarchical neural network for online news popularity prediction
Bölücü et al. Hate Speech and Offensive Content Identification with Graph Convolutional Networks.
Balouchzahi et al. LA-SACo: A study of learning approaches for sentiments analysis inCode-mixing texts
Lee et al. Sentiment analysis on online social network using probability Model
Pathuri et al. Feature based sentimental analysis for prediction of mobile reviews using hybrid bag-boost algorithm
Kaur et al. Sentiment analysis of twitter data using hybrid method of support vector machine and ant colony optimization
Jiang et al. A hierarchical bidirectional LSTM sequence model for extractive text summarization in electric power systems
Habbat et al. Using AraGPT and ensemble deep learning model for sentiment analysis on Arabic imbalanced dataset
Hu et al. A study on discovery method of hot topics based on smart campus big data platform
Fan et al. Multi-label Chinese question classification based on word2vec
CN113901172A (en) Case-related microblog evaluation object extraction method based on keyword structure codes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20151015

EXPY Termination of patent right or utility model