CN103116573A - Field dictionary automatic extension method based on vocabulary annotation - Google Patents
- Publication number
- CN103116573A
- Authority
- CN
- China
- Prior art keywords
- node
- field
- vocabulary
- dictionary
- corpus
- Legal status
- Granted
Abstract
The invention relates to a method for automatically extending domain dictionaries based on word glosses (annotations) and belongs to the technical field of natural language processing. The method comprises the following steps: (1) building a domain classification tree by analyzing the relatedness between the domains to which the domain dictionaries belong; (2) obtaining a training set for each domain dictionary to be extended; (3) preprocessing each training set to obtain a corpus feature set; (4) for each node of the tree, counting how many times each word in the node's corpus feature set occurs in that feature set and, for non-leaf nodes, how many of the child nodes' corpus feature sets contain that word; (5) calculating the confidence of each word in each corpus feature set; (6) adding new words to the domain dictionaries to be extended. Because the method does not require domain corpora to be collected manually, it avoids being affected by the quality, limited scale, and imbalance of such corpora.
Description
Technical field
The present invention relates to a method for automatically extending domain dictionaries, in particular to an automatic domain-dictionary extension method based on word glosses, and belongs to the technical field of natural language processing.
Background art
A domain dictionary is the set of terms or expressions specific to a particular domain. Domain dictionaries are a basic resource for natural language processing: domain knowledge is widely used in processing stages such as word sense disambiguation and syntactic analysis across many tasks, including machine translation, information retrieval, data mining, and text classification, and the scale and quality of a domain dictionary directly affect the performance of these applications.
Methods for constructing and extending domain dictionaries fall into three classes by degree of automation: manual construction and extension based on expert knowledge, semi-automatic generation and extension, and fully automatic generation and extension. Manual methods are highly accurate, but they require many domain experts to participate over long periods, so labor and time costs are very high and the dictionaries cannot be kept up to date in real time. Fully automatic methods judge the domain attribute of a word by analyzing differences in its statistical properties across corpora from different domains; they need no domain experts and save considerable labor, but the accuracy of the entries they add is low. Semi-automatic methods sit between manual compilation and automatic generation: domain experts specify a small amount of domain knowledge, and the domain dictionary is then extended automatically. Most existing semi-automatic and fully automatic methods require the support of domain corpora, so the quality of the generated dictionary depends on the quality of the corpus used, and its completeness is limited by the corpus scale. In addition, because corpora are imbalanced, the domains assigned to words tend to be biased toward the domains that dominate the corpus. Neither class of methods makes effective use of existing dictionary resources, and neither considers the correlation between domains.
Summary of the invention
The objective of the invention is to address the shortcomings of existing automatic domain-dictionary extension methods by proposing an automatic domain-dictionary extension method based on word glosses.
The objective of the invention is achieved through the following technical solution.
An automatic domain-dictionary extension method based on word glosses comprises the following specific steps:
Step 1: generate a domain classification tree by analyzing the relatedness between the domains to which the domain dictionaries belong. Specifically:
Step 1.1: let D denote the set of pending nodes, and initialize it to the empty set.
Step 1.2: put each domain dictionary to be extended into the pending node set as a separate node. The node name is the name of the domain dictionary, and the node content is all of the entries in that dictionary; each entry consists of a word and its gloss.
Step 1.3: for every pair of nodes in the pending node set, calculate the relatedness between the domains of the two domain dictionaries they represent, denoted $R(d_1, d_2)$, using formula (1):

$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \qquad (1)$$

where $R(d_1, d_2)$ is the relatedness between the domain $d_1$ of one domain dictionary $D_1$ in the pending node set and the domain $d_2$ of another domain dictionary $D_2$; $|d_1 \cap d_2|$ is the number of words that $D_1$ and $D_2$ have in common; and $\min(|d_1|, |d_2|)$ is the number of words in whichever of $D_1$ and $D_2$ contains fewer words.
Step 1.4: among the relatedness values $R(d_1, d_2)$ obtained in Step 1.3 for all pairs of nodes in the pending node set, find the maximum, denoted $R_{max}$. Denote the two domain dictionaries corresponding to $R_{max}$ by $D_1'$ and $D_2'$, their domains by $d_1'$ and $d_2'$, and their contents by $c_1$ and $c_2$, respectively.
Step 1.5: merge the entries of domain dictionaries $D_1'$ and $D_2'$ and give the merged dictionary a new name, denoted $D_{new}$. Denote the content of $D_{new}$ by $c_{new}$, where $c_{new} = c_1 \cup c_2$. Then create a new node whose name is $D_{new}$ and whose content is $c_{new}$, and make $D_1'$ and $D_2'$ the child nodes of $D_{new}$.
Step 1.6: add the new node $D_{new}$ to the pending node set and delete nodes $D_1'$ and $D_2'$ from it.
Step 1.7: count the nodes in the pending node set, denoted $N$. If $N \ge 2$, return to Step 1.3; otherwise, stop.
The above steps yield a domain classification tree.
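Step 1 is a greedy bottom-up (agglomerative) clustering driven by formula (1). The following is a minimal sketch, assuming each dictionary is reduced to a plain set of words (glosses are omitted because formula (1) only counts words); the function names and the `(name, words, children)` node tuple are illustrative, not taken from the patent.

```python
from itertools import combinations


def relatedness(words_a: set, words_b: set) -> float:
    """Formula (1): shared words divided by the size of the smaller dictionary."""
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def build_domain_tree(dictionaries: dict) -> tuple:
    """Greedy bottom-up merging following Steps 1.1-1.7.

    `dictionaries` maps a domain name to its set of words. Each node is a
    (name, words, children) tuple; leaves have an empty children tuple.
    """
    # Steps 1.1-1.2: every dictionary starts as its own pending node.
    pending = [(name, set(words), ()) for name, words in dictionaries.items()]
    # Step 1.7: keep merging while at least two nodes remain.
    while len(pending) >= 2:
        # Steps 1.3-1.4: pick the most related pair of pending nodes.
        a, b = max(
            combinations(pending, 2),
            key=lambda pair: relatedness(pair[0][1], pair[1][1]),
        )
        # Step 1.5: the merged node holds the union of both contents.
        merged = (f"{a[0]} & {b[0]}", a[1] | b[1], (a, b))
        # Step 1.6: replace the two merged nodes with the new node.
        pending = [n for n in pending if n is not a and n is not b]
        pending.append(merged)
    return pending[0]
```

Ties in Step 1.4 are resolved here by taking the first maximal pair; the patent does not say how ties are handled.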
Step 2: obtain a training set for each domain dictionary to be extended.
This step can run in parallel with Step 1. Select a general-purpose electronic dictionary that provides glosses. Then, for each domain dictionary to be extended, look up each of its words in turn in the glossed general dictionary and add the gloss of each word to that domain's training set as one training item; this yields the domain's training set.
After Step 2, each domain dictionary to be extended has a training set for its domain.
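Step 2 amounts to a lookup of each domain word in the glossed general dictionary. A minimal sketch, assuming the general dictionary is available as a Python `dict` from word to gloss text; skipping words that have no gloss is an assumption, since the patent does not say what happens to them.

```python
def build_training_sets(domain_dicts: dict, glossary: dict) -> dict:
    """Step 2: one training item (a gloss) per domain word found in the glossary.

    domain_dicts maps a domain name to an iterable of its words;
    glossary maps a word to its gloss (a stand-in for the general dictionary).
    Words without a gloss are skipped (an assumption of this sketch).
    """
    return {
        domain: [glossary[word] for word in words if word in glossary]
        for domain, words in domain_dicts.items()
    }
```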
Step 3: preprocess the training sets to obtain corpus feature sets.
Building on Step 2, preprocess the training corpus of each domain dictionary to be extended in turn to obtain the corpus feature set for that domain. Specifically, each training item in a domain's training set undergoes preprocessing such as word segmentation, phrase extraction, lemmatization, and stop-word removal, yielding a group of words for that item, called a corpus feature subset. The collection of corpus feature subsets for all training items in the domain's training set is called the corpus feature set of that domain dictionary.
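A minimal sketch of Step 3 for English glosses, assuming naive regex tokenization in place of word segmentation, a tiny hypothetical stop-word list, and a crude plural-stripping stand-in for lemmatization; phrase extraction is not implemented. A real system would use proper NLP tooling for each stage.

```python
import re

# Tiny illustrative stop-word list (hypothetical; a real list is much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "is"}


def lemmatize(token: str) -> str:
    """Crude placeholder: strip a plural 's'. Not a real lemmatizer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def feature_subset(gloss: str) -> list:
    """Turn one gloss (training item) into its corpus feature subset."""
    tokens = re.findall(r"[a-z]+", gloss.lower())  # naive segmentation
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]


def feature_set(training_set: list) -> list:
    """Step 3: one feature subset per training item of a domain."""
    return [feature_subset(gloss) for gloss in training_set]
```

For example, `feature_subset("The wings of an aircraft")` returns `["wing", "aircraft"]`.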
Step 4: building on Steps 1 and 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that feature set. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the non-leaf node's corpus feature set; then compute (1) the number of times each word in the non-leaf node's corpus feature set occurs in that feature set, and (2) for each word in the non-leaf node's corpus feature set, the number of its child nodes' corpus feature sets that contain the word.
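The two Step 4 statistics can be gathered in one post-order pass over the tree. A minimal sketch, assuming a hypothetical `Node` class, and assuming that "merging" the children's feature sets means concatenating their feature subsets so that occurrence counts are preserved.

```python
from collections import Counter


class Node:
    """A node of the domain classification tree (illustrative structure)."""

    def __init__(self, name, children=(), feature_set=None):
        self.name = name
        self.children = list(children)
        # Leaves: list of feature subsets (each a list of words) from Step 3.
        # Non-leaves: overwritten by collect_statistics() with the merged set.
        self.feature_set = feature_set if feature_set is not None else []
        self.counts = Counter()         # statistic (1): occurrences of each word
        self.children_with = Counter()  # statistic (2): children containing each word


def collect_statistics(node):
    """Step 4, computed bottom-up so children are ready before their parent."""
    for child in node.children:
        collect_statistics(child)
    node.children_with = Counter()
    if node.children:
        # Merge the children's feature sets (concatenate their subsets).
        node.feature_set = [s for c in node.children for s in c.feature_set]
        for child in node.children:
            # A child is counted at most once per word.
            node.children_with.update(set(child.counts))
    node.counts = Counter(w for subset in node.feature_set for w in subset)
```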
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2).
Here, $wdc$ denotes the confidence of a word $w$ in the corpus feature set of a domain $d$; $wd$ denotes the number of times $w$ occurs in domain $d$; $\sum wd$ denotes the total number of times $w$ occurs in the corpus feature set of the parent of the node whose corpus feature set contains $w$; and $dt$ denotes the number of corpus feature sets, among those of the sibling nodes of that node, that contain $w$.
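Formula (2) itself is not reproduced in this text, so the sketch below only gathers its three operands, $wd$, $\sum wd$, and $dt$, from the Step 4 statistics. It assumes plain dictionaries for the tree (`counts`, `parent_of`, `children_of` are hypothetical names) and takes $dt$ as Step 4's statistic (2) at the parent, which counts node $d$ itself along with its siblings; if "sibling" is meant strictly, `domain` should be excluded from the loop.

```python
from collections import Counter


def confidence_inputs(word, domain, counts, parent_of, children_of):
    """Collect (wd, sum_wd, dt) for `word` in `domain`; formula (2) not applied.

    counts:      node name -> Counter of word occurrences (from Step 4)
    parent_of:   node name -> parent node name
    children_of: node name -> list of child node names
    """
    parent = parent_of[domain]
    wd = counts[domain][word]        # occurrences of w in domain d
    sum_wd = counts[parent][word]    # occurrences of w in the parent's feature set
    # Children of the parent whose feature set contains w (includes d itself).
    dt = sum(1 for child in children_of[parent] if counts[child][word] > 0)
    return wd, sum_wd, dt
```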
Step 6: add new words to the domain dictionaries to be extended.
Building on Step 5, take words newly included in the glossed general dictionary described in Step 2 as new words and add them to the domain dictionaries to be extended, as follows:
Step 6.1: apply preprocessing such as word segmentation, phrase extraction, lemmatization, and stop-word removal to the gloss of the new word, obtaining a group of words for that gloss; let $n$ denote the number of words in this group.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain corresponding to each child node of the current node, and find the maximum, denoted $sdc_{max}$.
Here, $sdc_k$ denotes the membership degree between the new word and the domain $k$ corresponding to a child node of the current node; $wdc_{jk}$ denotes the confidence of the $j$-th word of the new word's gloss word group with respect to domain $k$; and $m_k$ denotes the number of the $n$ gloss words whose confidence is highest in domain $k$.
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than a preset threshold, check whether the node corresponding to $sdc_{max}$ is a leaf node. If it is a leaf node, add the new word to that node's domain dictionary; if not, take that node as the current node and return to Step 6.3. If $sdc_{max}$ is not greater than the preset threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop.
The above steps achieve automatic extension of the domain dictionaries.
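Steps 6.2 to 6.4 describe a top-down descent of the domain tree. Because formula (3) is not reproduced in this text, the sketch below takes the membership function as a parameter; `DomainNode`, `assign_domain`, and the toy membership used in the check are hypothetical. It assumes every non-leaf node has at least one child, which holds for a tree built from two or more dictionaries.

```python
from typing import Callable, Optional


class DomainNode:
    """Minimal tree node; leaves have no children."""

    def __init__(self, name: str, children=()):
        self.name = name
        self.children = list(children)


def assign_domain(
    root: DomainNode,
    gloss_words: list,
    membership: Callable[[list, DomainNode], float],
    threshold: float,
) -> Optional[str]:
    """Steps 6.2-6.4: return the leaf domain for a new word, or None."""
    current = root                                   # Step 6.2
    while True:
        # Step 6.3: membership degree for every child; keep the maximum.
        best = max(current.children, key=lambda c: membership(gloss_words, c))
        sdc_max = membership(gloss_words, best)
        if sdc_max <= threshold:                     # Step 6.4: not greater
            return None                              # treat as a general word
        if not best.children:                        # leaf: add to this domain
            return best.name
        current = best                               # descend, repeat Step 6.3
```

In the embodiment below the threshold is 0.7, so a call would look like `assign_domain(tree, words, membership, 0.7)` with a membership function implementing formula (3).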
Beneficial effect
Compared with existing automatic domain-dictionary extension methods, the proposed gloss-based method does not require domain corpora to be collected manually, and therefore avoids the limitations imposed by corpus quality and scale and the effects of corpus imbalance.
Description of drawings
Fig. 1 is the domain classification tree in the specific embodiment of the invention.
Embodiment
The invention is described in further detail below with reference to the drawings and a specific embodiment.
Table 1 shows the vocabulary sizes of four domain dictionaries from the Huajian machine dictionary (communication, aviation, machinery, and computer) and the pairwise intersections between them. The communication, aviation, machinery, and computer dictionaries contain 12,626, 7,592, 19,250, and 5,156 words, respectively. The pairwise intersections are: communication and aviation, 4,432; communication and machinery, 6,210; communication and computer, 2,705; aviation and machinery, 4,908; aviation and computer, 2,064; machinery and computer, 2,383.
Table 1. Vocabulary sizes of the four domain dictionaries (diagonal) and the number of shared words between each pair (off-diagonal)
Dictionary | Communication | Aviation | Machinery | Computer |
---|---|---|---|---|
Communication | 12626 | 4432 | 6210 | 2705 |
Aviation | 4432 | 7592 | 4908 | 2064 |
Machinery | 6210 | 4908 | 19250 | 2383 |
Computer | 2705 | 2064 | 2383 | 5156 |
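Formula (1) can be evaluated directly on the Table 1 counts to check the first merge of Step 1. A short worked computation, with the dictionary sizes and intersections copied from the table:

```python
# Vocabulary sizes (diagonal of Table 1).
sizes = {"Communication": 12626, "Aviation": 7592, "Machinery": 19250, "Computer": 5156}

# Shared-word counts (off-diagonal of Table 1).
shared = {
    ("Communication", "Aviation"): 4432,
    ("Communication", "Machinery"): 6210,
    ("Communication", "Computer"): 2705,
    ("Aviation", "Machinery"): 4908,
    ("Aviation", "Computer"): 2064,
    ("Machinery", "Computer"): 2383,
}

# Formula (1): R = |d1 ∩ d2| / min(|d1|, |d2|).
scores = {
    pair: count / min(sizes[pair[0]], sizes[pair[1]])
    for pair, count in shared.items()
}

for pair, r in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{pair[0]} - {pair[1]}: R = {r:.3f}")

best = max(scores, key=scores.get)
print("Most related:", best)
```

The highest value is Aviation and Machinery (R ≈ 0.646, i.e. 4908 / 7592), followed by Communication and Aviation (≈ 0.584) and Communication and Computer (≈ 0.525), consistent with Step 1.4 of the embodiment. Later merges cannot be checked from Table 1 alone, because they need intersections with the merged "Aviation & Machinery" vocabulary.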
The gloss-based automatic domain-dictionary extension method proposed by the invention is used to automatically extend the communication, aviation, machinery, and computer domain dictionaries of the Huajian machine dictionary, as follows:
Step 1: generate a domain classification tree by analyzing the relatedness between the domains to which the domain dictionaries belong. Specifically:
Step 1.1: initialize the pending node set D to the empty set.
Step 1.2: put the "communication", "aviation", "machinery", and "computer" domain dictionaries into the pending node set as separate nodes. The node name is the name of the domain dictionary, and the node content is all of the entries in that dictionary; each entry consists of a word and its gloss.
Step 1.3: for every pair of nodes in the pending node set, calculate the relatedness $R(d_1, d_2)$ between the domains of the dictionaries they represent using formula (1).
Step 1.4: the calculation shows that the two most related domains are aviation and machinery.
Step 1.5: merge aviation and machinery into one node, and calculate the relatedness of the new node "Aviation & Machinery" to computer and to communication.
Step 1.6: add the new node "Aviation & Machinery" to the pending node set, and delete "aviation" and "machinery" from it.
Step 1.7: the pending node set now contains 3 nodes, so Steps 1.3 to 1.7 are repeated until only one node remains, which yields the domain classification tree shown in Fig. 1. The root node Root has two child nodes, "Aviation & Machinery" and "Communication & Computer"; node "Aviation & Machinery" has two child nodes, "aviation" and "machinery"; and node "Communication & Computer" has two child nodes, "communication" and "computer".
Step 2: obtain a training set for each domain dictionary to be extended.
This step can run in parallel with Step 1. Select a general-purpose electronic dictionary that provides glosses. Then, for each domain dictionary to be extended, look up each of its words in turn in the glossed general dictionary and add the gloss of each word to that domain's training set as one training item; this yields the domain's training set.
After Step 2, each domain dictionary to be extended has a training set for its domain.
Step 3: preprocess the training sets to obtain corpus feature sets.
Building on Step 2, preprocess the training corpus of each domain dictionary to be extended in turn to obtain the corpus feature set for that domain. Specifically, each training item in a domain's training set undergoes preprocessing such as word segmentation, phrase extraction, lemmatization, and stop-word removal, yielding a group of words for that item, called a corpus feature subset. The collection of corpus feature subsets for all training items in the domain's training set is called the corpus feature set of that domain dictionary.
Step 4: building on Steps 1 and 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that feature set. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the non-leaf node's corpus feature set; then compute (1) the number of times each word in the non-leaf node's corpus feature set occurs in that feature set, and (2) for each word in the non-leaf node's corpus feature set, the number of its child nodes' corpus feature sets that contain the word.
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2).
Step 6: add new words to the domain dictionaries to be extended.
Building on Step 5, take words newly included in the glossed general dictionary described in Step 2 as new words and add them to the domain dictionaries to be extended, as follows:
Step 6.1: apply preprocessing such as word segmentation, phrase extraction, lemmatization, and stop-word removal to the gloss of the new word, obtaining a group of words for that gloss; let $n$ denote the number of words in this group.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain corresponding to each child node of the current node, and find the maximum, $sdc_{max}$.
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than the preset threshold of 0.7, check whether the node corresponding to $sdc_{max}$ is a leaf node. If it is a leaf node, add the new word to that node's domain dictionary; if not, take that node as the current node and return to Step 6.3. If $sdc_{max}$ is not greater than the preset threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop.
The above steps achieve automatic extension of the domain dictionaries.
Claims (1)
1. An automatic domain-dictionary extension method based on word glosses, characterized in that it comprises the following specific steps:
Step 1: generate a domain classification tree by analyzing the relatedness between the domains to which the domain dictionaries belong; specifically:
Step 1.1: let D denote the set of pending nodes, and initialize it to the empty set;
Step 1.2: put each domain dictionary to be extended into the pending node set as a separate node; the node name is the name of the domain dictionary, and the node content is all of the entries in that dictionary; each entry consists of a word and its gloss;
Step 1.3: for every pair of nodes in the pending node set, calculate the relatedness between the domains of the two domain dictionaries they represent using formula (1):

$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \qquad (1)$$

where $R(d_1, d_2)$ is the relatedness between the domain $d_1$ of one domain dictionary $D_1$ in the pending node set and the domain $d_2$ of another domain dictionary $D_2$; $|d_1 \cap d_2|$ is the number of words that $D_1$ and $D_2$ have in common; and $\min(|d_1|, |d_2|)$ is the number of words in whichever of $D_1$ and $D_2$ contains fewer words;
Step 1.4: among the relatedness values $R(d_1, d_2)$ obtained in Step 1.3 for all pairs of nodes in the pending node set, find the maximum, denoted $R_{max}$; denote the two domain dictionaries corresponding to $R_{max}$ by $D_1'$ and $D_2'$, their domains by $d_1'$ and $d_2'$, and their contents by $c_1$ and $c_2$, respectively;
Step 1.5: merge the entries of domain dictionaries $D_1'$ and $D_2'$ and give the merged dictionary a new name, denoted $D_{new}$; denote the content of $D_{new}$ by $c_{new}$, where $c_{new} = c_1 \cup c_2$; then create a new node whose name is $D_{new}$ and whose content is $c_{new}$, and make $D_1'$ and $D_2'$ the child nodes of $D_{new}$;
Step 1.6: add the new node $D_{new}$ to the pending node set and delete nodes $D_1'$ and $D_2'$ from it;
Step 1.7: count the nodes in the pending node set, denoted $N$; if $N \ge 2$, return to Step 1.3; otherwise, stop;
the above steps yield a domain classification tree;
Step 2: obtain a training set for each domain dictionary to be extended;
this step can run in parallel with Step 1: select a general-purpose electronic dictionary that provides glosses; then, for each domain dictionary to be extended, look up each of its words in turn in the glossed general dictionary and add the gloss of each word to that domain's training set as one training item, which yields the domain's training set;
after Step 2, each domain dictionary to be extended has a training set for its domain;
Step 3: preprocess the training sets to obtain corpus feature sets;
building on Step 2, preprocess the training corpus of each domain dictionary to be extended in turn to obtain the corpus feature set for that domain; specifically, each training item in a domain's training set is preprocessed, yielding a group of words for that item, called a corpus feature subset; the collection of corpus feature subsets for all training items in the domain's training set is called the corpus feature set of that domain dictionary;
the preprocessing comprises word segmentation, phrase extraction, lemmatization, and stop-word removal;
Step 4: building on Steps 1 and 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that feature set; for each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the non-leaf node's corpus feature set, then compute (1) the number of times each word in the non-leaf node's corpus feature set occurs in that feature set, and (2) for each word in the non-leaf node's corpus feature set, the number of its child nodes' corpus feature sets that contain the word;
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2);
where $wdc$ denotes the confidence of a word $w$ in the corpus feature set of a domain $d$; $wd$ denotes the number of times $w$ occurs in domain $d$; $\sum wd$ denotes the total number of times $w$ occurs in the corpus feature set of the parent of the node whose corpus feature set contains $w$; and $dt$ denotes the number of corpus feature sets, among those of the sibling nodes of that node, that contain $w$;
Step 6: add new words to the domain dictionaries to be extended;
building on Step 5, take words newly included in the glossed general dictionary described in Step 2 as new words and add them to the domain dictionaries to be extended, as follows:
Step 6.1: preprocess the gloss of the new word, obtaining a group of words for that gloss; let $n$ denote the number of words in this group;
the preprocessing comprises word segmentation, phrase extraction, lemmatization, and stop-word removal;
Step 6.2: take the root node of the domain classification tree as the current node;
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain corresponding to each child node of the current node, and find the maximum, denoted $sdc_{max}$;
where $sdc_k$ denotes the membership degree between the new word and the domain $k$ corresponding to a child node of the current node; $wdc_{jk}$ denotes the confidence of the $j$-th word of the new word's gloss word group with respect to domain $k$; and $m_k$ denotes the number of the $n$ gloss words whose confidence is highest in domain $k$;
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than a preset threshold, check whether the node corresponding to $sdc_{max}$ is a leaf node; if it is a leaf node, add the new word to that node's domain dictionary; if not, take that node as the current node and return to Step 6.3; if $sdc_{max}$ is not greater than the preset threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop;
the above steps achieve automatic extension of the domain dictionaries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310046647.3A CN103116573B (en) | 2013-02-06 | 2013-02-06 | Automatic domain-lexicon extension method based on vocabulary annotation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310046647.3A CN103116573B (en) | 2013-02-06 | 2013-02-06 | Automatic domain-lexicon extension method based on vocabulary annotation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103116573A true CN103116573A (en) | 2013-05-22 |
CN103116573B CN103116573B (en) | 2015-10-28 |
Family
ID=48414950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310046647.3A Active CN103116573B (en) | Automatic domain-lexicon extension method based on vocabulary annotation | 2013-02-06 | 2013-02-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103116573B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324692A (en) * | 2013-06-04 | 2013-09-25 | Peking University | Classified knowledge acquiring method and device |
CN104268160A (en) * | 2014-09-05 | 2015-01-07 | Beijing Institute of Technology | Evaluation object extraction method based on domain dictionary and semantic roles |
CN105955958A (en) * | 2016-05-06 | 2016-09-21 | Changsha Luzhi Information Technology Co., Ltd. | English patent application document writing assistance system and writing assistance method thereof |
CN106681986A (en) * | 2016-12-13 | 2017-05-17 | Chengdu Shulian Mingpin Technology Co., Ltd. | Multi-dimensional sentiment analysis system |
CN108197243A (en) * | 2017-12-29 | 2018-06-22 | Beijing Qihoo Technology Co., Ltd. | Input association recommendation method and device based on user identity |
CN109299453A (en) * | 2017-07-24 | 2019-02-01 | Huawei Technologies Co., Ltd. | Method and apparatus for constructing a dictionary |
CN109325224A (en) * | 2018-08-06 | 2019-02-12 | China University of Geosciences (Wuhan) | Word vector representation learning method and system based on semantic primitives |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360383A (en) * | 2011-10-15 | 2012-02-22 | Xi'an Jiaotong University | Method for extracting domain terms and term relationships from text |
EP2515242A2 (en) * | 2011-04-21 | 2012-10-24 | Palo Alto Research Center Incorporated | Incorporating lexicon knowledge to improve sentiment classification |
Non-Patent Citations (3)
Title |
---|
Chaoyong Zhu et al.: "Gloss-based Word Domain Assignment", 2011 7th International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), 29 November 2011, pp. 150-155, XP032101542, DOI: 10.1109/NLPKE.2011.6138184 * |
Zhu Chaoyong et al.: "Hierarchical Domain Assignment Based on Word-Gloss", China Communications (中国通信), no. 3, 31 March 2012, pp. 19-27 * |
Zhang Haijun et al.: "A Survey of Chinese New Word Recognition Techniques" (中文新词识别技术综述), Computer Science (计算机科学), vol. 37, no. 3, 31 March 2010 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324692A (en) * | 2013-06-04 | 2013-09-25 | Peking University | Classified knowledge acquiring method and device |
CN103324692B (en) * | 2013-06-04 | 2016-05-18 | Peking University | Classified knowledge acquisition method and device |
CN104268160A (en) * | 2014-09-05 | 2015-01-07 | Beijing Institute of Technology | Evaluation object extraction method based on domain dictionary and semantic roles |
CN104268160B (en) * | 2014-09-05 | 2017-06-06 | Beijing Institute of Technology | Opinion target extraction and identification method based on domain lexicon and semantic roles |
CN105955958A (en) * | 2016-05-06 | 2016-09-21 | Changsha Luzhi Information Technology Co., Ltd. | Writing assistance system and writing assistance method for English patent application documents |
CN106681986A (en) * | 2016-12-13 | 2017-05-17 | Chengdu Shulian Mingpin Technology Co., Ltd. | Multi-dimensional sentiment analysis system |
CN109299453A (en) * | 2017-07-24 | 2019-02-01 | Huawei Technologies Co., Ltd. | Method and apparatus for constructing a dictionary |
CN108197243A (en) * | 2017-12-29 | 2018-06-22 | Beijing Qihoo Technology Co., Ltd. | Input association recommendation method and device based on user identity |
CN109325224A (en) * | 2018-08-06 | 2019-02-12 | China University of Geosciences (Wuhan) | Word vector representation learning method and system based on semantic metalanguage |
Also Published As
Publication number | Publication date |
---|---|
CN103116573B (en) | 2015-10-28 |
Similar Documents
Publication | Title |
---|---|
CN103116573B (en) | Field dictionary automatic extension method based on vocabulary annotation |
CN104391942B (en) | Short text feature extension method based on semantic graph |
CN107766324B (en) | Text consistency analysis method based on deep neural network |
CN103207905B (en) | Method for calculating text similarity based on target text |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining |
CN103970729B (en) | Multi-threaded extraction method based on semantic category |
CN103123618B (en) | Text similarity acquisition method and device |
CN104636466B (en) | Entity attribute extraction method and system for open webpage |
CN106844658A | Method and system for automatically constructing a Chinese text knowledge graph |
CN102693279B (en) | Method, device and system for fast calculation of comment similarity |
CN104778256B (en) | Fast incremental clustering method for consultations in a domain question answering system |
CN105243152A | Graph model-based automatic abstracting method |
CN101702167A | Internet-based method for extracting attributes and comment words using templates |
CN103399901A | Keyword extraction method |
CN104484380A | Personalized search method and personalized search device |
CN107133223B (en) | Machine translation optimization method that automatically explores information from multiple reference translations |
CN106569993A | Method and device for mining hypernym-hyponym relation between domain-specific terms |
CN105138514A | Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction |
CN103646112A | Domain adaptation method for dependency parsing based on web search |
CN105630884A | Geographic position discovery method for microblog hot event |
CN106528524A | Word segmentation method based on MMseg algorithm and pointwise mutual information algorithm |
CN104866558A | Training method of social networking account mapping model, mapping method and system |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field |
CN104484433A | Book body matching method based on machine learning |
CN109614626A | Automatic keyword extraction method based on gravitational model |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |