CN103116573A - Field dictionary automatic extension method based on vocabulary annotation - Google Patents


Info

Publication number
CN103116573A
Authority
CN
China
Prior art keywords
node
field
vocabulary
dictionary
language material
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100466473A
Other languages
Chinese (zh)
Other versions
CN103116573B (en)
Inventor
黄河燕
史树敏
朱朝勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201310046647.3A (patent CN103116573B)
Publication of CN103116573A
Application granted
Publication of CN103116573B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a method for automatically extending domain dictionaries based on vocabulary annotations and belongs to the technical field of natural language processing. The method comprises the following steps: (1) building a domain classification tree by analyzing the relevance between the domains to which the domain dictionaries belong; (2) obtaining a training set for each domain dictionary to be extended; (3) preprocessing each training set to obtain a corpus feature set; (4) for each node of the tree, counting how many times each word occurs in the node's corpus feature set and, for each non-leaf node, how many of its child nodes' corpus feature sets contain the word; (5) calculating the confidence of each word in each corpus feature set; (6) adding new words to the domain dictionaries to be extended. Because the method does not require a domain corpus to be collected manually, it avoids the effects of the quality, limited scale and imbalance of domain corpora.

Description

An automatic domain dictionary extension method based on vocabulary annotation
Technical field
The present invention relates to a method for automatically extending domain dictionaries, in particular to an automatic domain dictionary extension method based on vocabulary annotation, and belongs to the technical field of natural language processing.
Background art
A domain dictionary is the set of terms or expressions specific to a particular domain. Domain dictionaries are a basic resource for natural language processing: domain knowledge is widely used in stages such as word sense disambiguation and syntactic analysis of tasks including machine translation, information retrieval, data mining and text classification, and the scale and quality of a domain dictionary directly affect the performance of the applications that rely on it.
Methods for building and extending domain dictionaries fall into three classes according to their degree of automation: manual methods based on expert knowledge, semi-automatic methods, and fully automatic methods. Manual methods are highly accurate but require many domain experts over long periods, so their labor and time costs are too high and their results lack timeliness. Fully automatic methods decide the domain of a word by analyzing how its statistical properties differ across corpora of different domains; they need no domain experts and save a great deal of labor, but the accuracy of the resulting dictionary entries is low. Semi-automatic methods lie between manual compilation and automatic generation: a domain expert specifies a small amount of domain knowledge, from which the dictionary is extended automatically. Most existing semi-automatic and fully automatic methods rely on domain corpora. The quality of the resulting dictionary depends on the quality of the corpus used, and its completeness is limited by the scale of the corpus; moreover, when the corpora are imbalanced, words tend to be assigned to the domains with the larger corpora. Neither of these two classes of methods makes effective use of existing dictionary resources, and neither takes the relevance between domains into account.
Summary of the invention
The object of the present invention is to address the shortcomings of existing automatic domain dictionary extension methods by proposing an automatic domain dictionary extension method based on vocabulary annotation.
The object of the present invention is achieved through the following technical solution.
An automatic domain dictionary extension method based on vocabulary annotation, whose specific operating steps are as follows:
Step 1: build a domain classification tree by analyzing the relevance between the domains to which the domain dictionaries belong. Specifically:
Step 1.1: denote the set of pending nodes by the symbol D, and set its initial state to empty.
Step 1.2: put each domain dictionary to be extended into the pending node set as a node. The node name is the name of the domain dictionary, and the node content is all the entries in that dictionary; each entry consists of a word and the annotation of that word.
Step 1.3: using formula (1), calculate the relevance between the domains of the domain dictionaries represented by every pair of nodes in the pending node set, denoted $R(d_1, d_2)$.
$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \tag{1}$$
Here, $R(d_1, d_2)$ is the relevance between the domain $d_1$ of one domain dictionary $D_1$ in the pending node set and the domain $d_2$ of another domain dictionary $D_2$; $|d_1 \cap d_2|$ is the number of words that $D_1$ and $D_2$ have in common; and $\min(|d_1|, |d_2|)$ is the number of words in the smaller of the two dictionaries.
Step 1.4: among the relevance values $R(d_1, d_2)$ obtained in Step 1.3 for every pair of nodes in the pending node set, find the maximum, denoted $R_{max}$. Denote the two domain dictionaries corresponding to $R_{max}$ by $D_1'$ and $D_2'$, their domains by $d_1'$ and $d_2'$, and their contents by $c_1$ and $c_2$.
Step 1.5: merge the entries of $D_1'$ and $D_2'$ and give the merged dictionary a new name, $D_{new}$; its content is $c_{new} = c_1 \cup c_2$. Then create a new node whose name is $D_{new}$ and whose content is $c_{new}$, with $D_1'$ and $D_2'$ as its child nodes.
Step 1.6: add the new node $D_{new}$ to the pending node set, and delete the nodes $D_1'$ and $D_2'$ from the pending node set.
Step 1.7: count the number of nodes in the pending node set, denoted N. If N ≥ 2, return to Step 1.3; otherwise, stop.
The above steps yield a domain classification tree.
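Steps 1.1 to 1.7 amount to a greedy bottom-up merge and can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each field (domain) dictionary is given as a non-empty set of words (annotations are left out), and the function names, the tuple layout `(name, words, children)`, the `" & "` naming of merged nodes and the toy data are all hypothetical.

```python
from itertools import combinations


def relevance(c1, c2):
    """Formula (1): shared words divided by the size of the smaller dictionary."""
    return len(c1 & c2) / min(len(c1), len(c2))


def build_domain_tree(dictionaries):
    """Greedy merge of Steps 1.1-1.7; returns the root as (name, words, children)."""
    # Steps 1.1-1.2: every dictionary starts as a leaf node in the pending set.
    pending = [(name, frozenset(words), ()) for name, words in dictionaries.items()]
    # Step 1.7: keep merging while at least two nodes remain.
    while len(pending) >= 2:
        # Steps 1.3-1.4: pick the pair of pending nodes with the highest relevance.
        a, b = max(combinations(pending, 2),
                   key=lambda pair: relevance(pair[0][1], pair[1][1]))
        # Step 1.5: the new node holds the union and keeps both nodes as children.
        merged = (f"{a[0]} & {b[0]}", a[1] | b[1], (a, b))
        # Step 1.6: replace the two merged nodes by the new node.
        pending = [n for n in pending if n is not a and n is not b] + [merged]
    return pending[0]


# Hypothetical toy dictionaries, for illustration only.
toy = {
    "aviation": {"wing", "engine", "rotor", "valve"},
    "machinery": {"engine", "rotor", "valve", "gear", "lathe"},
    "computer": {"cpu", "bus", "cache"},
    "communication": {"bus", "antenna", "modem", "cache"},
}
root = build_domain_tree(toy)
child_names = [child[0] for child in root[2]]
```

With this toy data the two most closely related pairs are merged first, so the root ends up with one child per merged pair.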
Step 2: obtain a training set for each domain dictionary to be extended.
This step can be carried out in parallel with Step 1. First choose an annotated general-purpose electronic dictionary. Then, for each domain dictionary to be extended, look up each of its words in the annotated general-purpose dictionary and put the annotation of each word into the training set of that domain as one training item. This yields the training set of the domain.
After Step 2, each domain dictionary to be extended has a training set corresponding to its domain.
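Step 2 is essentially a dictionary lookup. The sketch below assumes the annotated general-purpose dictionary is available as a plain mapping from word to annotation text; the function name and sample entries are hypothetical, and skipping words that the general-purpose dictionary does not contain is an assumption, since the text does not say how such words are handled.

```python
def build_training_set(domain_words, general_dictionary):
    """Collect the annotation of every domain word found in the general dictionary."""
    # Assumption: words missing from the general-purpose dictionary are skipped.
    return [general_dictionary[w] for w in domain_words if w in general_dictionary]


# Hypothetical sample data, for illustration only.
general = {
    "rotor": "rotating part of an engine or helicopter",
    "gear": "toothed wheel that transmits motion",
}
training_set = build_training_set(["rotor", "gear", "lathe"], general)
```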
Step 3: preprocess the training sets to obtain corpus feature sets.
Building on Step 2, preprocess the training set of each domain dictionary to be extended in turn to obtain the corpus feature set of that domain. Specifically, apply word segmentation, phrase extraction, lemmatization, stop-word removal and similar preprocessing to each training item in the training set of a domain; the resulting group of words is called a corpus feature subset. The collection of the corpus feature subsets of all training items in the domain's training set is called the corpus feature set of that domain dictionary.
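A deliberately simplified sketch of this preprocessing is shown below. It assumes English annotations, so word segmentation reduces to a regular-expression tokenizer; Chinese text would need a real word segmenter. Phrase extraction and lemmatization are omitted, and the stop-word list and function names are hypothetical.

```python
import re

# Hypothetical, very small stop-word list.
STOP_WORDS = frozenset({"a", "an", "the", "of", "or", "that", "in", "to", "and"})


def preprocess(annotation):
    """Turn one training item into a corpus feature subset (a list of words)."""
    tokens = re.findall(r"[a-z]+", annotation.lower())
    # Stop-word removal; phrase extraction and lemmatization are not modelled here.
    return [t for t in tokens if t not in STOP_WORDS]


def corpus_feature_set(training_set):
    """One corpus feature subset per training item."""
    return [preprocess(item) for item in training_set]


features = corpus_feature_set([
    "Rotating part of an engine",
    "Toothed wheel that transmits motion",
])
```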
Step 4: building on Step 1 and Step 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that set. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the result as the corpus feature set of the non-leaf node, then collect the following statistics: (a) the number of times each word in the non-leaf node's corpus feature set occurs in that set; (b) for each word in the non-leaf node's corpus feature set, the number of child-node corpus feature sets that contain the word.
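The two kinds of counts in Step 4 map naturally onto `collections.Counter`. The sketch below assumes a corpus feature set is a list of word lists, one per training item, and that merging the child sets means concatenating them; the function names and sample data are hypothetical.

```python
from collections import Counter


def word_counts(feature_set):
    """Occurrences of each word in one corpus feature set."""
    return Counter(w for subset in feature_set for w in subset)


def non_leaf_statistics(child_feature_sets):
    """Merged feature set of a non-leaf node plus the two statistics of Step 4."""
    # Merge the children's feature sets into the non-leaf node's feature set.
    merged = [subset for fs in child_feature_sets for subset in fs]
    # First statistic: occurrences of each word in the merged set.
    counts = word_counts(merged)
    # Second statistic: how many child feature sets contain each word.
    containing = Counter()
    for fs in child_feature_sets:
        containing.update({w for subset in fs for w in subset})
    return merged, counts, containing


# Hypothetical feature sets of two sibling leaf nodes.
aviation = [["rotating", "part", "engine"], ["engine", "thrust"]]
machinery = [["toothed", "wheel"], ["engine", "part"]]
merged, counts, containing = non_leaf_statistics([aviation, machinery])
```

Here "engine" occurs three times in the merged set and is contained in both child sets, while "thrust" is contained in only one.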
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2).
$$wdc = \frac{wd}{\sum wd} \times \log\left(\frac{wd}{dt} + 1\right) \tag{2}$$
Here, $wdc$ is the confidence of a word $w$ in the corpus feature set of a domain $d$; $wd$ is the number of times $w$ occurs in domain $d$; $\sum wd$ is the total number of times $w$ occurs in the corpus feature set of the parent of the node to which the corpus feature set containing $w$ belongs; and $dt$ is the number of corpus feature sets containing $w$ among the corpus feature sets of that node's sibling nodes.
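Formula (2) can be evaluated with a one-line function. The sketch rests on several assumptions that the text does not settle: the logarithm's argument is read as wd/dt + 1, the logarithm is natural, and dt is taken from the child-set count of Step 4, which includes the node's own feature set and is therefore at least 1. The function and parameter names are hypothetical.

```python
import math


def confidence(wd, parent_total, dt):
    """Formula (2) under the reading wdc = wd / sum(wd) * log(wd / dt + 1).

    wd:           occurrences of the word in the node's own corpus feature set
    parent_total: occurrences of the word in the parent node's corpus feature set
    dt:           number of child feature sets of the parent that contain the word
                  (assumed >= 1; the base of the logarithm is assumed to be e)
    """
    return (wd / parent_total) * math.log(wd / dt + 1)


# A word seen twice in this domain, three times under the parent, in both children.
example = confidence(wd=2, parent_total=3, dt=2)
```

Under this reading, a word that occurs often in one domain and rarely in its sibling scores higher than a word spread evenly across both.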
Step 6: add new words to the domain dictionaries to be extended.
Building on Step 5, take the words newly added to the annotated general-purpose electronic dictionary of Step 2 as new words and add them to the domain dictionaries to be extended. The specific operating steps are:
Step 6.1: apply word segmentation, phrase extraction, lemmatization, stop-word removal and similar preprocessing to the annotation of the new word to obtain the group of words corresponding to the annotation; let n denote the number of words in this group.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain of each child node of the current node in the domain classification tree, and find the maximum, denoted $sdc_{max}$.
$$sdc_k = m_k \times \prod_{j=1}^{n} wdc_{jk} \tag{3}$$
Here, $sdc_k$ is the membership degree between the new word and the domain $k$ of a child node of the current node in the domain classification tree; $wdc_{jk}$ is the confidence, in domain $k$, of the $j$-th word in the group of words obtained from the new word's annotation; and $m_k$ is the number of those $n$ words whose confidence is highest in domain $k$.
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than a predefined threshold, check whether the node corresponding to $sdc_{max}$ is a leaf node. If it is a leaf node, add the new word to the domain dictionary of that node; if it is not, take that node as the current node and return to Step 6.3. If $sdc_{max}$ is not greater than the predefined threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop.
The above steps automatically extend the domain dictionaries.
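Steps 6.2 to 6.4 describe a top-down descent of the classification tree, which can be sketched as follows. The node layout (a dict with `name`, `conf` and `children`), the helper names and all confidence values are hypothetical. Two further assumptions are made because the text leaves them open: a word never seen in a domain gets confidence 0 there, which zeroes the product in formula (3), and ties for the highest confidence count toward $m_k$.

```python
import math


def node(name, conf, children=()):
    """Hypothetical node layout: name, word -> confidence map, child nodes."""
    return {"name": name, "conf": conf, "children": list(children)}


def membership(words, child, siblings):
    """Formula (3): m_k times the product of the words' confidences in the child."""
    conf = child["conf"]
    # m_k: words whose confidence is highest in this child among all children.
    m_k = sum(
        1 for w in words
        if conf.get(w, 0.0) > 0.0
        and conf.get(w, 0.0) >= max(s["conf"].get(w, 0.0) for s in siblings)
    )
    return m_k * math.prod(conf.get(w, 0.0) for w in words)


def assign(words, root, threshold):
    """Steps 6.2-6.4: return the chosen leaf name, or None for a general word."""
    current = root
    while current["children"]:
        children = current["children"]
        best = max(children, key=lambda c: membership(words, c, children))
        if membership(words, best, children) <= threshold:
            return None  # not above the threshold: treat as a general word
        current = best
    return current["name"]


# Hypothetical tree and confidence values, for illustration only.
aviation = node("aviation", {"engine": 0.9, "wing": 0.8})
machinery = node("machinery", {"engine": 0.6, "gear": 0.9})
left = node("aviation & machinery",
            {"engine": 0.9, "wing": 0.9, "gear": 0.9}, [aviation, machinery])
right = node("communication & computer",
             {"signal": 0.9, "cache": 0.9, "engine": 0.1},
             [node("communication", {"signal": 0.9}), node("computer", {"cache": 0.9})])
tree = node("root", {}, [left, right])

domain = assign(["engine", "wing"], tree, threshold=0.7)
general = assign(["signal", "gear"], tree, threshold=0.7)
```

Because formula (3) multiplies confidences, a single annotation word with zero confidence in a domain rules that domain out; whether any smoothing is intended is not stated in the text.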
Beneficial effects
Compared with existing automatic domain dictionary extension methods, the method proposed by the present invention has the advantage that no domain corpus needs to be collected manually, so it is not affected by the limited quality and scale of domain corpora or by their imbalance.
Brief description of the drawings
Fig. 1 shows the domain classification tree in the specific embodiment of the present invention.
Detailed description of the embodiment
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Table 1 shows the vocabulary sizes of the four domain dictionaries in the Huajian machine dictionary (communication, aviation, machinery and computer) and the sizes of the intersections between them. The communication, aviation, machinery and computer domain dictionaries contain 12626, 7592, 19250 and 5156 words respectively. The communication and aviation dictionaries share 4432 words; communication and machinery, 6210; communication and computer, 2705; aviation and machinery, 4908; aviation and computer, 2064; machinery and computer, 2383.
Table 1. Vocabulary sizes of the four domain dictionaries (diagonal) and sizes of their pairwise intersections

                Communication   Aviation   Machinery   Computer
Communication   12626           4432       6210        2705
Aviation        4432            7592       4908        2064
Machinery       6210            4908       19250       2383
Computer        2705            2064       2383        5156
The automatic domain dictionary extension method based on vocabulary annotation proposed by the present invention is used to automatically extend the four domain dictionaries of the Huajian machine dictionary: communication, aviation, machinery and computer. The specific operating steps are:
Step 1: build a domain classification tree by analyzing the relevance between the domains to which the domain dictionaries belong. Specifically:
Step 1.1: set the initial state of the pending node set D to empty.
Step 1.2: put each of the four domain dictionaries "Communication", "Aviation", "Machinery" and "Computer" into the pending node set as a node. The node name is the name of the domain dictionary, and the node content is all the entries in that dictionary; each entry consists of a word and the annotation of that word.
Step 1.3: using formula (1), calculate the relevance $R(d_1, d_2)$ between the domains of the domain dictionaries represented by every pair of nodes in the pending node set.
Step 1.4: the calculation shows that the two domains with the highest relevance are aviation and machinery.
Step 1.5: merge aviation and machinery into one node, and calculate the relevance of the new node "Aviation & Machinery" to computer and to communication.
Step 1.6: add the new node "Aviation & Machinery" to the pending node set, and delete "Aviation" and "Machinery" from the pending node set.
Step 1.7: the pending node set now contains 3 nodes, so Steps 1.3 to 1.7 are repeated until only one node remains in the pending node set, which yields the domain classification tree shown in Fig. 1. The root node Root of the domain classification tree has two child nodes, "Aviation & Machinery" and "Communication & Computer". The node "Aviation & Machinery" has two child nodes, "Aviation" and "Machinery"; the node "Communication & Computer" has two child nodes, "Communication" and "Computer".
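The choice of aviation and machinery in Step 1.4 can be checked directly from the figures in Table 1 by applying formula (1) to each pair. The sketch below uses only the table values; the variable names are hypothetical.

```python
# Dictionary sizes and pairwise intersection sizes taken from Table 1.
sizes = {"Communication": 12626, "Aviation": 7592, "Machinery": 19250, "Computer": 5156}
shared = {
    ("Communication", "Aviation"): 4432,
    ("Communication", "Machinery"): 6210,
    ("Communication", "Computer"): 2705,
    ("Aviation", "Machinery"): 4908,
    ("Aviation", "Computer"): 2064,
    ("Machinery", "Computer"): 2383,
}

# Formula (1): intersection size divided by the size of the smaller dictionary.
relevance = {
    pair: count / min(sizes[pair[0]], sizes[pair[1]])
    for pair, count in shared.items()
}
best_pair = max(relevance, key=relevance.get)
```

The highest value is for aviation and machinery (4908 / 7592, about 0.646), followed by communication and aviation (4432 / 7592, about 0.584), which agrees with Step 1.4.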
Step 2: obtain a training set for each domain dictionary to be extended.
This step can be carried out in parallel with Step 1. First choose an annotated general-purpose electronic dictionary. Then, for each domain dictionary to be extended, look up each of its words in the annotated general-purpose dictionary and put the annotation of each word into the training set of that domain as one training item. This yields the training set of the domain.
After Step 2, each domain dictionary to be extended has a training set corresponding to its domain.
Step 3: preprocess the training sets to obtain corpus feature sets.
Building on Step 2, preprocess the training set of each domain dictionary to be extended in turn to obtain the corpus feature set of that domain. Specifically, apply word segmentation, phrase extraction, lemmatization, stop-word removal and similar preprocessing to each training item in the training set of a domain; the resulting group of words is called a corpus feature subset. The collection of the corpus feature subsets of all training items in the domain's training set is called the corpus feature set of that domain dictionary.
Step 4: building on Step 1 and Step 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that set. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the result as the corpus feature set of the non-leaf node, then collect the following statistics: (a) the number of times each word in the non-leaf node's corpus feature set occurs in that set; (b) for each word in the non-leaf node's corpus feature set, the number of child-node corpus feature sets that contain the word.
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2).
Step 6: add new words to the domain dictionaries to be extended.
Building on Step 5, take the words newly added to the annotated general-purpose electronic dictionary of Step 2 as new words and add them to the domain dictionaries to be extended. The specific operating steps are:
Step 6.1: apply word segmentation, phrase extraction, lemmatization, stop-word removal and similar preprocessing to the annotation of the new word to obtain the group of words corresponding to the annotation; let n denote the number of words in this group.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain of each child node of the current node in the domain classification tree, and find the maximum, $sdc_{max}$.
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than the predefined threshold 0.7, check whether the node corresponding to $sdc_{max}$ is a leaf node. If it is a leaf node, add the new word to the domain dictionary of that node; if it is not, take that node as the current node and return to Step 6.3. If $sdc_{max}$ is not greater than the predefined threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop.
The above steps automatically extend the domain dictionaries.

Claims (1)

1. An automatic domain dictionary extension method based on vocabulary annotation, characterized in that its specific operating steps are:
Step 1: build a domain classification tree by analyzing the relevance between the domains to which the domain dictionaries belong; specifically:
Step 1.1: denote the set of pending nodes by the symbol D, and set its initial state to empty;
Step 1.2: put each domain dictionary to be extended into the pending node set as a node; the node name is the name of the domain dictionary, and the node content is all the entries in that dictionary; each entry consists of a word and the annotation of that word;
Step 1.3: using formula (1), calculate the relevance between the domains of the domain dictionaries represented by every pair of nodes in the pending node set;
$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \tag{1}$$
wherein $R(d_1, d_2)$ is the relevance between the domain $d_1$ of one domain dictionary $D_1$ in the pending node set and the domain $d_2$ of another domain dictionary $D_2$; $|d_1 \cap d_2|$ is the number of words that $D_1$ and $D_2$ have in common; and $\min(|d_1|, |d_2|)$ is the number of words in the smaller of the two dictionaries;
Step 1.4: among the relevance values $R(d_1, d_2)$ obtained in Step 1.3 for every pair of nodes in the pending node set, find the maximum, denoted $R_{max}$; denote the two domain dictionaries corresponding to $R_{max}$ by $D_1'$ and $D_2'$, their domains by $d_1'$ and $d_2'$, and their contents by $c_1$ and $c_2$;
Step 1.5: merge the entries of $D_1'$ and $D_2'$ and give the merged dictionary a new name, $D_{new}$; its content is $c_{new} = c_1 \cup c_2$; then create a new node whose name is $D_{new}$ and whose content is $c_{new}$, with $D_1'$ and $D_2'$ as its child nodes;
Step 1.6: add the new node $D_{new}$ to the pending node set, and delete the nodes $D_1'$ and $D_2'$ from the pending node set;
Step 1.7: count the number of nodes in the pending node set, denoted N; if N ≥ 2, return to Step 1.3; otherwise, stop;
the above steps yield a domain classification tree;
Step 2: obtain a training set for each domain dictionary to be extended;
this step can be carried out in parallel with Step 1: first choose an annotated general-purpose electronic dictionary; then, for each domain dictionary to be extended, look up each of its words in the annotated general-purpose dictionary and put the annotation of each word into the training set of that domain as one training item, which yields the training set of the domain;
after Step 2, each domain dictionary to be extended has a training set corresponding to its domain;
Step 3: preprocess the training sets to obtain corpus feature sets;
building on Step 2, preprocess the training set of each domain dictionary to be extended in turn to obtain the corpus feature set of that domain; specifically, preprocess each training item in the training set of a domain; the resulting group of words is called a corpus feature subset; the collection of the corpus feature subsets of all training items in the domain's training set is called the corpus feature set of that domain dictionary;
the preprocessing comprises word segmentation, phrase extraction, lemmatization and stop-word removal;
Step 4: building on Step 1 and Step 3, for each leaf node of the domain classification tree obtained in Step 1, count the number of times each word in the leaf node's corpus feature set occurs in that set; for each non-leaf node, first merge the corpus feature sets of its child nodes and take the result as the corpus feature set of the non-leaf node, then collect the following statistics: (a) the number of times each word in the non-leaf node's corpus feature set occurs in that set; (b) for each word in the non-leaf node's corpus feature set, the number of child-node corpus feature sets that contain the word;
Step 5: building on Step 4, calculate the confidence of each word in each corpus feature set according to formula (2);
$$wdc = \frac{wd}{\sum wd} \times \log\left(\frac{wd}{dt} + 1\right) \tag{2}$$
wherein $wdc$ is the confidence of a word $w$ in the corpus feature set of a domain $d$; $wd$ is the number of times $w$ occurs in domain $d$; $\sum wd$ is the total number of times $w$ occurs in the corpus feature set of the parent of the node to which the corpus feature set containing $w$ belongs; and $dt$ is the number of corpus feature sets containing $w$ among the corpus feature sets of that node's sibling nodes;
Step 6: add new words to the domain dictionaries to be extended;
building on Step 5, take the words newly added to the annotated general-purpose electronic dictionary of Step 2 as new words and add them to the domain dictionaries to be extended; the specific operating steps are:
Step 6.1: preprocess the annotation of the new word to obtain the group of words corresponding to the annotation; let n denote the number of words in this group;
the preprocessing comprises word segmentation, phrase extraction, lemmatization and stop-word removal;
Step 6.2: take the root node of the domain classification tree as the current node;
Step 6.3: using formula (3), calculate in turn the membership degree between the new word and the domain of each child node of the current node in the domain classification tree, and find the maximum, denoted $sdc_{max}$;
$$sdc_k = m_k \times \prod_{j=1}^{n} wdc_{jk} \tag{3}$$
wherein $sdc_k$ is the membership degree between the new word and the domain $k$ of a child node of the current node in the domain classification tree; $wdc_{jk}$ is the confidence, in domain $k$, of the $j$-th word in the group of words obtained from the new word's annotation; and $m_k$ is the number of those $n$ words whose confidence is highest in domain $k$;
Step 6.4: if the maximum membership degree $sdc_{max}$ obtained in Step 6.3 is greater than a predefined threshold, check whether the node corresponding to $sdc_{max}$ is a leaf node; if it is a leaf node, add the new word to the domain dictionary of that node; if it is not, take that node as the current node and return to Step 6.3; if $sdc_{max}$ is not greater than the predefined threshold, treat the new word as a general word, do not add it to any domain dictionary to be extended, and stop;
the above steps automatically extend the domain dictionaries.
CN201310046647.3A 2013-02-06 2013-02-06 Automatic domain dictionary extension method based on vocabulary annotation Active CN103116573B (en)

Publications (2)

Publication Number Publication Date
CN103116573A true CN103116573A (en) 2013-05-22
CN103116573B CN103116573B (en) 2015-10-28

Family

ID=48414950

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324692A (en) * 2013-06-04 2013-09-25 北京大学 Classified knowledge acquiring method and device
CN104268160A (en) * 2014-09-05 2015-01-07 北京理工大学 Evaluation object extraction method based on domain dictionary and semantic roles
CN105955958A (en) * 2016-05-06 2016-09-21 长沙市麓智信息科技有限公司 English patent application document write auxiliary system and write auxiliary method thereof
CN106681986A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Multi-dimensional sentiment analysis system
CN108197243A (en) * 2017-12-29 2018-06-22 北京奇虎科技有限公司 Method and device is recommended in a kind of input association based on user identity
CN109299453A (en) * 2017-07-24 2019-02-01 华为技术有限公司 A kind of method and apparatus for constructing dictionary
CN109325224A (en) * 2018-08-06 2019-02-12 中国地质大学(武汉) A kind of term vector representative learning method and system based on semantic first language

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360383A (en) * 2011-10-15 2012-02-22 西安交通大学 Method for extracting text-oriented field term and term relationship
EP2515242A2 (en) * 2011-04-21 2012-10-24 Palo Alto Research Center Incorporated Incorporating lexicon knowledge to improve sentiment classification

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2515242A2 (en) * 2011-04-21 2012-10-24 Palo Alto Research Center Incorporated Incorporating lexicon knowledge to improve sentiment classification
CN102360383A (en) * 2011-10-15 2012-02-22 西安交通大学 Method for extracting text-oriented field term and term relationship

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAOYONG ZHU et al.: "Gloss-based Word Domain Assignment", 《NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE), 2011 7TH INTERNATIONAL CONFERENCE ON》, 29 November 2011 (2011-11-29), pages 150 - 155, XP032101542, DOI: 10.1109/NLPKE.2011.6138184 *
ZHU CHAOYONG et al.: "Hierarchical Domain Assignment Based on Word-Gloss", 《China Communications》, no. 03, 31 March 2012 (2012-03-31), pages 19 - 27 *
ZHANG HAIJUN et al.: "A Survey of Chinese New Word Identification Techniques", 《Computer Science》, vol. 37, no. 3, 31 March 2010 (2010-03-31) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324692A (en) * 2013-06-04 2013-09-25 北京大学 Classified knowledge acquiring method and device
CN103324692B (en) * 2013-06-04 2016-05-18 北京大学 Classificating knowledge acquisition methods and device
CN104268160A (en) * 2014-09-05 2015-01-07 北京理工大学 Evaluation object extraction method based on domain dictionary and semantic roles
CN104268160B (en) * 2014-09-05 2017-06-06 北京理工大学 A kind of OpinionTargetsExtraction Identification method based on domain lexicon and semantic role
CN105955958A (en) * 2016-05-06 2016-09-21 长沙市麓智信息科技有限公司 English patent application document write auxiliary system and write auxiliary method thereof
CN106681986A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Multi-dimensional sentiment analysis system
CN109299453A (en) * 2017-07-24 2019-02-01 华为技术有限公司 A kind of method and apparatus for constructing dictionary
CN108197243A (en) * 2017-12-29 2018-06-22 北京奇虎科技有限公司 Method and device is recommended in a kind of input association based on user identity
CN109325224A (en) * 2018-08-06 2019-02-12 中国地质大学(武汉) A kind of term vector representative learning method and system based on semantic first language

Also Published As

Publication number Publication date
CN103116573B (en) 2015-10-28

Similar Documents

Publication Publication Date Title
CN103116573B (en) A kind of automatic extending method of domain lexicon based on vocabulary annotation
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
CN107766324B (en) Text consistency analysis method based on deep neural network
CN103207905B (en) A kind of method of calculating text similarity of based target text
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN103970729B (en) A kind of multi-threaded extracting method based on semantic category
CN103123618B (en) Text similarity acquisition methods and device
CN104636466B (en) Entity attribute extraction method and system for open webpage
CN106844658A (en) A kind of Chinese text knowledge mapping method for auto constructing and system
CN102693279B (en) Method, device and system for fast calculating comment similarity
CN104778256B (en) A kind of the quick of field question answering system consulting can increment clustering method
CN105243152A (en) Graph model-based automatic abstracting method
CN101702167A (en) Method for extracting attribution and comment word with template based on internet
CN103399901A (en) Keyword extraction method
CN104484380A (en) Personalized search method and personalized search device
CN107133223B (en) A kind of machine translation optimization method of the more reference translation information of automatic exploration
CN106569993A (en) Method and device for mining hypernym-hyponym relation between domain-specific terms
CN105138514A (en) Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN105630884A (en) Geographic position discovery method for microblog hot event
CN106528524A (en) Word segmentation method based on MMseg algorithm and pointwise mutual information algorithm
CN104866558A (en) Training method of social networking account mapping model, mapping method and system
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN104484433A (en) Book body matching method based on machine learning
CN109614626A (en) Keyword Automatic method based on gravitational model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant