CN103116573B - Automatic domain lexicon expansion method based on vocabulary annotations - Google Patents

Automatic domain lexicon expansion method based on vocabulary annotations

Info

Publication number
CN103116573B
Authority
CN
China
Prior art keywords
node
vocabulary
domain
corpus
domain lexicon
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310046647.3A
Other languages
Chinese (zh)
Other versions
CN103116573A (en)
Inventor
黄河燕 (Huang Heyan)
史树敏 (Shi Shumin)
朱朝勇 (Zhu Chaoyong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201310046647.3A priority Critical patent/CN103116573B/en
Publication of CN103116573A publication Critical patent/CN103116573A/en
Application granted granted Critical
Publication of CN103116573B publication Critical patent/CN103116573B/en


Abstract

The present invention relates to an automatic domain lexicon expansion method based on vocabulary annotations, and belongs to the technical field of natural language processing. The steps are: 1. generate a domain classification tree by analyzing the degree of correlation between the fields to which the domain lexicons belong; 2. obtain a training set for each domain lexicon to be expanded; 3. preprocess the training sets to obtain corpus feature sets; 4. for the corpus feature set of each node, count the number of times each word occurs in that feature set, and count how many of the child nodes' corpus feature sets contain the word; 5. compute the confidence of each word in each corpus feature set; 6. add new words to the domain lexicons to be expanded. The proposed method does not require manual collection of domain corpora, and therefore avoids the limitations imposed by the quality and scale of domain corpora and by corpus imbalance.

Description

Automatic domain lexicon expansion method based on vocabulary annotations
Technical field
The present invention relates to a method for automatically expanding a domain lexicon, and in particular to an automatic domain lexicon expansion method based on vocabulary annotations, belonging to the technical field of natural language processing.
Background technology
A domain lexicon (domain dictionary) is the collection of terms or expressions specific to a particular field. Domain lexicons are a basic resource for natural language processing: domain knowledge is widely used in components such as word sense disambiguation and syntactic analysis within tasks including machine translation, information retrieval, data mining and text classification, and the scale and quality of a domain lexicon directly affect the performance of these applications.
Methods for constructing and expanding domain lexicons fall into three classes according to their degree of automation: manual construction and expansion based on expert knowledge, semi-automatic generation and expansion, and fully automatic generation and expansion. Manual construction and expansion achieve high accuracy, but require the long-term participation of many domain experts, so labor and time costs are excessive and the lexicons are not kept up to date. Fully automatic generation and expansion determine the domain attributes of words by analyzing differences in their statistical properties across domain corpora; such methods need no domain experts and save a great deal of labor cost, but the accuracy of the entries they admit is not high. Semi-automatic generation and expansion lie between manual compilation and automatic generation: a domain expert specifies a small amount of domain knowledge, and the domain lexicon is then expanded automatically. Most existing semi-automatic and fully automatic methods depend on a domain corpus: the quality of the generated lexicon depends on the quality of the corpus used, the completeness of the lexicon is limited by the corpus scale, and, because corpora are unbalanced, the domain labels assigned to words are easily biased toward the domains that dominate the corpus. In addition, both kinds of methods fail to make effective use of existing dictionary resources and do not consider the correlations between fields.
Summary of the invention
The object of the present invention is to overcome the deficiencies of existing automatic domain lexicon expansion methods by providing an automatic domain lexicon expansion method based on vocabulary annotations.
The object of the invention is achieved through the following technical solution.
The automatic domain lexicon expansion method based on vocabulary annotations comprises the following concrete operation steps:
Step one: generate a domain classification tree by analyzing the degree of correlation between the fields to which the domain lexicons belong. Specifically:
Step 1.1: denote the set of pending nodes by the symbol D, and initialize the pending node set to the empty set;
Step 1.2: put each domain lexicon to be expanded into the pending node set as one node. The node name is the name of the domain lexicon, and the node content is all the entries in the domain lexicon; each entry consists of a word and the annotation of that word.
Step 1.3: for every pair of nodes in the pending node set, compute the degree of correlation between the fields to which the two domain lexicons belong according to formula (1), denoted R(d_1, d_2).
$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \qquad (1)$$
where R(d_1, d_2) is the degree of correlation between the field d_1 to which one domain lexicon D_1 in the pending node set belongs and the field d_2 to which another domain lexicon D_2 belongs; |d_1 ∩ d_2| is the number of identical words contained in both D_1 and D_2; and min(|d_1|, |d_2|) is the number of words in the smaller of the two domain lexicons.
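As an illustration (not part of the patent text), formula (1) can be evaluated directly from the headword sets of two lexicons. The sketch below is a minimal Python reading of the formula, assuming each lexicon is represented as a set of its headwords; the function name and the toy vocabularies are hypothetical.

```python
def correlation(d1: set, d2: set) -> float:
    """Degree of correlation R(d1, d2) of formula (1): the number of shared
    words divided by the vocabulary size of the smaller lexicon."""
    if not d1 or not d2:
        return 0.0
    return len(d1 & d2) / min(len(d1), len(d2))

# Toy example (hypothetical words, not taken from the patent's lexicons):
aviation = {"altimeter", "fuselage", "turbine", "valve"}
machinery = {"turbine", "valve", "lathe", "bearing"}
print(correlation(aviation, machinery))  # 2 shared words / min(4, 4) = 0.5
```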
Step 1.4: among the degrees of correlation R(d_1, d_2) computed in step 1.3 for all pairs of nodes in the pending node set, find the maximum value, denoted R_max; denote the two domain lexicons corresponding to R_max by D_1' and D_2', the fields to which they belong by d_1' and d_2', and their contents by c_1 and c_2, respectively.
Step 1.5: merge the entries of D_1' and D_2' and give the merged dictionary a new name, denoted D_new; the content of the merged dictionary D_new is denoted c_new, where c_new = c_1 ∪ c_2. Then create a new node whose name is D_new and whose content is c_new; the nodes D_1' and D_2' become the child nodes of D_new.
Step 1.6: add the new node D_new to the pending node set, and delete the nodes D_1' and D_2' from the pending node set.
Step 1.7: count the number of nodes in the pending node set, denoted N. If N ≥ 2, return to step 1.3; otherwise, end the operation.
Through the above steps, a domain classification tree is obtained.
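The agglomerative construction of steps 1.1 to 1.7 can be sketched in a few lines of Python. This is an illustrative reading, not the patent's reference implementation: the Node class, the function names and the dictionary-of-sets input format are assumptions.

```python
from itertools import combinations

class Node:
    def __init__(self, name, content, children=()):
        self.name = name            # node name = domain lexicon name
        self.content = content      # node content = set of headwords (entries)
        self.children = list(children)

def correlation(c1, c2):
    # degree of correlation, formula (1)
    return len(c1 & c2) / min(len(c1), len(c2))

def build_domain_tree(lexicons):
    """lexicons: dict mapping field name -> set of headwords."""
    # steps 1.1-1.2: one pending node per domain lexicon to be expanded
    pending = [Node(name, set(vocab)) for name, vocab in lexicons.items()]
    while len(pending) >= 2:                                   # step 1.7
        # steps 1.3-1.4: the pair of pending nodes with the highest correlation
        a, b = max(combinations(pending, 2),
                   key=lambda pair: correlation(pair[0].content, pair[1].content))
        # step 1.5: merge their entries into a new parent node
        merged = Node(a.name + " & " + b.name, a.content | b.content, children=[a, b])
        # step 1.6: the merged node replaces the two merged nodes
        pending = [n for n in pending if n not in (a, b)] + [merged]
    return pending[0]   # the single remaining node is the root of the tree
```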
Step two: obtain a training set for each domain lexicon to be expanded.
This step can be carried out in parallel with step one: select a general-purpose electronic dictionary with annotations; then, for each domain lexicon to be expanded, look up each of its words in the annotated general-purpose dictionary in turn, and put the annotation corresponding to each word into the training set of that field as one training instance. This yields the training set of that field.
Through the operation of step two, one training set is obtained for the field of each domain lexicon to be expanded.
Step three: preprocess the training sets to obtain corpus feature sets.
On the basis of step two, preprocess the corpus in the training set of each domain lexicon to be expanded in turn to obtain the corpus feature set corresponding to that field's training set. Specifically: perform preprocessing such as word segmentation, phrase extraction, lemmatization and stop-word removal on every training instance in the field's training set, obtaining a group of words for each training instance, called a corpus feature subset. The collection of the corpus feature subsets of all training instances in the field's training set is called the corpus feature set of that domain lexicon.
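Steps two and three in miniature: the hedged Python sketch below assumes the annotated general-purpose dictionary is a dict from headword to annotation text, and replaces the full preprocessing chain (word segmentation, phrase extraction, lemmatization, stop-word removal) with a simple whitespace tokenizer and an illustrative stop-word list; a real implementation would plug proper segmentation and lemmatization tools in here.

```python
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in"}   # illustrative only

def preprocess(annotation: str) -> set:
    """Stand-in for the preprocessing of step three: turn one training instance
    (an annotation) into a corpus feature subset, i.e. a group of words."""
    return {tok for tok in annotation.lower().split() if tok not in STOP_WORDS}

def build_training_set(field_vocab: set, general_dict: dict) -> list:
    """Step two: the annotations of the field's words that are found in the
    annotated general-purpose dictionary, one training instance per word."""
    return [general_dict[w] for w in field_vocab if w in general_dict]

def build_feature_set(training_set: list) -> list:
    """Step three: preprocess every training instance; the list of the resulting
    corpus feature subsets is the field's corpus feature set."""
    return [preprocess(instance) for instance in training_set]
```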
Step four: on the basis of steps one and three, for each leaf node of the domain classification tree obtained in step one, count the number of times each word occurs in the corpus feature set corresponding to that leaf node. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the corpus feature set of that non-leaf node, and then count the following: (1) the number of times each word in the corpus feature set of the non-leaf node occurs in that corpus feature set; (2) for each word in the corpus feature set of the non-leaf node, the number of the child nodes' corpus feature sets that contain the word.
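The statistics of step four can be gathered in one bottom-up pass over the tree. The sketch below assumes the Node class and the feature sets from the previous sketches (a corpus feature set is a list of word sets) and stores the counts on the nodes; the attribute names are illustrative.

```python
from collections import Counter

def collect_statistics(node, feature_sets_by_field):
    """Step four: attach to every node of the domain classification tree
      node.feature_set - the node's corpus feature set (list of feature subsets)
      node.term_count  - occurrences of each word in the node's feature set
      node.child_df    - for non-leaf nodes, in how many of the child nodes'
                         corpus feature sets each word appears."""
    if not node.children:                       # leaf node
        node.feature_set = feature_sets_by_field[node.name]
    else:                                       # non-leaf node: merge the children
        node.feature_set = []
        node.child_df = Counter()
        for child in node.children:
            collect_statistics(child, feature_sets_by_field)
            node.feature_set.extend(child.feature_set)
            child_vocab = set().union(*child.feature_set) if child.feature_set else set()
            for word in child_vocab:
                node.child_df[word] += 1
    node.term_count = Counter(w for subset in node.feature_set for w in subset)
```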
Step five: on the basis of step four, compute the confidence of each word in each corpus feature set according to formula (2).
$$wdc = \frac{wd}{\sum wd} \times \log\!\left(\frac{wd}{dt} + 1\right) \qquad (2)$$
where wdc is the confidence of a word w in the corpus feature set corresponding to a field d; wd is the number of times the word w occurs in field d; Σwd is the total number of times w occurs in the corpus feature set corresponding to the parent of the node whose corpus feature set contains w; and dt is the number of corpus feature sets that contain w among those corresponding to the sibling nodes of the node whose corpus feature set contains w.
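A hedged Python reading of formula (2), building on the statistics collected in the previous sketch. Because the published formula is typeset ambiguously, the exact placement of the +1 smoothing and the treatment of the case dt = 0 (no sibling contains the word) are assumptions, made explicit in the code comments.

```python
import math

def word_confidence(node, parent, word):
    """Confidence wdc of `word` in the field of `node`, formula (2) as read here:
      wd  - occurrences of the word in this node's corpus feature set
      Swd - occurrences of the word in the parent node's corpus feature set
      dt  - number of sibling nodes (other children of the parent) whose
            corpus feature sets contain the word."""
    wd = node.term_count[word]
    swd = parent.term_count[word]
    siblings = [c for c in parent.children if c is not node]
    dt = sum(1 for s in siblings if s.term_count[word] > 0)
    if wd == 0 or swd == 0:
        return 0.0
    # max(dt, 1) guards the division when no sibling contains the word;
    # the exact smoothing in the patent's formula is not fully legible.
    return (wd / swd) * math.log(wd / max(dt, 1) + 1)
```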
Step six: add new words to the domain lexicons to be expanded.
On the basis of step five, take the words newly included in the annotated general-purpose electronic dictionary described in step two as new words and add them to the domain lexicons to be expanded. The concrete operation steps are:
Step 6.1: perform preprocessing such as word segmentation, phrase extraction, lemmatization and stop-word removal on the annotation of the new word, obtaining the group of words corresponding to the annotation of this word; denote the number of words in this group by n.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: according to formula (3), compute in turn the degree of membership between the word set of the new word's annotation and the field corresponding to each child node of the current node in the domain classification tree, and find the maximum value among them, denoted sdc_max.
$$sdc_k = m_k \times \prod_{j=1}^{n} wdc_{jk} \qquad (3)$$
where sdc_k is the degree of membership between the word set of the new word's annotation and the field k corresponding to one of the child nodes of the current node in the domain classification tree; wdc_{jk} is the confidence of the j-th word of the new word's annotation in field k; and m_k is the number of words, among the n words of the new word's annotation, whose confidence is highest in field k.
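A minimal sketch of formula (3), assuming a confidence(child, word) callable such as the word_confidence() sketch given after formula (2); counting every field that ties for the highest confidence toward m_k is one possible reading of the definition above.

```python
def membership_scores(annotation_words, current_node, confidence):
    """Degree of membership sdc_k, formula (3), between the new word's annotation
    words and the field of each child of the current node."""
    children = current_node.children
    conf = {c: [confidence(c, w) for w in annotation_words] for c in children}
    scores = {}
    for c in children:
        # m_k: number of annotation words whose confidence is highest in this field
        m_k = sum(1 for j, v in enumerate(conf[c])
                  if v == max(conf[other][j] for other in children))
        prod = 1.0
        for v in conf[c]:                  # product over the n annotation words
            prod *= v
        scores[c] = m_k * prod             # sdc_k = m_k * prod_j wdc_jk
    return scores
```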
Step 6.4: if the maximum degree of membership sdc_max obtained in step 6.3 is greater than a pre-specified threshold, further judge whether the node corresponding to sdc_max is a leaf node: if it is a leaf node, add the new word to the domain lexicon corresponding to that node; if it is not a leaf node, take the node corresponding to sdc_max as the current node and return to step 6.3. If the maximum degree of membership sdc_max obtained in step 6.3 is not greater than the pre-specified threshold, treat the new word as a common word, do not add it to any of the domain lexicons to be expanded, and end the operation.
Through the above steps, automatic expansion of the domain lexicons is realized.
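Steps 6.2 to 6.4 amount to a top-down walk of the domain classification tree. The sketch below assumes the membership_scores() helper from the previous sketch; the 0.7 default threshold mirrors the value used in the embodiment, and the function name is illustrative.

```python
def assign_new_word(annotation_words, root, confidence, threshold=0.7):
    """Decide which domain lexicon (if any) should receive a new word, given the
    preprocessed words of its annotation. Returns the name of the leaf field,
    or None if the word is judged to be a common word."""
    current = root                                                         # step 6.2
    while current.children:
        scores = membership_scores(annotation_words, current, confidence)  # step 6.3
        best_child, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score <= threshold:      # step 6.4: below threshold -> common word
            return None
        if not best_child.children:      # leaf node -> add to its domain lexicon
            return best_child.name
        current = best_child             # non-leaf node -> descend and repeat
    return None
```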
Beneficial effect
Compared with existing automatic domain lexicon expansion methods, the automatic domain lexicon expansion method based on vocabulary annotations proposed by the present invention does not require manual collection of domain corpora, and therefore avoids the limitations imposed by the quality and scale of domain corpora and by corpus imbalance.
Brief description of the drawings
Fig. 1 is the domain classification tree in the specific embodiment of the invention.
Embodiment
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Table 1 shows the vocabulary counts of the communication, aviation, machinery and computer domain lexicons from the Huajian machine dictionaries, together with the sizes of their pairwise intersections. The communication, aviation, machinery and computer domain lexicons contain 12626, 7592, 19250 and 5156 words, respectively. The intersection of the communication and aviation lexicons contains 4432 words; communication and machinery, 6210; communication and computer, 2705; aviation and machinery, 4908; aviation and computer, 2064; machinery and computer, 2383.
Table 1. Vocabulary counts of the four domain lexicons and their pairwise intersections
                Communication   Aviation   Machinery   Computer
Communication       12626         4432       6210        2705
Aviation             4432         7592       4908        2064
Machinery            6210         4908      19250        2383
Computer             2705         2064       2383        5156
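Applying formula (1) to the counts in Table 1 gives R(communication, aviation) = 4432/7592 ≈ 0.584, R(communication, machinery) = 6210/12626 ≈ 0.492, R(communication, computer) = 2705/5156 ≈ 0.525, R(aviation, machinery) = 4908/7592 ≈ 0.646, R(aviation, computer) = 2064/5156 ≈ 0.400 and R(machinery, computer) = 2383/5156 ≈ 0.462; the aviation-machinery pair therefore has the highest degree of correlation, which is why these two lexicons are merged first in step 1.4 of the embodiment below.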
The automatic domain lexicon expansion method based on vocabulary annotations proposed by the present invention is used to automatically expand the communication, aviation, machinery and computer domain lexicons from the Huajian machine dictionaries. The concrete operation steps are:
Step one: generate a domain classification tree by analyzing the degree of correlation between the fields to which the domain lexicons belong. Specifically:
Step 1.1: initialize the pending node set D to the empty set;
Step 1.2: put the four domain lexicons "communication", "aviation", "machinery" and "computer" into the pending node set, each as one node. The node name is the name of the domain lexicon, and the node content is all the entries in the domain lexicon; each entry consists of a word and the annotation of that word.
Step 1.3: for every pair of nodes in the pending node set, compute the degree of correlation R(d_1, d_2) between the fields to which the two domain lexicons belong according to formula (1).
Step 1.4: the computation shows that the two most correlated fields are aviation and machinery.
Step 1.5: merge aviation and machinery into one node, and compute the degrees of correlation of the new node "aviation & machinery" with the computer field and with the communication field, respectively.
Step 1.6: add the new node "aviation & machinery" to the pending node set, and delete "aviation" and "machinery" from the pending node set.
Step 1.7: the pending node set now contains 3 nodes, so steps 1.3 to 1.7 are repeated until only one node remains in the pending node set, which yields the domain classification tree shown in Fig. 1. The root node Root of the domain classification tree has two child nodes, "aviation & machinery" and "communication & computer"; the node "aviation & machinery" has two child nodes, "aviation" and "machinery"; the node "communication & computer" has two child nodes, "communication" and "computer".
Step two: obtain a training set for each domain lexicon to be expanded.
This step can be carried out in parallel with step one: select a general-purpose electronic dictionary with annotations; then, for each domain lexicon to be expanded, look up each of its words in the annotated general-purpose dictionary in turn, and put the annotation corresponding to each word into the training set of that field as one training instance. This yields the training set of that field.
Through the operation of step two, one training set is obtained for the field of each domain lexicon to be expanded.
Step three: preprocess the training sets to obtain corpus feature sets.
On the basis of step two, preprocess the corpus in the training set of each domain lexicon to be expanded in turn to obtain the corpus feature set corresponding to that field's training set. Specifically: perform preprocessing such as word segmentation, phrase extraction, lemmatization and stop-word removal on every training instance in the field's training set, obtaining a group of words for each training instance, called a corpus feature subset. The collection of the corpus feature subsets of all training instances in the field's training set is called the corpus feature set of that domain lexicon.
Step four: on the basis of steps one and three, for each leaf node of the domain classification tree obtained in step one, count the number of times each word occurs in the corpus feature set corresponding to that leaf node. For each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the corpus feature set of that non-leaf node, and then count the following: (1) the number of times each word in the corpus feature set of the non-leaf node occurs in that corpus feature set; (2) for each word in the corpus feature set of the non-leaf node, the number of the child nodes' corpus feature sets that contain the word.
Step five: on the basis of step four, compute the confidence of each word in each corpus feature set according to formula (2).
Step six: add new words to the domain lexicons to be expanded.
On the basis of step five, take the words newly included in the annotated general-purpose electronic dictionary described in step two as new words and add them to the domain lexicons to be expanded. The concrete operation steps are:
Step 6.1: perform preprocessing such as word segmentation, phrase extraction, lemmatization and stop-word removal on the annotation of the new word, obtaining the group of words corresponding to the annotation of this word; denote the number of words in this group by n.
Step 6.2: take the root node of the domain classification tree as the current node.
Step 6.3: according to formula (3), compute in turn the degree of membership between the word set of the new word's annotation and the field corresponding to each child node of the current node in the domain classification tree, and find the maximum value sdc_max among them.
Step 6.4: if the maximum degree of membership sdc_max obtained in step 6.3 is greater than the pre-specified threshold 0.7, further judge whether the node corresponding to sdc_max is a leaf node: if it is a leaf node, add the new word to the domain lexicon corresponding to that node; if it is not a leaf node, take the node corresponding to sdc_max as the current node and return to step 6.3. If the maximum degree of membership sdc_max obtained in step 6.3 is not greater than the pre-specified threshold, treat the new word as a common word, do not add it to any of the domain lexicons to be expanded, and end the operation.
Through the above steps, automatic expansion of the domain lexicons is realized.

Claims (1)

1. An automatic domain lexicon expansion method based on vocabulary annotations, characterized in that its concrete operation steps are:
Step one: generate a domain classification tree by analyzing the degree of correlation between the fields to which the domain lexicons belong; specifically:
Step 1.1: denote the set of pending nodes by the symbol D, and initialize the pending node set to the empty set;
Step 1.2: put each domain lexicon to be expanded into the pending node set as one node; the node name is the name of the domain lexicon, and the node content is all the entries in the domain lexicon; each entry consists of a word and the annotation of that word;
Step 1.3: for every pair of nodes in the pending node set, compute the degree of correlation between the fields to which the two domain lexicons belong according to formula (1);
$$R(d_1, d_2) = \frac{|d_1 \cap d_2|}{\min(|d_1|, |d_2|)} \qquad (1)$$
where R(d_1, d_2) is the degree of correlation between the field d_1 to which one domain lexicon D_1 in the pending node set belongs and the field d_2 to which another domain lexicon D_2 belongs; |d_1 ∩ d_2| is the number of identical words contained in both D_1 and D_2; and min(|d_1|, |d_2|) is the number of words in the smaller of the two domain lexicons;
Step 1.4: among the degrees of correlation R(d_1, d_2) computed in step 1.3 for all pairs of nodes in the pending node set, find the maximum value, denoted R_max; denote the two domain lexicons corresponding to R_max by D_1' and D_2', the fields to which they belong by d_1' and d_2', and their contents by c_1 and c_2, respectively;
Step 1.5: merge the entries of D_1' and D_2' and give the merged dictionary a new name, denoted D_new; the content of the merged dictionary D_new is denoted c_new, where c_new = c_1 ∪ c_2; then create a new node whose name is D_new and whose content is c_new; the nodes D_1' and D_2' become the child nodes of D_new;
Step 1.6: add the new node D_new to the pending node set, and delete the nodes D_1' and D_2' from the pending node set;
Step 1.7: count the number of nodes in the pending node set, denoted N; if N ≥ 2, return to step 1.3; otherwise, end the operation;
through the above steps, a domain classification tree is obtained;
Step two: obtain a training set for each domain lexicon to be expanded;
this step is carried out in parallel with step one: select a general-purpose electronic dictionary with annotations; then, for each domain lexicon to be expanded, look up each of its words in the annotated general-purpose dictionary in turn, and put the annotation of each word into the training set of that field as one training instance, which yields the training set of that field;
through the operation of step two, one training set is obtained for the field of each domain lexicon to be expanded;
Step three: preprocess the training sets to obtain corpus feature sets;
on the basis of step two, preprocess the corpus in the training set of each domain lexicon to be expanded in turn to obtain the corpus feature set corresponding to that field's training set; specifically: preprocess every training instance in the field's training set, obtaining a group of words for each training instance, called a corpus feature subset; the collection of the corpus feature subsets of all training instances in the field's training set is called the corpus feature set of that domain lexicon;
the preprocessing comprises word segmentation, phrase extraction, lemmatization and stop-word removal;
Step four: on the basis of steps one and three, for each leaf node of the domain classification tree obtained in step one, count the number of times each word occurs in the corpus feature set corresponding to that leaf node; for each non-leaf node, first merge the corpus feature sets of its child nodes and take the merged result as the corpus feature set of that non-leaf node, and then count the following: (1) the number of times each word in the corpus feature set of the non-leaf node occurs in that corpus feature set; (2) for each word in the corpus feature set of the non-leaf node, the number of the child nodes' corpus feature sets that contain the word;
Step five: on the basis of step four, compute the confidence of each word in each corpus feature set according to formula (2);
$$wdc = \frac{wd}{\sum wd} \times \lg\!\left(\frac{wd}{dt} + 1\right) \qquad (2)$$
where wdc is the confidence of a word w in the corpus feature set corresponding to a field d; wd is the number of times the word w occurs in field d; Σwd is the total number of times w occurs in the corpus feature set corresponding to the parent of the node whose corpus feature set contains w; and dt is the number of corpus feature sets that contain w among those corresponding to the sibling nodes of the node whose corpus feature set contains w;
Step six: add new words to the domain lexicons to be expanded;
on the basis of step five, take the words newly included in the annotated general-purpose electronic dictionary described in step two as new words and add them to the domain lexicons to be expanded; the concrete operation steps are:
Step 6.1: preprocess the annotation of the new word, obtaining the group of words corresponding to the annotation of this word; denote the number of words in this group by n;
the preprocessing comprises word segmentation, phrase extraction, lemmatization and stop-word removal;
Step 6.2: take the root node of the domain classification tree as the current node;
Step 6.3: according to formula (3), compute in turn the degree of membership between the word set of the new word's annotation and the field corresponding to each child node of the current node in the domain classification tree, and find the maximum value among them, denoted sdc_max;
$$sdc_k = m_k \times \prod_{j=1}^{n} wdc_{jk} \qquad (3)$$
where sdc_k is the degree of membership between the word set of the new word's annotation and the field k corresponding to one of the child nodes of the current node in the domain classification tree; wdc_{jk} is the confidence of the j-th word of the new word's annotation in field k; and m_k is the number of words, among the n words of the new word's annotation, whose confidence is highest in field k;
Step 6.4: if the maximum degree of membership sdc_max obtained in step 6.3 is greater than a pre-specified threshold, further judge whether the node corresponding to sdc_max is a leaf node: if it is a leaf node, add the new word to the domain lexicon corresponding to that node; if it is not a leaf node, take the node corresponding to sdc_max as the current node and return to step 6.3; if the maximum degree of membership sdc_max obtained in step 6.3 is not greater than the pre-specified threshold, treat the new word as a common word, do not add it to any of the domain lexicons to be expanded, and end the operation;
through the above steps, automatic expansion of the domain lexicons is realized.
CN201310046647.3A 2013-02-06 2013-02-06 Automatic domain lexicon expansion method based on vocabulary annotations Active CN103116573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310046647.3A CN103116573B (en) 2013-02-06 2013-02-06 Automatic domain lexicon expansion method based on vocabulary annotations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310046647.3A CN103116573B (en) 2013-02-06 2013-02-06 Automatic domain lexicon expansion method based on vocabulary annotations

Publications (2)

Publication Number Publication Date
CN103116573A CN103116573A (en) 2013-05-22
CN103116573B true CN103116573B (en) 2015-10-28

Family

ID=48414950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310046647.3A Active CN103116573B (en) 2013-02-06 2013-02-06 Automatic domain lexicon expansion method based on vocabulary annotations

Country Status (1)

Country Link
CN (1) CN103116573B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324692B (en) * 2013-06-04 2016-05-18 北京大学 Classificating knowledge acquisition methods and device
CN104268160B (en) * 2014-09-05 2017-06-06 北京理工大学 A kind of OpinionTargetsExtraction Identification method based on domain lexicon and semantic role
CN105955958A (en) * 2016-05-06 2016-09-21 长沙市麓智信息科技有限公司 English patent application document write auxiliary system and write auxiliary method thereof
CN106681986A (en) * 2016-12-13 2017-05-17 成都数联铭品科技有限公司 Multi-dimensional sentiment analysis system
CN109299453B (en) * 2017-07-24 2021-02-09 华为技术有限公司 Method and device for constructing dictionary and computer-readable storage medium
CN108197243A (en) * 2017-12-29 2018-06-22 北京奇虎科技有限公司 Method and device is recommended in a kind of input association based on user identity
CN109325224B (en) * 2018-08-06 2022-03-11 中国地质大学(武汉) Word vector representation learning method and system based on semantic primitive language

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360383A (en) * 2011-10-15 2012-02-22 西安交通大学 Method for extracting text-oriented field term and term relationship
EP2515242A2 (en) * 2011-04-21 2012-10-24 Palo Alto Research Center Incorporated Incorporating lexicon knowledge to improve sentiment classification

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2515242A2 (en) * 2011-04-21 2012-10-24 Palo Alto Research Center Incorporated Incorporating lexicon knowledge to improve sentiment classification
CN102360383A (en) * 2011-10-15 2012-02-22 西安交通大学 Method for extracting text-oriented field term and term relationship

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Gloss-based Word Domain Assignment; Chaoyong Zhu et al.; Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on; 2011-11-29; pp. 150-155 *
Hierarchical Domain Assignment Based on Word-Gloss; Zhu Chaoyong et al.; China Communications; March 2012, No. 3; pp. 19-27 *
A survey of Chinese new word identification technology; Zhang Haijun et al.; Computer Science; March 2010, Vol. 37, No. 3; pp. 6-10, 16 *

Also Published As

Publication number Publication date
CN103116573A (en) 2013-05-22

Similar Documents

Publication Publication Date Title
CN103116573B (en) Automatic domain lexicon expansion method based on vocabulary annotations
CN103123618B (en) Text similarity acquisition methods and device
CN102693279B (en) Method, device and system for fast calculating comment similarity
CN107133223B (en) A kind of machine translation optimization method of the more reference translation information of automatic exploration
CN104391942A (en) Short text characteristic expanding method based on semantic atlas
CN102402561B (en) Searching method and device
CN106294593A (en) In conjunction with subordinate clause level remote supervisory and the Relation extraction method of semi-supervised integrated study
CN104636466A (en) Entity attribute extraction method and system oriented to open web page
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN101702167A (en) Method for extracting attribution and comment word with template based on internet
CN106569993A (en) Method and device for mining hypernym-hyponym relation between domain-specific terms
CN105138514A (en) Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN104268230B (en) A kind of Chinese micro-blog viewpoint detection method based on heterogeneous figure random walk
CN104778256A (en) Rapid incremental clustering method for domain question-answering system consultations
CN102682000A (en) Text clustering method, question-answering system applying same and search engine applying same
CN104484380A (en) Personalized search method and personalized search device
CN104484433A (en) Book body matching method based on machine learning
CN106528524A (en) Word segmentation method based on MMseg algorithm and pointwise mutual information algorithm
CN103154939A (en) Statistical machine translation method using dependency forest
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN106156041A (en) Hot information finds method and system
CN104133812A (en) User-query-intention-oriented Chinese sentence similarity hierarchical calculation method and user-query-intention-oriented Chinese sentence similarity hierarchical calculation device
CN105005554A (en) Method for calculating word semantic relevancy
CN102760121B (en) Dependence mapping method and system
CN109522396B (en) Knowledge processing method and system for national defense science and technology field

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant