CN101520775A - Chinese syntax parsing method with merged semantic information - Google Patents

Chinese syntax parsing method with merged semantic information

Info

Publication number
CN101520775A
Authority
CN
China
Prior art keywords
semantic
word
semantic category
syntactic analysis
information
Legal status
Granted
Application number
CN200910131827A
Other languages
Chinese (zh)
Other versions
CN101520775B (en)
Inventor
吴玺宏
迟惠生
罗定生
林小俊
樊杨
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
2009-02-17
Filing date
2009-04-08
Publication date
2009-09-02
Application filed by Peking University
Priority to CN2009101318275A
Publication of CN101520775A
Application granted
Publication of CN101520775B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese syntactic parsing method that incorporates semantic information, in the technical field of natural language processing. The method comprises: step 1) extracting semantic classes of words at different hierarchical levels according to the hypernym-hyponym relations in HowNet, to obtain an index from words to semantic classes; step 2) using each word in a syntax tree as a key to query HowNet, obtaining the word's semantic class and adding it to a certain layer of the syntax tree; step 3) using the syntax trees processed in step 2) as training data to train a grammar, obtaining a grammar model; step 4) decoding a sentence to be analyzed with the grammar model trained in step 3). Compared with the prior art, the invention uses semantic information to disambiguate parsing, which markedly improves parsing performance.

Description

A Chinese syntactic parsing method incorporating semantic information
Technical field
The invention belongs to the field of natural language processing technology and relates specifically to a Chinese syntactic parsing method that incorporates semantic information: semantic knowledge is introduced into syntactic parsing to help improve parsing performance.
Background art
Syntactic parsing is a very important technique in natural language processing. It analyzes how words combine to form meaningful phrases and sentences, revealing the deeper regularities of language, and its results directly affect how natural language is understood. In practical natural language processing applications, a high-performance parser helps improve higher-level systems such as information extraction, information retrieval, machine translation, and automatic question answering.
Parsing derives the syntactic structure of a sentence with a given algorithm under a given grammar model; the structure is usually represented as a tree. For example, the parse of the sentence "More than half of Dalian's exports come from 'three-capital' (foreign-invested) enterprises." can be represented by the tree in Fig. 1(a). In this tree, the leaf nodes at the bottom are words, called terminal symbols; the non-leaf nodes above them are called nonterminal symbols, and the lowest layer of nonterminals, which represents parts of speech, consists of preterminal symbols. Because ambiguity is pervasive in natural language, one sentence may receive several different syntactic structures, so effective information and algorithms are needed to resolve the ambiguity and find the most plausible structure. This is the problem that all current parsing methods must solve.
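To make the terminal/preterminal/nonterminal distinction concrete, the sketch below encodes a parse tree as nested Python tuples `(label, child, ...)` with words as plain strings. The tree is a hypothetical toy, not the tree in Fig. 1(a), and the helper names are illustrative only.

```python
from typing import List, Tuple, Union

# A tree is (label, child, child, ...); a terminal (word) is a plain str.
Tree = Union[str, tuple]

def is_preterminal(node: Tree) -> bool:
    """A preterminal (POS node) has exactly one child, and that child is a word."""
    return isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], str)

def terminals(node: Tree) -> List[str]:
    """Words (terminal symbols) from left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in terminals(child)]

def preterminals(node: Tree) -> List[Tuple[str, str]]:
    """(POS tag, word) pairs from left to right."""
    if isinstance(node, str):
        return []
    if is_preterminal(node):
        return [(node[0], node[1])]
    return [p for child in node[1:] for p in preterminals(child)]

# Hypothetical toy tree; IP, NP and VP are nonterminals above the POS layer.
toy = ("IP",
       ("NP", ("NR", "大连")),
       ("VP", ("VV", "来自"), ("NP", ("NN", "企业"))))

print(terminals(toy))     # ['大连', '来自', '企业']
print(preterminals(toy))  # [('NR', '大连'), ('VV', '来自'), ('NN', '企业')]
```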
Statistical methods can learn lexical and structural preferences from a corpus and thereby resolve structural ambiguity to some extent. The appearance of manually annotated treebanks (such as the Penn Treebank built by the University of Pennsylvania) created the conditions for statistical parsing and greatly advanced this class of techniques. The most widely studied statistical parsing model is the probabilistic context-free grammar (PCFG), which describes sentence structure with a set of context-free rules and assigns each rule a probability. Its advantages are a simple form and parsing in polynomial time.
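The following sketch shows the two basic PCFG operations described above: estimating rule probabilities from a treebank by relative frequency, and scoring a tree as the product of its rule probabilities. The two-tree treebank is hypothetical, and the function names are illustrative, not taken from any particular parser.

```python
import math
from collections import Counter, defaultdict

def rules_of(tree):
    """List the rules (lhs, rhs) used in a tree, lexical rules included.
    A tree is a nested tuple (label, child, ...); a word is a plain str."""
    if isinstance(tree, str):
        return []
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(lhs, rhs)]
    for child in children:
        rules.extend(rules_of(child))
    return rules

def estimate(trees):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    counts = Counter(r for t in trees for r in rules_of(t))
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {r: c / lhs_totals[r[0]] for r, c in counts.items()}

def tree_log_prob(tree, probs):
    """Under the PCFG independence assumption, P(tree) is the product of its rule probabilities."""
    return sum(math.log(probs[r]) for r in rules_of(tree))

# Hypothetical toy treebank.
t1 = ("S", ("NP", ("N", "他")), ("VP", ("V", "吃"), ("NP", ("N", "苹果"))))
t2 = ("S", ("NP", ("N", "他")), ("VP", ("V", "睡")))
probs = estimate([t1, t2])
print(math.exp(tree_log_prob(t1, probs)))  # about 0.0556, i.e. 1/18
```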
One problem of the PCFG model stems from its conditional independence assumption: the expansion of any nonterminal (i.e., any node above the word layer of the syntax tree) is assumed to be independent of the expansions of all other nonterminals. However, statistics on the distribution of nonterminals at each position in a treebank show that a node's expansion sometimes depends on its position in the tree, which a plain PCFG ignores. To address this, the basic PCFG model must be improved, usually in one of two ways: introducing lexical information, or refining nonterminal labels; the latter is also called the unlexicalized approach. The most representative work on introducing lexical information is head-driven parsing; for example, in his PhD dissertation Michael Collins attached information such as words and distance to each nonterminal in the grammar rules to make the grammar more discriminative. Unlexicalized methods either refine some nonterminals by hand or refine labels automatically through unsupervised learning so as to cover more linguistic phenomena; representative work is that of Dan Klein and colleagues at UC Berkeley. Both approaches have drawbacks: lexicalized methods suffer a degree of data sparseness from the added lexical information, while in unlexicalized methods it is unclear whether the automatically refined labels characterize linguistic phenomena accurately.
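As an illustration of refining nonterminals by hand, the sketch below applies parent annotation, a widely used manual refinement in which each phrasal label records its parent's label. This is shown only as an example of the general idea; it is not the refinement used by the invention, which relies on automatic split-merge.

```python
def parent_annotate(tree, parent=None):
    """Append the parent's label to each phrasal nonterminal, e.g. NP under S -> NP^S.
    Preterminals (POS tags) and words are left unchanged in this sketch."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):  # preterminal
        return tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

# Hypothetical toy tree: subject and object NPs now carry different labels,
# so a PCFG can give them different expansion probabilities.
tree = ("S", ("NP", ("N", "他")), ("VP", ("V", "吃"), ("NP", ("N", "苹果"))))
print(parent_annotate(tree))
# ('S', ('NP^S', ('N', '他')), ('VP^S', ('V', '吃'), ('NP^VP', ('N', '苹果'))))
```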
Summary of the invention
The object of the invention is to provide a Chinese syntactic parsing method that incorporates semantic information, using semantic information to help improve parsing performance while also obtaining syntactically constrained semantic information from the parsing results.
Theoretical studies have shown that semantic information can help syntactic disambiguation. Semantics concerns the meaning, structure, and mode of expression of words, and related research falls into two parts: the semantics (sense) of individual words, and how the meanings of individual words combine to form the meaning of a sentence. The main task of semantic analysis is to produce representations of the lexical semantic units of a text and the dependencies between them. Although syntactic analysis and semantic analysis are two different aspects of language analysis, they constrain each other. In Chinese, word order strongly constrains meaning, and complex semantic relations hold between syntactic constituents. In many cases, analyzing syntactic structure from grammatical form alone cannot explain the inherent regularities of a sentence. Introducing semantics into Chinese parsing can therefore help resolve structural ambiguity.
Using semantic information presupposes a predefined semantic standard, and the most direct way to obtain one is to use an existing semantic resource. The resource used in our method is HowNet. HowNet is a common-sense knowledge base whose objects of description are concepts represented by Chinese and English words, and whose basic content is the relations between concepts and between the attributes of concepts. From it we can take the concepts or concept attributes of a word at different levels as our semantic classes. For example, for "automobile" we obtain the classes entity|entity => thing|all things => physical|material => inanimate|inanimate object => artifact|artifact => implement|utensil => vehicle|vehicles => LandVehicle|car, which from left to right are the classes of "automobile" at successively finer levels of HowNet. "entity|entity" is a coarse class with the broadest coverage, while "LandVehicle|car" is a fine class whose meaning is the most specific and closest to "automobile".
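The coarse-to-fine path above can be treated as a list indexed by depth; choosing a depth fixes the granularity of the semantic class. The sketch below only restates the "automobile" example; the 0-based depth convention and the function name are assumptions for illustration.

```python
# Hypernym path of "automobile" from the example above, coarsest class first.
AUTOMOBILE_PATH = ["entity", "thing", "physical", "inanimate",
                   "artifact", "implement", "vehicle", "LandVehicle"]

def class_at_level(path, level):
    """Semantic class at a given depth (0 = coarsest); None if the path is shorter."""
    return path[level] if 0 <= level < len(path) else None

for level in (0, 3, 7):
    print(level, class_at_level(AUTOMOBILE_PATH, level))
# 0 entity
# 3 inanimate
# 7 LandVehicle
```

A shallow level yields few, broad classes (less data sparseness); a deep level yields many, specific classes (more discrimination but sparser counts).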
By examining the relation between syntactic analysis and semantic analysis, the invention integrates semantic information into unlexicalized parsing, addressing the PCFG model's lack of semantic information, and further refines the part-of-speech layer with semantic labels. The introduced semantic information helps the parser resolve ambiguity and thereby improves parsing performance.
The basic idea of the invention, therefore, is that syntax and semantics are two different aspects of language analysis that act jointly and influence each other, and that semantic information is very helpful for resolving structural ambiguity. Incorporating semantic information into an unlexicalized parsing method markedly improves parser performance, and the resulting analysis contains both the syntactic modification relations and the semantic classes of the words found in HowNet.
The starting point of the invention is to obtain a high-performance parser, using semantic analysis as an auxiliary means of improving parsing performance. The base parsing model is an unlexicalized PCFG that automatically refines labels through unsupervised learning, increasing the descriptive power of the grammar; its performance exceeds that of lexicalized parsers. On this basis, the method uses HowNet as a semantic dictionary to provide semantic classes at a chosen level for some of the words in the treebank, attaches these classes to the preterminal level of the syntax trees (the layer directly above the words), and trains a grammar model containing semantic information on the annotated treebank. Decoding needs no special processing to produce parses with semantic labels. Experiments show that the method effectively improves parsing performance.
The technical solution of the invention is described in detail below in three parts.
1. How semantic information is incorporated into syntactic parsing
HowNet is used as the semantic dictionary, and the sememes it defines (a sememe is defined as the smallest unit of meaning) serve as semantic classes. Sememes in HowNet are organized by hypernym-hyponym relations, as shown in Fig. 2. Semantic classes at different levels are extracted according to these relations; each word in a syntax tree is used as a key to look up its semantic class, and the class is attached to the word's preterminal symbol. To keep the semantic system consistent and to alleviate data sparseness, the semantic classes obtained for all words must lie in the same layer of HowNet.
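The sketch below builds a word-to-class index at one fixed depth of a sememe taxonomy, which is what keeps all classes in the same layer. The taxonomy fragment and word entries are hypothetical, modeled on the "automobile" path quoted earlier; HowNet's actual file format is not modeled. Words whose path is shorter than the chosen depth get no class here, which is an assumption of the sketch.

```python
from typing import Dict, List, Optional

# Hypothetical sememe taxonomy fragment: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None, "thing": "entity", "physical": "thing",
    "inanimate": "physical", "artifact": "inanimate", "implement": "artifact",
    "vehicle": "implement", "LandVehicle": "vehicle",
    "animate": "physical", "AnimalHuman": "animate", "human": "AnimalHuman",
}

# Hypothetical word -> sememes, in dictionary order.
WORD_SEMEMES: Dict[str, List[str]] = {"汽车": ["LandVehicle"], "工人": ["human"]}

def path_to_root(sememe: str) -> List[str]:
    """Hypernym path from the root down to the sememe (coarse -> fine)."""
    path, node = [], sememe
    while node is not None:
        path.append(node)
        node = PARENT.get(node)  # unknown sememes are treated as roots
    return path[::-1]

def build_index(word_sememes: Dict[str, List[str]], level: int) -> Dict[str, List[str]]:
    """Map each word to its ancestor classes at one fixed depth, keeping sense order."""
    index: Dict[str, List[str]] = {}
    for word, sememes in word_sememes.items():
        classes: List[str] = []
        for s in sememes:
            path = path_to_root(s)
            if level < len(path) and path[level] not in classes:
                classes.append(path[level])
        if classes:
            index[word] = classes
    return index

print(build_index(WORD_SEMEMES, 3))  # {'汽车': ['inanimate'], '工人': ['animate']}
```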
Polysemous words, which have several semantic classes, require word sense disambiguation. One strategy is to take the first semantic class; alternatively, we designed an annotation scheme for the senses of polysemous words and label their semantic classes manually. Words not found in HowNet receive no semantic information.
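The two polysemy strategies and the out-of-vocabulary rule can be combined in one lookup, sketched below under stated assumptions: manual labels are keyed by a hypothetical `(sentence id, word position)` pair, and the example classes are illustrative, not HowNet data.

```python
from typing import Dict, List, Optional, Tuple

Occurrence = Tuple[int, int]  # hypothetical key: (sentence id, word position)

def choose_class(word: str, occ: Occurrence,
                 index: Dict[str, List[str]],
                 manual: Dict[Occurrence, str]) -> Optional[str]:
    """Pick one semantic class for a word occurrence."""
    classes = index.get(word)
    if not classes:                        # not in HowNet: add no semantic information
        return None
    if manual.get(occ) in classes:         # a manual, context-based choice wins
        return manual[occ]
    return classes[0]                      # otherwise take the first listed class

index = {"打": ["beat", "buy"]}            # hypothetical polysemous entry
print(choose_class("打", (0, 1), index, {}))               # beat
print(choose_class("打", (0, 1), index, {(0, 1): "buy"}))  # buy
print(choose_class("的", (0, 2), index, {}))               # None
```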
Fig. 1 shows an example of semantic annotation. Fig. 1(a) is a sentence from the treebank before annotation, and Fig. 1(b) is the same sentence after semantic annotation. As the figure shows, semantics is introduced by attaching a word's semantic class to its corresponding preterminal symbol.
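Attaching a class to a preterminal amounts to rewriting its label. The sketch below does this over a tuple-encoded tree; the `@` separator and the class names are hypothetical, since the exact label format of Fig. 1(b) is not given in the text.

```python
def annotate(tree, word_class, sep="@"):
    """Copy a tree, appending each word's semantic class to its preterminal label,
    e.g. (NN 汽车) -> (NN@inanimate 汽车). Words without a class keep the plain tag."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):  # preterminal
        cls = word_class(children[0])
        return (f"{label}{sep}{cls}" if cls else label, children[0])
    return (label,) + tuple(annotate(c, word_class, sep) for c in children)

# Hypothetical tree and classes; 修 has no class, so VV stays unrefined.
tree = ("IP", ("NP", ("NN", "工人")), ("VP", ("VV", "修"), ("NP", ("NN", "汽车"))))
classes = {"工人": "animate", "汽车": "inanimate"}
print(annotate(tree, classes.get))
# ('IP', ('NP', ('NN@animate', '工人')), ('VP', ('VV', '修'), ('NP', ('NN@inanimate', '汽车'))))
```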
Nonterminals above the part-of-speech layer cannot obtain semantic classes directly from HowNet. The simplest way to add them would resemble head extraction: treat the preterminal's semantic information as the head and percolate it up to the upper nodes. However, because words have many semantic classes, appending them to upper nodes could create many more nonterminals and cause very serious data sparseness when training data are insufficient. The upper-layer nonterminals are therefore still refined automatically by unsupervised split-and-merge, without introducing semantics.
After this processing, the preterminals of most words in the treebank carry a semantic class from one layer of HowNet. Training a parsing model on this treebank yields a grammar model that incorporates semantic information. Decoding with this grammar produces parses with semantic labels, and the parses themselves are also more accurate.
2. Training the parsing model
The base parsing model of the invention is an unlexicalized parsing model: it refines nonterminal node labels in an unsupervised way to increase the descriptive power of the grammar. The model is briefly introduced below.
In recent years, unlexicalized PCFG parsing has made great progress, and the best models have reached the state of the art in syntactic parsing. Within the basic PCFG framework, such a model automatically refines nonterminal labels through unsupervised learning to strengthen the descriptive power of the grammar. Training consists mainly of two processes, splitting and merging. Splitting divides each nonterminal into two, refining the labels; this increases the complexity of the grammar and expands its coverage of the linguistic phenomena in the treebank. Merging ensures that each split made in the splitting step is necessary. This is judged by the effect of a label's split on the likelihood of the whole treebank: if merging the two sub-labels back together does not noticeably lower the treebank likelihood, the split is unnecessary and the sub-labels are merged.
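The sketch below shows one split step on a rule-probability table and the merge test described above. Assumptions: sub-labels are named `X-0`/`X-1` (hypothetical), a small random perturbation breaks the symmetry between sub-labels, the root is never split, and the merge test is a simple log-likelihood threshold. The EM re-estimation that follows each split in a real trainer is omitted.

```python
import itertools
import random
from collections import defaultdict

def split_grammar(probs, root="ROOT", noise=0.01, seed=0):
    """One split step: every nonterminal X except the root becomes X-0 and X-1.
    Each rule's probability is shared evenly among its split variants, slightly
    perturbed to break symmetry, then renormalized per left-hand side."""
    rng = random.Random(seed)
    nonterminals = {lhs for lhs, _ in probs}

    def variants(sym):
        if sym == root or sym not in nonterminals:  # words and the root stay unsplit
            return [sym]
        return [f"{sym}-0", f"{sym}-1"]

    split = {}
    for (lhs, rhs), p in probs.items():
        combos = list(itertools.product(*(variants(s) for s in rhs)))
        for new_lhs in variants(lhs):
            for combo in combos:
                split[(new_lhs, combo)] = p / len(combos) * (1 + rng.uniform(-noise, noise))

    totals = defaultdict(float)
    for (lhs, _), p in split.items():
        totals[lhs] += p
    return {rule: p / totals[rule[0]] for rule, p in split.items()}

def keep_split(loglik_split, loglik_merged, min_gain):
    """Keep a split only if merging it back lowers treebank log-likelihood by at least min_gain."""
    return loglik_split - loglik_merged >= min_gain

# Hypothetical toy grammar.
grammar = {("ROOT", ("S",)): 1.0, ("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0,
           ("NP", ("他",)): 0.5, ("NP", ("苹果",)): 0.5, ("V", ("吃",)): 1.0}
g2 = split_grammar(grammar)
print(sorted(r for r in g2 if r[0] == "S-0"))  # four S-0 -> NP-i VP-j rules
```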
This unlexicalized parsing method based on automatic splitting first guarantees a high-performance baseline, and the model also makes it easy to incorporate semantic information. Moreover, semantic information added from an external semantic dictionary helps constrain the automatic splitting of syntactic labels; conversely, the subsequent automatic splitting ensures that the added semantic classes do not interfere with splits that reflect syntactic function.
3. Decoding
A new sentence is parsed with the grammar model obtained in training. The basic method applies the grammar rules bottom-up in a chart-parsing manner to derive the most probable syntax tree, but the search space of this simple approach is enormous. To improve efficiency, a coarse-to-fine strategy is adopted: a simple grammar model first decodes the sentence to obtain a set of candidate results, and a finer grammar model then decodes within these candidates. Many impossible results are thus pruned before the fine decoding pass, which reduces the search space and improves efficiency.
Beneficial effects of the invention:
Compared with the prior art, the invention uses semantic information to help disambiguate parsing, which effectively improves parsing performance; both the efficiency and the accuracy of parsing are significantly improved. In addition, the parser with fused semantic information also yields semantic information for some of the words.
Brief description of the drawings
Fig. 1: a syntax tree, and the same syntax tree after adding semantic information;
(a) a sentence from the treebank before annotation; (b) the sentence after semantic annotation;
Fig. 2: an example fragment of the sememe tree in the semantic resource HowNet;
Fig. 3: flowchart of the method of the invention.
Detailed description of embodiments
Specific embodiments of the invention are described in detail below with reference to the drawings; the flowchart of the method is shown in Fig. 3.
1. Building the word-to-semantic-class index
Semantic classes at different layers, from coarse to fine, are extracted according to the hypernym-hyponym relations between the sememes defined in HowNet and associated with each word, yielding an index from words to semantic classes. The words here carry part-of-speech information.
2. Adding semantic class information to the original treebank
For each word in the original treebank, the semantic class is looked up with the word and its part of speech as the key, and the class is then attached at the part-of-speech (preterminal) level, refining the labels of the part-of-speech layer. In this way some of the part-of-speech tags carry semantic information.
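Keying the lookup on the pair (word, part of speech), as in this step, keeps the noun and verb senses of a word apart before any sense choice is made. A minimal sketch follows; the entries are hypothetical, and it assumes HowNet's part-of-speech labels have already been mapped to the treebank's tagset.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (word, part-of-speech tag in the treebank's tagset)

def build_pos_index(entries: Iterable[Tuple[str, str, str]]) -> Dict[Key, List[str]]:
    """entries: (word, pos, semantic_class) triples in dictionary order.
    Returns (word, pos) -> classes, first-seen order, duplicates dropped."""
    index: Dict[Key, List[str]] = {}
    for word, pos, cls in entries:
        classes = index.setdefault((word, pos), [])
        if cls not in classes:
            classes.append(cls)
    return index

def lookup(index: Dict[Key, List[str]], word: str, pos: str) -> Optional[str]:
    """First-listed class for (word, pos), or None if the pair is not covered."""
    classes = index.get((word, pos))
    return classes[0] if classes else None

# Hypothetical entries: the noun and verb readings of 代表 get different classes.
idx = build_pos_index([("代表", "NN", "human"), ("代表", "VV", "represent"),
                       ("汽车", "NN", "inanimate")])
print(lookup(idx, "代表", "NN"), lookup(idx, "代表", "VV"))  # human represent
```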
Some words have several different semantic classes. Two strategies are adopted for this case: choose the first of the classes, or select one manually according to context.
3. Training the grammar model
The treebank with added semantic class information serves as training data. The grammar is trained with the unlexicalized parsing model introduced above, and nonterminals are refined during training by automatic splitting and merging. In addition, to determine whether the preterminals that carry semantic information also need this refinement, we ran experiments. They showed that, when coarse-grained semantic classes are added, further automatic refinement works better than no refinement, and also better than directly adding the more discriminative fine-grained semantic classes without automatic refinement; details are given in the effect analysis below.
4. Parsing the sentence to be analyzed
With the trained grammar model, the unlexicalized parser introduced above decodes a sentence to be analyzed (already word-segmented), producing its parse and, at the same time, its semantic annotation.
Effect analysis:
To verify the effectiveness of the invention, we designed a series of experiments, some of which are presented below.
Experimental corpus:
Training and testing use the Penn Chinese Treebank (UPenn Chinese Tree Bank 2.0), comprising 325 news articles in total, divided in the standard way: articles 1-25 as the development set (350 sentences); articles 26-270 as the training set (3,172 sentences); articles 271-300 as the test set (348 sentences).
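The article ranges above can be written as a small split function, useful when preparing the data. The names are illustrative; articles outside the three listed ranges are not assigned by the text, so the sketch simply marks them unused.

```python
def ctb2_split(article_id: int) -> str:
    """Assign a CTB 2.0 article number to a data set using the ranges stated above."""
    if 1 <= article_id <= 25:
        return "dev"
    if 26 <= article_id <= 270:
        return "train"
    if 271 <= article_id <= 300:
        return "test"
    return "unused"  # ranges not listed in the text

from collections import Counter
print(Counter(ctb2_split(i) for i in range(1, 326)))
# Counter({'train': 245, 'test': 30, 'dev': 25, 'unused': 25})
```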
The semantic dictionary is HowNet.
Baseline system:
The baseline system uses the unlexicalized parsing model introduced above. It automatically splits and refines nonterminal labels in an unsupervised way: each iteration splits every original label into two, estimates the parameters of the new labels with the EM algorithm, and then merges split labels according to their contribution to the likelihood.
Evaluation program:
Evaluation uses EVALB, a widely used parsing evaluation tool. It scores parses by labeled bracket matching and reports precision, recall, and F-measure.
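The bracket-matching idea behind these scores can be sketched as follows. This is a simplified illustration, not EVALB itself: it skips preterminals (EVALB reports tagging accuracy separately) but omits the punctuation and label-equivalence settings of EVALB's parameter file. The two trees are hypothetical.

```python
from collections import Counter

def constituents(tree, start=0):
    """Labeled brackets (label, start, end) of a tuple-encoded tree, skipping
    preterminals. Returns (Counter of brackets, end position)."""
    if isinstance(tree, str):
        return Counter(), start + 1
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):  # preterminal
        return Counter(), start + 1
    found, pos = Counter(), start
    for child in children:
        sub, pos = constituents(child, pos)
        found.update(sub)
    found[(label, start, pos)] += 1
    return found, pos

def bracket_scores(gold, test):
    """Recall, precision and F1 from matched brackets (multiset intersection)."""
    g, _ = constituents(gold)
    t, _ = constituents(test)
    matched = sum((g & t).values())
    lr = matched / sum(g.values()) if g else 0.0
    lp = matched / sum(t.values()) if t else 0.0
    f1 = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lr, lp, f1

gold = ("IP", ("NP", ("NN", "工人")), ("VP", ("VV", "修"), ("NP", ("NN", "汽车"))))
test = ("IP", ("NP", ("NN", "工人")), ("VV", "修"), ("NP", ("NN", "汽车")))  # VP missing
print(bracket_scores(gold, test))  # recall 0.75, precision 1.0, F1 about 0.857
```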
Experimental results and analysis:
The results of the baseline system on the CTB standard data set are shown in Table 1:
Table 1: baseline system performance
Here S&M denotes the number of split-merge cycles: S&M-1 means one split-merge iteration, and S&M-2 means two iterations, i.e., one further split-merge iteration on the grammar obtained from the first. Len is sentence length, i.e., the number of words in a sentence; Len<=40 means testing only on sentences of at most 40 words, and All means testing on all sentences. LR is recall, LP is precision, and F1 is the F-measure.
To alleviate data sparseness to some extent, we use the top-layer semantic classes of HowNet and apply automatic refinement to all labels. Results on the same data set are shown in Table 2.
Table 2: Parsing performance with coarse-grained semantic class labels
The table shows that, from the 4th split-merge iteration on, the parser with added semantic classes outperforms the baseline. At the 6th iteration the split grammar overfits and F1 drops somewhat, a trend seen in both the baseline and the improved system; even so, the system with semantic classes remains better than the baseline. Comparing the 5th-iteration results, F1 rises from 80.26% to 81.63%, an absolute gain of 1.37 points, which is quite significant in parsing research.
In addition, when trained on the latest release, Penn Chinese Treebank 5.0 (18,782 sentences in total), the parser of the invention reaches an F1 of 86.39%. The before-and-after comparison for adding semantic information follows a trend similar to that on Penn Chinese Treebank 2.0 above and is not repeated here.
Based on an unlexicalized parser, the invention incorporates semantic information and uses it to help disambiguate parsing, markedly improving parser performance; the parser with fused semantic information also yields semantic information for some of the words.

Claims (8)

1. A Chinese syntactic parsing method incorporating semantic information, comprising the steps of:
1) extracting semantic classes of words at different levels according to the hypernym-hyponym relations of HowNet, to obtain an index from words to semantic classes;
2) using a word in a syntax tree as the key to query HowNet to obtain the semantic class of the word, and adding the semantic class to a certain layer of the syntax tree;
3) using the syntax trees processed in step 2) as training data to train a grammar, obtaining a grammar model;
4) decoding a sentence to be analyzed with the grammar model trained in step 3).
2. The method of claim 1, wherein said certain layer is the preterminal layer.
3. The method of claim 2, wherein said word includes part-of-speech information.
4. The method of claim 3, wherein the word and its part of speech are used as the key to query HowNet to obtain the semantic class of the word.
5. The method of claim 1 or 4, wherein the same layer of semantic classes in HowNet is queried, so that the semantic classes obtained for all words lie in the same layer of HowNet.
6. The method of claim 1, wherein an unlexicalized parsing model is used for said grammar training.
7. The method of claim 6, wherein said grammar training refines the preterminal symbols by automatic splitting and merging.
8. The method of claim 1, wherein, if a word has several different semantic classes, the first of them is chosen as the semantic class of the word, or one is selected manually according to context.
CN2009101318275A 2009-02-17 2009-04-08 Chinese syntax parsing method with merged semantic information Expired - Fee Related CN101520775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101318275A CN101520775B (en) 2009-02-17 2009-04-08 Chinese syntax parsing method with merged semantic information

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200910078113.2 2009-02-17
CN200910078113 2009-02-17
CN2009101318275A CN101520775B (en) 2009-02-17 2009-04-08 Chinese syntax parsing method with merged semantic information

Publications (2)

Publication Number Publication Date
CN101520775A true CN101520775A (en) 2009-09-02
CN101520775B CN101520775B (en) 2012-05-30

Family

ID=41081371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101318275A Expired - Fee Related CN101520775B (en) 2009-02-17 2009-04-08 Chinese syntax parsing method with merged semantic information

Country Status (1)

Country Link
CN (1) CN101520775B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013088287A1 (en) * 2011-12-12 2013-06-20 International Business Machines Corporation Generation of natural language processing model for information domain
CN103189860A (en) * 2010-11-05 2013-07-03 Sk普兰尼特有限公司 Machine translation device and machine translation method in which a syntax conversion model and a vocabulary conversion model are combined
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109298796A (en) * 2018-07-24 2019-02-01 北京捷通华声科技股份有限公司 A kind of Word association method and device
CN109543195A (en) * 2018-11-19 2019-03-29 腾讯科技(深圳)有限公司 A kind of method, the method for information processing and the device of text translation

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US5966686A (en) * 1996-06-28 1999-10-12 Microsoft Corporation Method and system for computing semantic logical forms from syntax trees
CN101329666A (en) * 2008-06-18 2008-12-24 南京大学 Automatic analysis method Chinese syntax based on corpus and tree type structural pattern match

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN103189860A (en) * 2010-11-05 2013-07-03 Sk普兰尼特有限公司 Machine translation device and machine translation method in which a syntax conversion model and a vocabulary conversion model are combined
WO2013088287A1 (en) * 2011-12-12 2013-06-20 International Business Machines Corporation Generation of natural language processing model for information domain
CN103999081A (en) * 2011-12-12 2014-08-20 国际商业机器公司 Generation of natural language processing model for information domain
US9740685B2 (en) 2011-12-12 2017-08-22 International Business Machines Corporation Generation of natural language processing model for an information domain
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN109298796A (en) * 2018-07-24 2019-02-01 北京捷通华声科技股份有限公司 A kind of Word association method and device
CN109298796B (en) * 2018-07-24 2022-05-24 北京捷通华声科技股份有限公司 Word association method and device
CN109543195A (en) * 2018-11-19 2019-03-29 腾讯科技(深圳)有限公司 A kind of method, the method for information processing and the device of text translation
CN109543195B (en) * 2018-11-19 2022-04-12 腾讯科技(深圳)有限公司 Text translation method, information processing method and device

Also Published As

Publication number Publication date
CN101520775B (en) 2012-05-30

Similar Documents

Publication Publication Date Title
CN105426539B (en) A kind of lucene Chinese word cutting method based on dictionary
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN103399901B (en) A kind of keyword abstraction method
Huck et al. Target-side word segmentation strategies for neural machine translation
CN109408642A (en) A kind of domain entities relation on attributes abstracting method based on distance supervision
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
CN102214166B (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN105808525A (en) Domain concept hypernym-hyponym relation extraction method based on similar concept pairs
CN104391942A (en) Short text characteristic expanding method based on semantic atlas
CN106570180A (en) Artificial intelligence based voice searching method and device
CN103309926A (en) Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN104636465A (en) Webpage abstract generating methods and displaying methods and corresponding devices
CN103077164A (en) Text analysis method and text analyzer
CN102779135B (en) Method and device for obtaining cross-linguistic search resources and corresponding search method and device
CN106257455A (en) A kind of Bootstrapping algorithm based on dependence template extraction viewpoint evaluation object
CN108549625B (en) Chinese chapter expression theme analysis method based on syntactic object clustering
CN101520775B (en) Chinese syntax parsing method with merged semantic information
CN103324626A (en) Method for setting multi-granularity dictionary and segmenting words and device thereof
CN106202255A (en) Merge the Vietnamese name entity recognition method of physical characteristics
CN109408806A (en) A kind of Event Distillation method based on English grammar rule
CN104598441B (en) A kind of method that computer splits Chinese sentence
CN103886053A (en) Knowledge base construction method based on short text comments
CN106202039B (en) Vietnamese portmanteau word disambiguation method based on condition random field
CN100424685C (en) Syntax analysis method and device for layering Chinese long sentences based on punctuation treatment
Lenci et al. Ontology learning from Italian legal texts

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120530

Termination date: 20180408