CN101901210A - Word meaning disambiguating system and method - Google Patents

Word meaning disambiguating system and method

Info

Publication number
CN101901210A
Authority
CN
China
Prior art keywords
meaning
word
wrap
clothes
step
Prior art date
Application number
CN2009101417374A
Other languages
Chinese (zh)
Inventor
胡长建
赵凯
Original Assignee
日电(中国)有限公司
Application filed by 日电(中国)有限公司
Priority to CN2009101417374A
Publication of CN101901210A


Abstract

The invention relates to a word sense disambiguation system for disambiguating polysemous words. The system comprises an input device for inputting text that includes polysemous words, and a word sense disambiguation device for iteratively determining the sense of each word on the basis of the word's sense obviousness, where the sense obviousness is obtained from the word's sense confidence. The invention also relates to a word sense disambiguation method. The system and method improve the consistency of word sense disambiguation results and shorten computation time.

Description

Word sense disambiguation system and method

Technical Field

[0001] The present invention relates to the field of natural language processing and, in particular, to a word sense disambiguation system and method.

Background

[0002] In a language, some words have only one sense while others have several. In Chinese, for example, "电话" (telephone) has a single sense, a communication device, whereas "服" has two senses: clothes, and to eat (take). Word sense disambiguation (WSD) determines the sense of a polysemous word in a specific context, for example determining that "服" means clothes in "春服既成,冠者五六人,童子六七人" (where 春服 is "spring clothes") and means to eat in "饭后服药" (take medicine after a meal).

[0003] Word sense disambiguation removes the ambiguity of words and determines their true meaning, which is useful for text analysis and the various services related to it.

[0004] There are generally two approaches to word sense disambiguation: supervised and unsupervised. The former requires a manually annotated training set; the latter does not. Because the training set must be annotated by hand and is usually domain-specific (different domains need different training sets), building it is costly in both time and money. Unsupervised methods need no training set and are therefore faster and cheaper than supervised methods.

[0005] A basic idea of unsupervised methods is to consider the context. For example, "服" has two senses, but when "中山装" (Chinese tunic suit) appears in the context, "服" very probably takes the clothing sense rather than the eating sense. Specifically, Reference 1 (Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant word senses in untagged text. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pp. 279-286.) gives a method of calculation.

[0006] Fig. 1 shows a flowchart of the word sense disambiguation method used in Reference 1. The processing has four steps. First, the context of each polysemous word is determined. Second, for each sense of each polysemous word, the similarity between that sense and the context is determined. Third, for each polysemous word, a confidence is computed for each sense by combining the similarities of all of the word's senses with the context. Fourth, the sense with the greatest confidence is selected as the sense of the polysemous word.

[0007] Specifically, suppose the context of a word w contains k words, denoted c(w) = {n_1, n_2, ..., n_k}, and suppose w has m senses (abbreviated ws), denoted Senses(w) = (ws_1, ws_2, ..., ws_m). The confidence of sense ws_i of word w is computed as follows:

[0008]

$$C(ws_i) = \sum_{j=1}^{k} \frac{S(ws_i, n_j)}{\sum_{i'=1}^{m} S(ws_{i'}, n_j)}$$

[0009] where S(ws_i, n_j) is the similarity between ws_i and n_j, the j-th context word of w. Suppose n_j has l senses; then S(ws_i, n_j) = max(S(ws_i, ns_j1), S(ws_i, ns_j2), ..., S(ws_i, ns_jl)), where ns_jp denotes the p-th sense of n_j. S(ws_i, ns_j1) is the similarity between two senses, which some dictionaries, such as HowNet, can provide.
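The confidence computation of [0007]-[0009] can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names and the `sense_sim` callable (a stand-in for a dictionary-based similarity of the kind HowNet is said to provide) are hypothetical, the normalisation over senses follows the expanded expressions in the worked example ([0028]), and skipping a context word whose similarity to every sense is zero is an added assumption to avoid division by zero.

```python
from typing import Callable, Dict, Sequence

SenseSim = Callable[[str, str], float]  # hypothetical sense-to-sense similarity


def word_similarity(sense: str, context_word_senses: Sequence[str],
                    sense_sim: SenseSim) -> float:
    """S(ws_i, n_j): best similarity between one sense and any sense of n_j."""
    return max(sense_sim(sense, ns) for ns in context_word_senses)


def sense_confidence(senses: Sequence[str],
                     context: Sequence[Sequence[str]],
                     sense_sim: SenseSim) -> Dict[str, float]:
    """C(ws_i) for every sense of w; `context` holds the sense list of each n_j."""
    confidence = {s: 0.0 for s in senses}
    for context_word_senses in context:
        scores = {s: word_similarity(s, context_word_senses, sense_sim)
                  for s in senses}
        total = sum(scores.values())
        if total == 0:
            continue  # assumption: an unrelated context word contributes nothing
        for s in senses:
            confidence[s] += scores[s] / total
    return confidence
```

Each context word contributes a share between 0 and 1 to every sense, so the confidences of all senses of w sum to at most k, the number of context words.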

[0010] The method used in the reference is illustrated below with an example. Suppose there are three words, {服, 装, 包}, each of which has the other two as its context; for example, c(服) = {装, 包}. Suppose their senses and the similarities between those senses are as given in Table 1 and Table 2. Table 1 lists the senses of the three words 服, 装 and 包, and Table 2 lists the similarities between senses. For example, the fifth row of Table 2 gives the similarity S(clothes, tools) = 0.3.

[0012] Table 1

[0014] Table 2
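For experimenting with the example, the word senses and sense similarities can be written down as plain data. The values below are read off the worked computations that follow ([0018]-[0051]) rather than copied from Table 1 and Table 2 themselves. Treating identical senses as similarity 1 and every unlisted pair as 0 is an assumption consistent with those computations, and the names `SENSES`, `PAIR_SIMILARITY` and `sense_similarity` are illustrative only.

```python
# Senses of each word, as listed in the Senses(w) lines of the worked example.
SENSES = {
    "服": ("clothes", "eat"),
    "装": ("clothes", "wrap"),
    "包": ("tools", "wrap"),
}

# Non-zero similarities between distinct senses used in the worked example.
PAIR_SIMILARITY = {
    frozenset(("clothes", "tools")): 0.3,
    frozenset(("eat", "wrap")): 0.2,
}


def sense_similarity(a: str, b: str) -> float:
    """Symmetric lookup: identical senses score 1, unlisted pairs score 0."""
    if a == b:
        return 1.0
    return PAIR_SIMILARITY.get(frozenset((a, b)), 0.0)
```

Keying the pairs by `frozenset` makes the lookup order-independent, which matches how the example uses S(clothes, tools) and S(tools, clothes) interchangeably.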

[0015] The method described in Reference 1 carries out the four steps of the above flow for every word at the same time.

[0016] For example, for w = 服: first, its context is determined to be c(w) = {n_1, n_2} = {装, 包}.

[0017] Second, the similarity between each sense and the context is computed:

[0018] Senses(w) = (ws_1, ws_2) = (clothes, eat).

[0019]-[0020] S(ws_1, n_1) = max(S(clothes, clothes), S(clothes, wrap)) = max(1, 0) = 1

[0021]-[0022] S(ws_1, n_2) = max(S(clothes, tools), S(clothes, wrap)) = max(0.3, 0) = 0.3

[0023]-[0024] S(ws_2, n_1) = max(S(eat, clothes), S(eat, wrap)) = max(0, 0.2) = 0.2

[0025]-[0026] S(ws_2, n_2) = max(S(eat, tools), S(eat, wrap)) = max(0, 0.2) = 0.2

[0027] Third, the confidence of each sense is computed:

[0028] C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1 + 0.2) + 0.3/(0.3 + 0.2) = 1.43

[0029] C(ws_2) = S(ws_2, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_2, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.2/(1 + 0.2) + 0.2/(0.3 + 0.2) = 0.57

[0030] Fourth, the sense of 服 is determined: since C(ws_1) > C(ws_2), 服 takes the sense ws_1 = clothes.

[0031] Similarly, for w = 装: first, its context is determined to be c(w) = {n_1, n_2} = {服, 包}.

[0032] Second, the similarity between each sense and the context is computed:

[0033] Senses(w) = (ws_1, ws_2) = (clothes, wrap).

[0034] S(ws_1, n_1) = max(S(clothes, clothes), S(clothes, eat)) = max(1, 0) = 1

[0035] S(ws_1, n_2) = max(S(clothes, tools), S(clothes, wrap)) = max(0.3, 0) = 0.3

[0036] S(ws_2, n_1) = max(S(wrap, clothes), S(wrap, eat)) = max(0, 0.2) = 0.2

[0037] S(ws_2, n_2) = max(S(wrap, tools), S(wrap, wrap)) = max(0, 1) = 1

[0038] Third, the confidence of each sense is computed:

[0039] C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1 + 0.2) + 0.3/(0.3 + 1) = 1.06

[0040] C(ws_2) = 0.2/(1 + 0.2) + 1/(0.3 + 1) = 0.94

[0041] Fourth, the sense of 装 is determined: since C(ws_1) > C(ws_2), 装 takes the sense ws_1 = clothes.

[0042] Similarly, for w = 包: first, its context is determined to be c(w) = {n_1, n_2} = {服, 装}.

[0043] Second, the similarity between each sense and the context is computed:

[0044] Senses(w) = (ws_1, ws_2) = (tools, wrap).

[0045]-[0046] S(ws_1, n_1) = max(S(tools, clothes), S(tools, eat)) = max(0.3, 0) = 0.3

[0047]-[0048] S(ws_1, n_2) = max(S(tools, clothes), S(tools, wrap)) = max(0.3, 0) = 0.3

[0049]-[0050] S(ws_2, n_1) = max(S(wrap, clothes), S(wrap, eat)) = max(0, 0.2) = 0.2

[0051] S(ws_2, n_2) = max(S(wrap, clothes), S(wrap, wrap)) = max(0, 1) = 1

[0052] Third, the confidence of each sense is computed:

[0053] C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.3/(0.3 + 0.2) + 0.3/(0.3 + 1) = 0.83

[0054] C(ws_2) = 0.2/(0.3 + 0.2) + 1/(0.3 + 1) = 1.17

[0055] Fourth, the sense of 包 is determined: since C(ws_2) > C(ws_1), 包 takes the sense ws_2 = wrap.

[0056] Combining the three results above, the output is {服: clothes, 装: clothes, 包: wrap}.

[0057] Because the above process computes the sense of every word at the same time, the results may be inconsistent. In the example above, 服 and 装 both take the sense clothes, while 包 takes the sense wrap. A closer look at the computation for 包 shows that it takes this sense because the wrap sense of 装 played the decisive role in the computation (S(ws_2, n_2) = max(..., S(wrap, wrap)) = max(0, 1) = 1). Yet 装 itself does not end up taking the wrap sense, which produces the inconsistency. The correct result for the example above is {服: clothes, 装: clothes, 包: tools}.
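The all-at-once procedure and its inconsistent output can be reproduced with a short Python sketch. The sense lists and similarity values are those used in the worked computations above; identical senses scoring 1, unlisted pairs scoring 0, and skipping context words with zero total similarity are assumptions, and all function names are illustrative rather than taken from Reference 1.

```python
SENSES = {
    "服": ("clothes", "eat"),
    "装": ("clothes", "wrap"),
    "包": ("tools", "wrap"),
}
PAIRS = {frozenset(("clothes", "tools")): 0.3, frozenset(("eat", "wrap")): 0.2}


def sim(a, b):
    """Sense similarity: 1 for identical senses, 0 for unlisted pairs (assumed)."""
    return 1.0 if a == b else PAIRS.get(frozenset((a, b)), 0.0)


def confidences(word, senses_of):
    """Confidence of each sense of `word`, using every other word as context."""
    conf = {s: 0.0 for s in senses_of[word]}
    for other, other_senses in senses_of.items():
        if other == word:
            continue
        scores = {s: max(sim(s, o) for o in other_senses) for s in conf}
        total = sum(scores.values())
        if total:  # assumption: ignore context words unrelated to every sense
            for s in conf:
                conf[s] += scores[s] / total
    return conf


def disambiguate_all_at_once(senses_of):
    """Pick the highest-confidence sense of every word independently."""
    result = {}
    for word in senses_of:
        conf = confidences(word, senses_of)
        result[word] = max(conf, key=conf.get)
    return result


print(disambiguate_all_at_once(SENSES))
```

Under these assumptions the sketch assigns wrap to 包 even though no other word ends up with the wrap sense, which is the inconsistency described in [0057].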

Summary of the Invention

[0058] The present invention proposes a progressive word sense disambiguation system and method. Initially the sense of only one word is determined, rather than the senses of all words; the similarities between the other words and their contexts are then recomputed. During recomputation, a word whose sense has already been determined contributes only that determined sense, and its other senses are ignored. This process is repeated until the senses of all words have been determined.

[0059] According to a first aspect of the invention, a word sense disambiguation system for disambiguating polysemous words is provided, comprising: an input device for inputting text that includes polysemous words; and a word sense disambiguation device for iteratively determining the sense of each word on the basis of the word's sense obviousness, where the sense obviousness is obtained from the word's sense confidence.

[0060] According to a second aspect of the invention, a word sense disambiguation method for disambiguating polysemous words is provided, comprising: an input step of inputting text that includes polysemous words; and a word sense disambiguation step of iteratively determining the sense of each word on the basis of the word's sense obviousness, where the sense obviousness is obtained from the word's sense confidence.

[0061] Preferably, to ensure correct results, the word whose sense is most obvious is selected when determining senses. For example, the obviousness can be computed from the sense confidence: the greater the confidence of a sense, the more obvious the sense.

[0062] Because the progressive process may take longer to compute than the conventional method, the invention also proposes ways to reduce computation time and speed up the computation. The senses of several words, rather than of a single word, are determined at the outset, and words consistent with the already determined senses are selected where possible. Because reducing computation time may introduce inconsistencies into the results, this is a trade-off.

[0063] Preferably, to save computation time, words whose sense obviousness exceeds a threshold are selected when determining senses.

[0064] Preferably, to save computation time, the words are sorted by sense obviousness when determining senses and the top n words are selected.

[0065] Preferably, to save computation time, after the sense of one word has been determined, the likely senses of the words whose senses are still undetermined are guessed, and the sense of each such word is obtained according to whether its guessed sense is consistent with the determined sense.
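The three word-selection strategies of [0061], [0063] and [0064] can be sketched as small Python functions over a mapping from word to sense obviousness. The function names, the threshold 0.4 and n = 2 are illustrative assumptions; the obviousness scores used in the demonstration are the first-cycle values computed later in the description ([0086]-[0088]).

```python
from typing import Dict, List


def select_most_obvious(obviousness: Dict[str, float]) -> List[str]:
    """[0061]: pick the single word with the greatest sense obviousness."""
    return [max(obviousness, key=obviousness.get)]


def select_above_threshold(obviousness: Dict[str, float],
                           threshold: float) -> List[str]:
    """[0063]: pick every word whose obviousness exceeds the threshold."""
    return [w for w, e in obviousness.items() if e > threshold]


def select_top_n(obviousness: Dict[str, float], n: int) -> List[str]:
    """[0064]: sort by obviousness, highest first, and keep the first n words."""
    return sorted(obviousness, key=obviousness.get, reverse=True)[:n]


scores = {"服": 1.51, "装": 0.13, "包": 0.41}
print(select_most_obvious(scores))
print(select_above_threshold(scores, 0.4))
print(select_top_n(scores, 2))
```

A caller using the threshold strategy would need a fallback for a cycle in which no word clears the threshold; the text does not say how that case is handled.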

[0066] The invention thus improves the consistency of word sense disambiguation results while preserving their correctness, and overcomes the drawback of long computation time.

Brief Description of the Drawings

[0067] Fig. 1 shows a flowchart of a prior-art word sense disambiguation method;

[0068] Fig. 2a shows a schematic diagram of a word sense disambiguation system according to a first embodiment of the invention;

[0069] Fig. 2b shows a flowchart of a word sense disambiguation method according to the invention;

[0070] Fig. 2c shows another flowchart of a word sense disambiguation method according to the invention;

[0071] Fig. 2d shows another flowchart of a word sense disambiguation method according to the invention;

[0072] Fig. 3a shows a schematic diagram of a word sense disambiguation system according to a second embodiment of the invention;

[0073] Fig. 3b shows another flowchart of a word sense disambiguation method according to the invention.

Detailed Description

[0074] Preferred embodiments of the invention are described below with reference to the drawings. In the drawings, identical elements are denoted by identical reference symbols or numerals. In the following description, detailed descriptions of known functions and configurations are omitted so as not to obscure the subject matter of the invention. Fig. 2a shows a word sense disambiguation system according to a first embodiment of the invention. The system comprises an input device 21, a context determination device 22, a word sense disambiguation device 2 and a memory (not shown). The input device 21 receives input text that includes polysemous words, that is, words with several senses. The context determination device 22 determines the context of each polysemous word in the text; one or more words adjacent to a polysemous word in the text can be regarded as its context. The word sense disambiguation device 2 comprises a similarity calculation unit 23, a sense confidence calculation unit 24, a sense obviousness calculation unit 25, a word selection unit 26, a sense determination unit 27 and a controller 28. The similarity calculation unit 23 computes the similarity between each sense of each polysemous word and the word's context. Some dictionaries already provide a way to compute the similarity between two senses; for example, the WordNet (English) or HowNet (Chinese) dictionary can be used to obtain the similarity between the senses of two polysemous words.
The sense confidence calculation unit 24 computes the sense confidences of a word from the similarities obtained; the method of Reference 1 can be used for this. The sense obviousness calculation unit 25 obtains the sense obviousness of a word from its sense confidences. Sense obviousness expresses how likely the polysemous word is to take a particular sense. The word selection unit 26 selects, according to sense obviousness, the words that satisfy a predetermined condition: for example, the word with the greatest sense obviousness, the words whose sense obviousness exceeds a threshold, or the top n words after the polysemous words have been sorted by obviousness. The sense determination unit 27 determines the senses of the selected words, so that the sense of one word, or of several words, is determined in each cycle. The controller 28 controls the operation of the similarity calculation unit 23, the sense confidence calculation unit 24, the sense obviousness calculation unit 25, the word selection unit 26 and the sense determination unit 27. Under the control of the controller, these units repeatedly perform similarity calculation, confidence calculation, obviousness calculation, word selection and sense determination on the polysemous words of the input text, until the sense in the text of every polysemous word has been determined.

[0075] Although Fig. 2a shows the word sense disambiguation system of the invention as including the context determination device 22, it will be understood that the system may omit this device and instead use input text whose context has already been determined.

[0076] Fig. 2b shows a word sense disambiguation method according to the invention. In S201, the input device 21 of the word sense disambiguation system inputs text. In S202, the context determination device 22 determines the context of each polysemous word in the text. In S203, the similarity calculation unit 23 of the word sense disambiguation device determines the similarity between each sense of each polysemous word and its context. In S204, the sense confidence calculation unit 24 computes the confidence of each sense of each polysemous word.

[0077] In S205, the sense obviousness calculation unit 25 computes the sense obviousness of each polysemous word. Either of the following two alternative formulas can be used to compute it.

[0078]

$$E(w) = \mathrm{Max}(C_w) \qquad \text{or} \qquad E(w) = \frac{\mathrm{Max}(C_w) - \mathrm{Second\_Max}(C_w)}{\mathrm{Second\_Max}(C_w)}$$

[0079] Here Max(C_w) is the largest of the sense confidences of word w, and Second_Max(C_w) is the second largest. The second formula measures how far the largest confidence exceeds the second largest.

[0080] For both formulas, the larger E(w) is, the more obvious the sense of w, and the earlier in the loop its sense can be determined. In the "服装包" example, the two sense confidences of 服 are 1.43 and 0.57, while those of 装 are 1.06 and 0.94. The two senses of 服 differ greatly, so its sense is fairly certain and should be the one with confidence 1.43; the two senses of 装 differ little, so it cannot yet be decided which sense to take. Therefore, considering only 服 and 装, the sense of 服 should be determined first, and the sense of 装 then determined on the basis of the determined sense of 服.
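The two obviousness measures of [0078] can be sketched in Python as follows. The function names are illustrative, and returning infinity when the second-largest confidence is zero is an assumption made here to avoid division by zero; the text does not specify that case.

```python
import math
from typing import Dict


def obviousness_max(confidence: Dict[str, float]) -> float:
    """First formula: E(w) = Max(C_w)."""
    return max(confidence.values())


def obviousness_margin(confidence: Dict[str, float]) -> float:
    """Second formula: E(w) = (Max - Second_Max) / Second_Max.

    Expects at least two senses, as for any polysemous word.
    """
    ranked = sorted(confidence.values(), reverse=True)
    best, second = ranked[0], ranked[1]
    if second == 0:
        # Assumption: a sense with no competitor counts as maximally obvious.
        return math.inf if best > 0 else 0.0
    return (best - second) / second


# Confidences of 服 and 装 taken from the worked example.
print(round(obviousness_margin({"clothes": 1.43, "eat": 0.57}), 2))
print(round(obviousness_margin({"clothes": 1.06, "wrap": 0.94}), 2))
```

With the confidences quoted in [0080], the margin for 服 is far larger than that for 装, which is why 服 is resolved first.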

[0081] Then, in S206, the word selection unit 26 selects the word with the greatest sense obviousness, and the sense of the selected word is determined: the confidences of its senses are compared, and the sense with the greatest confidence is taken as the sense of the selected word. In S208, the controller 28 determines whether the senses of all polysemous words have been determined. If not, S203 is executed again; otherwise the processing ends.

[0082] The method is briefly illustrated below, again using "服装包" as the example.

[0083] First cycle:

[0084] (1) Determining the context and computing the similarities and confidences are done in the same way as in the prior art and are not described again here.

[0085] (2) The sense obviousness of each word is computed using the second formula for E(w) above:

[0086] E(服) = (1.43 - 0.57)/0.57 = 1.51

[0087] E(装) = (1.06 - 0.94)/0.94 = 0.13

[0088] E(包) = (1.17 - 0.83)/0.83 = 0.41

[0089] (3) The word with the greatest sense obviousness is selected, here 服.

[0090] (4) Finally, the sense of 服 is determined. Since C(ws_1) > C(ws_2), it takes the sense ws_1 = clothes.

[0091] Second cycle:

[0092] The two words 装 and 包 remain and are computed separately below. Because the sense of 服 was determined in the first cycle, 服 takes only the sense clothes in the following computation and no longer the sense eat.

[0093] For w = 装: c(w) = {n_1, n_2} = {服, 包}, Senses(w) = (ws_1, ws_2) = (clothes, wrap).

[0094] (1) Compute the similarities

[0095]-[0096] S(ws_1, n_1) = max(S(clothes, clothes)) = max(1) = 1

[0097] S(ws_1, n_2) = max(S(clothes, tools), S(clothes, wrap)) = max(0.3, 0) = 0.3

[0098] S(ws_2, n_1) = max(S(wrap, clothes)) = max(0) = 0

[0099]-[0100] S(ws_2, n_2) = max(S(wrap, tools), S(wrap, wrap)) = max(0, 1) = 1

[0101] (2) Compute the sense confidences

[0102] C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1 + 0) + 0.3/(0.3 + 1) = 1.23

[0103] C(ws_2) = 0/(1 + 0) + 1/(0.3 + 1) = 0.77

[0104] (3) Compute the sense obviousness

[0105] E(装) = (1.23 - 0.77)/0.77 = 0.6
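The effect of fixing the sense of 服 on the confidences of 装 can be checked with a few lines of Python. The sketch assumes the similarity values used in the worked example (identical senses 1, unlisted pairs 0); the function names are illustrative.

```python
PAIRS = {frozenset(("clothes", "tools")): 0.3, frozenset(("eat", "wrap")): 0.2}


def sim(a, b):
    return 1.0 if a == b else PAIRS.get(frozenset((a, b)), 0.0)


def confidence(senses, context):
    """Confidence per sense; `context` is a list of sense tuples, one per word."""
    conf = {s: 0.0 for s in senses}
    for ctx_senses in context:
        scores = {s: max(sim(s, c) for c in ctx_senses) for s in senses}
        total = sum(scores.values())
        if total:  # assumption: skip context words unrelated to every sense
            for s in senses:
                conf[s] += scores[s] / total
    return conf


zhuang = ("clothes", "wrap")  # senses of 装
bao = ("tools", "wrap")       # senses of 包

before = confidence(zhuang, [("clothes", "eat"), bao])  # 服 still ambiguous
after = confidence(zhuang, [("clothes",), bao])         # 服 fixed to clothes

print({s: round(v, 2) for s, v in before.items()})
print({s: round(v, 2) for s, v in after.items()})
```

Dropping the eat sense of 服 removes the 0.2 similarity that wrap had borrowed from it, so the gap between the two senses of 装 widens from 1.06 versus 0.94 to 1.23 versus 0.77.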

[0106] For w = 包:

[0107] c(w) = {n_1, n_2} = {服, 装}, Senses(w) = (ws_1, ws_2) = (tools, wrap).

[0108] (1) Compute the similarities

[0109] S(ws_1, n_1) = max(S(tools, clothes)) = max(0.3) = 0.3

[0110] S(ws_1, n_2) = max(S(tools, clothes), S(tools, wrap)) = max(0.3, 0) = 0.3

[0111] S(ws_2, n_1) = max(S(wrap, clothes)) = max(0) = 0

[0112] S(ws_2, n_2) = max(S(wrap, clothes), S(wrap, wrap)) = max(0, 1) = 1

[0113] (2) Compute the sense confidences

[0114] C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.3/(0.3 + 0) + 0.3/(0.3 + 1) = 1.23

[0115] C(ws_2) = 0/(0.3 + 0) + 1/(0.3 + 1) = 0.77

[0116] (3) Compute the sense obviousness

[0117] E(包) = (1.23 - 0.77)/0.77 = 0.6

[0118] (4) Select the word with the greatest sense obviousness

[0119] (second cycle): because 装 and 包 have the same obviousness, either may be selected. For example, 装 is selected (selecting 包 gives the same result).

[0120] (5) Determine the sense of the selected word

[0121] Since C(ws_1) > C(ws_2), 装 takes the sense ws_1 = clothes.

[0122] 第三循环:只剩下包一个字。 [0122] Third cycle: only a packet word. 在以下的计算中,服和装只取衣物(clothes)的词义,而不再取其它的词义。 In the following calculations, clothing and devices take only meaning the laundry (Clothes), rather than taking the other meaning.

[0123]对 w =包:(c(w) = {叫,nJ = {服,装}),Senses (w) = (wsi; ws2)=(用具(tools),包扎(wrap)). [0123] w = packages of: (c (w) = {name, nJ = {clothing, loaded}), Senses (w) = (wsi; ws2) = (utensils (Tools), wrap (wrap)).

[0124] (1) Calculate the similarity

[0125] S(ws1, n1) = max(S(tools, clothes)) = max(0.3) = 0.3

[0126] S(ws1, n2) = max(S(tools, clothes)) = max(0.3) = 0.3

[0127] S(ws2, n1) = max(S(wrap, clothes)) = max(0) = 0

[0128] S(ws2, n2) = max(S(wrap, clothes)) = max(0) = 0

[0129] (2) Calculate the sense confidence

[0130] C(ws1) = S(ws1, n1) / (S(ws1, n1) + S(ws2, n1)) + S(ws1, n2) / (S(ws1, n2) + S(ws2, n2)) = 0.3 / (0.3 + 0) + 0.3 / (0.3 + 0) = 2

[0131] C(ws2) = 0 / (0.3 + 0) + 0 / (0.3 + 0) = 0

[0132] Because only one word remains, the steps of calculating the sense conspicuity and selecting the word with the largest sense conspicuity can be omitted. When the sense is determined, because C(ws1) > C(ws2), 包 takes the sense ws1 = tools.

[0133] The final output is {服: clothes, 装: clothes, 包: tools}. This is the correct result: the sense of 包 is consistent with the senses of 服 and 装.

[0134] The above example shows that the word sense disambiguation method according to the present invention keeps the senses consistent while disambiguating them.
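The three cycles of this example can be reproduced with a short Python sketch of the iterative procedure. It is an illustration under stated assumptions rather than the implementation of the invention: all function names are hypothetical, the context of a word is taken to be every other word of the phrase, the sense pair similarities are those used in the worked example (0.3 for tools and clothes, 0.2 for wrap and eat, 1 for identical senses) with every unlisted pair assumed to be 0, and zero denominators are handled by an added convention.

```python
# Sense-pair similarities from the worked example; unlisted pairs are
# assumed to be 0 and identical senses 1 (assumptions for illustration).
PAIR_SIM = {
    frozenset(["clothes", "tools"]): 0.3,
    frozenset(["wrap", "eat"]): 0.2,
}


def sense_sim(a, b):
    return 1.0 if a == b else PAIR_SIM.get(frozenset([a, b]), 0.0)


def word_sim(sense, context_senses):
    """S(ws, n): best match between one sense and any candidate sense of n."""
    return max(sense_sim(sense, s) for s in context_senses)


def confidences(word, candidates):
    """C(ws) for every candidate sense of `word`, given the current candidates."""
    conf = {s: 0.0 for s in candidates[word]}
    for other, other_senses in candidates.items():
        if other == word:
            continue
        sims = {s: word_sim(s, other_senses) for s in candidates[word]}
        total = sum(sims.values())
        if total == 0:
            continue  # assumption: skip context words with no similarity
        for s in conf:
            conf[s] += sims[s] / total
    return conf


def conspicuity(conf):
    ranked = sorted(conf.values(), reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return float("inf")  # assumption: an unopposed sense is maximally conspicuous
    return (ranked[0] - ranked[1]) / ranked[1]


def disambiguate(senses):
    """Fix one word per cycle: always the word with the largest conspicuity."""
    candidates = {w: list(s) for w, s in senses.items()}
    undetermined = [w for w, s in candidates.items() if len(s) > 1]
    while undetermined:
        conf = {w: confidences(w, candidates) for w in undetermined}
        chosen = max(undetermined, key=lambda w: conspicuity(conf[w]))
        best = max(conf[chosen], key=conf[chosen].get)
        candidates[chosen] = [best]  # later cycles see only the fixed sense
        undetermined.remove(chosen)
    return {w: s[0] for w, s in candidates.items()}


result = disambiguate({
    "服": ["clothes", "eat"],
    "装": ["clothes", "wrap"],
    "包": ["tools", "wrap"],
})
print(result)  # {'服': 'clothes', '装': 'clothes', '包': 'tools'}
```

Because a determined word keeps only its fixed sense in `candidates`, later cycles compare against that sense alone, which is what makes the remaining words converge on consistent senses.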

[0135] Although the result of the above word sense disambiguation method is consistent, the method used in the example needs three cycles and recalculates some values, so its calculation time is longer than that of Reference 1.

[0136] To reduce the calculation time and speed up the calculation, the present invention proposes improvements to the above word sense disambiguation method. The idea is either (1) to determine, in the same cycle, the senses of all words whose sense conspicuity exceeds a threshold, or (2) to sort all words by sense conspicuity, take the first n words, and determine their senses in the same cycle. These two improvements are described below with reference to Figs. 2c and 2d.

[0137] Fig. 2c shows a flowchart of a word sense disambiguation method. S401 to S405 are the same as S201 to S205 and are not described again here. In S406, the word selection unit 26 selects the polysemous words whose sense conspicuity is greater than the threshold, and the senses of the selected words are determined. If the sense conspicuity of a word is high (above the threshold), the word is very likely to take that sense; even if the senses of some context words change in later cycles, the word is unlikely to change its sense, so its sense can be determined as early as the first cycle. However, because the threshold is usually a preset value, the result may contain inconsistencies. In S407, the controller 28 judges whether the senses of all polysemous words have been determined. If not, S403 is executed; otherwise the processing ends.

[0138] The method is briefly described below using the phrase 服装包 (clothing bag).

[0139] First cycle:

[0140] The similarity and confidence of each word of 服装包 are calculated as above, so the description is omitted here.

[0141] Calculate the sense conspicuity: E(服) = 1.51, E(装) = 0.13, E(包) = 0.41.

[0142] Select the words whose sense conspicuity is greater than the threshold: with the threshold set to T = 0.5, only one word satisfies the condition, namely 服.

[0143] Determine the sense: the sense of 服 is determined to be clothes.

[0144] Second cycle:

[0145] The calculation of the similarity and confidence of 装 and 包 is likewise omitted.

[0146] E(装) = E(包) = 0.6. Because both values are greater than T, both words are selected and their senses are determined. The process is not described again here. In the end, 装 takes the sense clothes and 包 takes the sense tools.

[0147] The final output is {服: clothes, 装: clothes, 包: tools}. This is the correct result. The method used in this example obtains the correct result in only two cycles, which saves calculation time in the word sense disambiguation system.
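The selection rule of S406 can be sketched as follows. The function name `select_above_threshold` is hypothetical, the conspicuity values are the ones given in the example, and the fallback to the single most conspicuous word when nothing exceeds the threshold is an assumption added here so that every cycle still determines at least one word; the description does not state what happens in that case.

```python
def select_above_threshold(conspicuity, threshold):
    """Return every undetermined word whose conspicuity exceeds the threshold.

    Falls back to the single most conspicuous word when none qualifies
    (an assumption, so that each cycle makes progress).
    """
    chosen = [w for w, e in conspicuity.items() if e > threshold]
    if not chosen:
        chosen = [max(conspicuity, key=conspicuity.get)]
    return chosen


T = 0.5
first_cycle = {"服": 1.51, "装": 0.13, "包": 0.41}
second_cycle = {"装": 0.6, "包": 0.6}
print(select_above_threshold(first_cycle, T))   # ['服']
print(select_above_threshold(second_cycle, T))  # ['装', '包']
```

With T = 0.5 the first cycle fixes only 服 and the second cycle fixes both remaining words, which is why two cycles suffice in this example.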

[0148] Fig. 2d shows another flowchart of the word sense disambiguation method. S501 to S505 are the same as S201 to S205 and are not described again here.

[0149] In S506, the word selection unit 26 sorts the polysemous words by sense conspicuity and selects the first n words. Because the senses of several words can be determined in this step, some calculation time is saved. However, n is also a preset value and may introduce inconsistency.

[0150] In S507, the sense determination unit determines the senses of the selected words. In S508, the controller 28 judges whether the senses of all polysemous words have been determined. If not, S503 is executed; otherwise the processing ends.

[0151] The method is again briefly described using 服装包 (clothing bag) as the example.

[0152] First cycle:

[0153] The similarity and confidence of each word of 服装包 are calculated as above, so the description is omitted here.

[0154] Calculate the sense conspicuity: E(服) = 1.51, E(装) = 0.13, E(包) = 0.41.

[0155] Sorting result: E(服) > E(包) > E(装). With n = 2, the senses of the first two words are determined. For 服, because C(ws1) > C(ws2), the sense ws1 = clothes is taken. For 包, because C(ws1) < C(ws2), the sense ws2 = wrap is taken.

[0156] Second cycle: only the word 装 remains.

[0157] For w = 装: c(w) = {n1, n2} = {服, 包}, Senses(w) = (ws1, ws2) = (clothes, wrap).

[0158] Calculate the similarity:

[0159] S(ws1, n1) = max(S(clothes, clothes)) = max(1) = 1

[0160] S(ws1, n2) = max(S(clothes, wrap)) = max(0) = 0

[0161] S(ws2, n1) = max(S(wrap, clothes)) = max(0) = 0

[0162] S(ws2, n2) = max(S(wrap, wrap)) = max(1) = 1

[0163] Calculate the sense confidence:

[0164] C(ws1) = C(ws2) = 1

[0165] Because C(ws1) and C(ws2) are equal, either sense may be chosen, for example the sense clothes.

[0166] The final output is then {服: clothes, 装: clothes, 包: wrap}. The method used in this example needs only two cycles, which saves calculation time.
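The sort-and-take step of S506 is small enough to sketch directly. The function name `select_top_n` is hypothetical and the conspicuity values are those of the first cycle of the example; this is an illustration, not the implementation of the invention.

```python
def select_top_n(conspicuity, n):
    """Sort the undetermined words by conspicuity, largest first, and keep n of them."""
    ranked = sorted(conspicuity, key=conspicuity.get, reverse=True)
    return ranked[:n]


first_cycle = {"服": 1.51, "装": 0.13, "包": 0.41}
print(select_top_n(first_cycle, 2))  # ['服', '包']
```

With n = 2, 包 is fixed in the same cycle as 服, before the determined sense of 服 can influence it. That is why this example ends with 包: wrap, whereas the one-word-per-cycle method gives 包: tools, and it illustrates the inconsistency that a preset n may introduce.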

[0167] Fig. 3a shows a word sense disambiguation system according to a second embodiment of the present invention. Compared with the system shown in Fig. 2a, this system further includes a sense guessing unit 38 and a sense acquisition unit 39. The sense guessing unit 38 guesses the possible sense of each polysemous word whose sense is undetermined. The sense acquisition unit 39 judges whether the guessed sense is consistent with the already determined senses and, if so, takes the guessed sense as the sense of that polysemous word. Using the sense guessing unit 38 and the sense acquisition unit 39 reduces repeated calculation and so saves calculation time.

[0168] The processing performed by the system of the second embodiment is described below with reference to Fig. 3b, which shows a word sense disambiguation method according to the present invention. In S601, the input device 31 of the word sense disambiguation system inputs text. In S602, the context determination device 32 determines the context of each polysemous word in the text. In S603, the similarity calculation unit 33 of the word sense disambiguation device determines the similarity between each sense of each polysemous word and its context. In S604, the sense confidence calculation unit 34 calculates the confidence of each sense of each polysemous word. In S605, the sense conspicuity calculation unit 35 calculates the sense conspicuity of each polysemous word, which can be done with the method used in S205. In S606, the word selection unit 36 selects the word with the largest sense conspicuity, and the sense determination unit 37 determines the sense of that word.

[0169] In S607, the sense guessing unit 38 guesses the possible senses of the other words.

[0170] In S608, the sense acquisition unit 39 selects the words whose guessed senses are consistent with the determined sense and takes the guessed sense as the sense of each such word. After the sense determination unit 37 has determined the sense of a word, the sense guessing unit 38 and the sense acquisition unit 39 work together to check, for every polysemous word whose sense is still undetermined, whether its sense is consistent with the determined sense. If it is, the undetermined sense is fixed in the same cycle, which reduces the calculation time.

[0171] In S609, the controller 40 judges whether the senses of all words have been determined. If not, S603 is executed; otherwise the processing ends.

[0172] The above method is briefly described below, again using 服装包 (clothing bag) as the example.

[0173] First cycle:

[0174] The context is determined and the similarity and confidence are calculated in the same way as in the existing technique, so these steps are not described again here. The sense of 服 is determined: w = 服 takes the sense ws = clothes.

[0175] Guess the possible senses of the undetermined words:

[0176] (1) For A = 装, ws1 = clothes and ws2 = wrap. Because C(ws1) = 1.06 > C(ws2) = 0.94, 装 takes As = ws1.

[0177] (2) For A = 包, ws1 = tools and ws2 = wrap. Because C(ws1) = 0.83 < C(ws2) = 1.17, 包 takes As = ws2.

[0178] Judge whether the guessed sense of each undetermined word is consistent with the sense of 服 (ws = clothes). If it is, the guessed sense is taken as the sense of that word:

[0179] For a word A whose sense is undetermined, a sense As of A is said to be consistent with the sense of a word w if and only if S(As, w) = S(As, ws), where ws is the already determined sense of w.

[0180]-[0181] (1) For A = 装: S(As, w) = max(S(clothes, clothes), S(clothes, eat)) = max(1, 0) = 1.

[0182] Also, S(As, ws) = S(clothes, clothes) = 1.

[0183] Because S(As, w) = S(As, ws), As is consistent with the sense of the word w.

[0184] (2) For A = 包: S(As, w) = max(S(wrap, clothes), S(wrap, eat)) = max(0, 0.2) = 0.2.

[0185] Also, S(As, ws) = S(wrap, clothes) = 0.

[0186] Because S(As, w) ≠ S(As, ws), As is not consistent with the sense of the word w.

[0187] Because 装 meets the requirement and 包 does not, the sense of 装 is determined to be clothes.

[0188] Therefore, at the end of this cycle the senses of two words have been determined: 服 and 装.
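The consistency test S(As, w) = S(As, ws) used in this cycle can be sketched as below. The names `sense_sim` and `is_consistent` are hypothetical, the similarity values are those appearing in the example with every unlisted pair assumed to be 0, and comparing with `math.isclose` instead of exact equality is a choice made here for floating-point safety.

```python
import math

# Sense-pair similarities from the example; unlisted pairs are assumed to be 0.
PAIR_SIM = {
    frozenset(["clothes", "tools"]): 0.3,
    frozenset(["wrap", "eat"]): 0.2,
}


def sense_sim(a, b):
    return 1.0 if a == b else PAIR_SIM.get(frozenset([a, b]), 0.0)


def is_consistent(guessed, word_senses, determined):
    """True when S(As, w) equals S(As, ws).

    guessed     -- As, the guessed sense of the undetermined word A
    word_senses -- all senses of the word w whose sense was just determined
    determined  -- ws, the sense that w was given
    """
    s_word = max(sense_sim(guessed, s) for s in word_senses)  # S(As, w)
    s_fixed = sense_sim(guessed, determined)                  # S(As, ws)
    return math.isclose(s_word, s_fixed)


fu_senses = ["clothes", "eat"]  # senses of 服; it was determined as "clothes"
print(is_consistent("clothes", fu_senses, "clothes"))  # True  (装 guessed as clothes)
print(is_consistent("wrap", fu_senses, "clothes"))     # False (包 guessed as wrap)
```

The guess for 包 fails because its best match with 服 comes from the discarded sense eat (0.2), not from the determined sense clothes (0).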

[0189] Second cycle: only the word 包 remains.

[0190] For w = 包: c(w) = {n1, n2} = {服, 装}, Senses(w) = (ws1, ws2) = (tools, wrap).

[0191] Calculate the similarity:

[0192] S(ws1, n1) = max(S(tools, clothes)) = max(0.3) = 0.3

[0193] S(ws1, n2) = max(S(tools, clothes)) = max(0.3) = 0.3

[0194] S(ws2, n1) = max(S(wrap, clothes)) = max(0) = 0

[0195] S(ws2, n2) = max(S(wrap, clothes)) = max(0) = 0

[0196] Calculate the sense confidence:

[0197] C(ws1) = S(ws1, n1) / (S(ws1, n1) + S(ws2, n1)) + S(ws1, n2) / (S(ws1, n2) + S(ws2, n2)) = 0.3 / (0.3 + 0) + 0.3 / (0.3 + 0) = 2

[0198] C(ws2) = 0 / (0.3 + 0) + 0 / (0.3 + 0) = 0

[0199] Because only one word remains, its sense can be determined directly. Because C(ws1) > C(ws2), 包 takes the sense ws1 = tools. The final output is {服: clothes, 装: clothes, 包: tools}. This result resolves the ambiguity while keeping the senses in the text consistent, and it also reduces the calculation time and speeds up the calculation.

[0200] Although the word sense disambiguation system and method have been described using Chinese text as the example, it will be apparent to those skilled in the art that the present invention can also be applied to other languages, for example English and Japanese.

[0201] Although the present invention has been described with reference to specific embodiments, it should not be limited by these embodiments but only by the appended claims. It should be clear that those of ordinary skill in the art can change or modify the embodiments without departing from the scope and spirit of the present invention.


Claims (14)

  1. A word sense disambiguation system for disambiguating polysemous words, comprising: an input device for inputting text that includes polysemous words; and a word sense disambiguation device for iteratively determining the sense of each word based on the sense conspicuity of the word, wherein the sense conspicuity is obtained from the sense confidence of the word.
  2. The system of claim 1, wherein the word sense disambiguation device comprises: a similarity calculation unit for calculating the similarity between a sense of the word and its context; a sense confidence calculation unit for calculating the sense confidence of the word based on the obtained similarity; a sense conspicuity calculation unit for obtaining the sense conspicuity of the word based on the sense confidence of the word; a word selection unit for selecting, according to sense conspicuity, the words that satisfy a predetermined condition; a sense determination unit for determining the sense of the selected word; and a controller for controlling the above units to iteratively determine the sense of each word based on the sense conspicuity of the word.
  3. The system of claim 1 or 2, wherein the sense conspicuity equals the largest of the sense confidences of the word, or equals the ratio of the difference between the largest and the second-largest sense confidence to the second-largest sense confidence.
  4. The system of claim 2, wherein the word selection unit selects the word with the largest sense conspicuity.
  5. The system of claim 2, wherein the word selection unit selects the words whose sense conspicuity is greater than a threshold.
  6. The system of claim 2, wherein the word selection unit sorts the words by sense conspicuity and selects the first n words.
  7. The system of claim 2, further comprising: a sense guessing unit for guessing the senses of words whose senses are undetermined; and a sense acquisition unit for obtaining the senses of those words according to whether the guessed senses are consistent with the determined senses; wherein the controller controls the above units to iteratively determine the sense of each word based on the sense conspicuity of the word.
  8. The system of claim 1, further comprising: a context determination device for determining the context of the words in the input text.
  9. A word sense disambiguation method for disambiguating polysemous words, comprising: an input step of inputting text that includes polysemous words; and a word sense disambiguation step of iteratively determining the sense of each word based on the sense conspicuity of the word, wherein the sense conspicuity is obtained from the sense confidence of the word.
  10. The method of claim 9, wherein the word sense disambiguation step comprises: a similarity calculation step of calculating the similarity between a sense of the word and its context; a sense confidence calculation step of calculating the sense confidence of the word based on the obtained similarity; a sense conspicuity calculation step of obtaining the sense conspicuity of the word based on the sense confidence of the word; a word selection step of selecting, according to sense conspicuity, the words that satisfy a predetermined condition; a sense determination step of determining the sense of the selected word; and repeating the above steps until the sense of every word has been determined.
  11. The method of claim 9 or 10, wherein the sense conspicuity equals the largest of the sense confidences of the word, or equals the ratio of the difference between the largest and the second-largest sense confidence to the second-largest sense confidence.
  12. The method of claim 10, wherein the word selection step selects the words that satisfy the predetermined condition in one of the following ways: selecting the word with the largest sense conspicuity; selecting the words whose sense conspicuity is greater than a threshold; and sorting the words by sense conspicuity and selecting the first n words.
  13. The method of claim 10, further comprising the following steps performed after the sense determination step: guessing the senses of words whose senses are undetermined; and obtaining the senses of those words according to whether the guessed senses are consistent with the determined senses.
  14. The method of claim 9, further comprising: a context determination step of determining the context of the words in the input text.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101417374A CN101901210A (en) 2009-05-25 2009-05-25 Word meaning disambiguating system and method

Publications (1)

Publication Number Publication Date
CN101901210A true CN101901210A (en) 2010-12-01

Family

ID=43226753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101417374A CN101901210A (en) 2009-05-25 2009-05-25 Word meaning disambiguating system and method

Country Status (1)

Country Link
CN (1) CN101901210A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5541836A (en) * 1991-12-30 1996-07-30 At&T Corp. Word disambiguation apparatus and methods
WO2002010985A2 (en) * 2000-07-28 2002-02-07 Tenara Limited Method of and system for automatic document retrieval, categorization and processing
CN1871597A (en) * 2003-08-21 2006-11-29 伊迪利亚公司 System and method for associating documents with contextual advertisements
CN1916887A (en) * 2006-09-06 2007-02-21 哈尔滨工程大学 Method for eliminating ambiguity without directive word meaning based on technique of substitution words

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DIANA MCCARTHY等: "Finding Predominant Word Senses in Untagged Text", 《ACL "04 PROCEEDINGS OF THE 42ND ANNUAL MEETING ON ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104160392A (en) * 2012-03-07 2014-11-19 三菱电机株式会社 Device, method, and program for estimating meaning of word
CN104160392B (en) * 2012-03-07 2017-03-08 三菱电机株式会社 Semantic estimation apparatus, method
CN104731771A (en) * 2015-03-27 2015-06-24 大连理工大学 Term vector-based abbreviation ambiguity elimination system and method

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C12 Rejection of a patent application after its publication