CN106339371A - English and Chinese word meaning mapping method and device based on word vectors - Google Patents


Info

Publication number
CN106339371A
Authority
CN
China
Prior art keywords
word
english
meaning
sentence
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610765658.0A
Other languages
Chinese (zh)
Other versions
CN106339371B (en
Inventor
鹿文鹏
孟凡擎
张玉腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jingweishengrui Data Technology Co ltd
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN201610765658.0A priority Critical patent/CN106339371B/en
Publication of CN106339371A publication Critical patent/CN106339371A/en
Application granted granted Critical
Publication of CN106339371B publication Critical patent/CN106339371B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an English and Chinese word meaning mapping method and device based on word vectors. The method comprises the steps of extracting the synonym set of a to-be-mapped word meaning from an English knowledge base, and then querying the candidate Chinese word meanings of all synonyms in the synonym set according to an English-Chinese dictionary; extracting the English explanation and example sentences of the to-be-mapped word meaning from the English knowledge base, and querying the English explanations and example sentences of all of the candidate Chinese word meanings according to the English-Chinese dictionary; training word vectors on an English corpus, and generating a sentence vector for each of the English explanations and example sentences; calculating the similarity between the sentence vectors of the English explanation and example sentences of the to-be-mapped word meaning and the sentence vectors of the English explanations and example sentences of the candidate Chinese word meanings, and calculating the comprehensive similarity between the to-be-mapped word meaning and each candidate Chinese word meaning; and selecting the candidate Chinese word meaning with the maximal comprehensive similarity as the target word meaning of the to-be-mapped word meaning. The method makes effective use of the advantages of word vectors and maps word meanings accurately.

Description

English-Chinese word meaning mapping method and device based on word vectors
Technical field
The present invention relates to the technical field of natural language processing, and in particular to an English-Chinese word meaning mapping method and device based on word vectors.
Background art
Word meaning mapping is the process of mapping a word meaning in a knowledge base from its description in one language to a description in another language. It is an important part of building basic language resources in natural language processing. As a basic task, it has a significant influence on applications such as word sense disambiguation, semantic analysis, and machine translation.
Initially, word meaning mapping was done mainly by hand: the word meanings in a knowledge base were mapped manually, one by one. This approach ensures the accuracy of the mapping, but because the word meaning concepts in a knowledge base are very rich and the data volume is huge, manual mapping is time-consuming, laborious, and difficult to complete. With the development of machine translation technology, researchers began to use machine translation for word meaning mapping: the word meaning to be mapped is fed into a machine translation system, which automatically outputs the mapping result. This approach is automatic and saves time and effort, but because the quality of machine translation is unreliable, the accuracy of the mapping is hard to guarantee. Clearly, neither manual mapping nor machine-translation-based mapping can meet the demand for word meaning mapping of large-scale knowledge bases. To address these problems, the present invention proposes an English-Chinese word meaning mapping method and device based on word vectors. The method takes into account the explanation and example sentence information of a word meaning, uses word vectors to generate sentence vectors for the explanations and example sentences, and then uses the similarity of the sentence vectors to compare the different word meanings comprehensively and determine the target Chinese word meaning of the English word meaning to be mapped. The method overcomes the shortcomings of existing mapping methods and improves the accuracy of word meaning mapping.
Summary of the invention
The invention discloses an English-Chinese word meaning mapping method and device based on word vectors, so that word meaning mapping can be carried out more effectively.
To this end, the present invention provides the following technical solution:
An English-Chinese word meaning mapping method based on word vectors, comprising the following steps:
Step 1: extract the synonym set of the word meaning to be mapped from an English knowledge base, then query the candidate Chinese word meanings of each synonym in the synonym set according to an English-Chinese dictionary;
Step 2: extract the English explanation and example sentences of the word meaning to be mapped from the English knowledge base, and query, according to the English-Chinese dictionary, the English explanation and example sentences of each candidate Chinese word meaning obtained in step 1;
Step 3: train word vectors on a large-scale English corpus, then generate a sentence vector for each English explanation and example sentence obtained in step 2;
Step 4: calculate the similarity between the sentence vectors of the English explanation and example sentences of the word meaning to be mapped, obtained in step 3, and the sentence vectors of the English explanations and example sentences of the candidate Chinese word meanings, then calculate the comprehensive similarity between the word meaning to be mapped and each candidate Chinese word meaning;
Step 5: select the candidate Chinese word meaning with the largest comprehensive similarity as the target word meaning of the word meaning to be mapped.
Further, in step 1, extracting the synonym set and querying the candidate Chinese word meanings specifically comprises:
Step 1-1) extracting the synonym set of the word meaning to be mapped from the English knowledge base;
Step 1-2) querying the candidate Chinese word meanings of each synonym in the synonym set according to the English-Chinese dictionary.
Further, in step 2, extracting the English explanations and example sentences specifically comprises:
Step 2-1) extracting the English explanation and example sentences of the word meaning to be mapped from the English knowledge base;
Step 2-2) querying, according to the English-Chinese dictionary, the English explanation and example sentences of each candidate Chinese word meaning obtained in step 1-2).
Further, in step 3, training the word vectors and generating the sentence vectors specifically comprises:
Step 3-1) training word vectors on a large-scale English corpus;
Step 3-2) preprocessing the English explanations and example sentences obtained in step 2, including lemmatization and content word extraction;
Step 3-3) using the word vectors obtained in step 3-1) to generate a sentence vector for each English explanation and example sentence processed in step 3-2), specifically:
An English explanation or example sentence is denoted s, and a content word in the sentence is denoted w; the sentence vector $\vec{s}$ of sentence s is then given by formula (1):
$$\vec{s} = \sum_{k=1}^{|s|} \vec{w_k} \tag{1}$$
where $|s|$ is the number of content words in sentence s and $\vec{w_k}$ is the word vector of content word $w_k$.
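As an illustration of formula (1), the following is a minimal Python sketch, not the patented implementation. The function name `sentence_vector`, the three-dimensional `toy_vectors`, and the choice to skip words that have no vector are all assumptions made for this example; the embodiment itself uses 200-dimensional vectors.

```python
from typing import Dict, List


def sentence_vector(content_words: List[str],
                    word_vectors: Dict[str, List[float]]) -> List[float]:
    """Formula (1): sum the word vectors of the content words of a sentence."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in content_words:
        vec = word_vectors.get(word)
        if vec is None:
            # Assumption of this sketch: words without a vector are skipped.
            continue
        for i, x in enumerate(vec):
            total[i] += x
    return total


# Hypothetical 3-dimensional vectors, chosen only to keep the arithmetic visible.
toy_vectors = {
    "determine": [1.0, 0.0, 2.0],
    "measurement": [0.5, 1.0, 0.0],
    "take": [0.0, 1.0, 1.0],
}
s = sentence_vector(["determine", "measurement", "take", "measurement"], toy_vectors)
print(s)  # [2.0, 3.0, 3.0]
```

A repeated content word contributes once per occurrence, which matches the sum over k = 1..|s| in formula (1).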
Further, in step 4, calculating the word meaning similarity specifically comprises:
Step 4-1) calculating the similarity between the sentence vectors of the English explanation and example sentences of the word meaning to be mapped, obtained in step 3, and the sentence vectors of the English explanations and example sentences of the candidate Chinese word meanings, specifically:
An English explanation or example sentence is denoted s;
The sentence vector similarity of any two sentences $s_i$ and $s_j$ is given by formula (2):
$$sim(s_i, s_j) = \frac{\vec{s_i} \cdot \vec{s_j}}{|\vec{s_i}| \times |\vec{s_j}|} \tag{2}$$
where $\vec{s_i}$ and $\vec{s_j}$ are the sentence vectors of sentences $s_i$ and $s_j$, and $|\vec{s_i}|$ and $|\vec{s_j}|$ are the norms of $\vec{s_i}$ and $\vec{s_j}$. Substituting formula (1) into formula (2) gives formula (3):
$$sim(s_i, s_j) = \frac{\sum_{k=1}^{|s_i|} \vec{w_k} \cdot \sum_{k=1}^{|s_j|} \vec{w_k}}{\left|\sum_{k=1}^{|s_i|} \vec{w_k}\right| \times \left|\sum_{k=1}^{|s_j|} \vec{w_k}\right|} \tag{3}$$
So that the similarity score lies between 0 and 1 and can be compared later, the sentence vectors $\sum_{k=1}^{|s|} \vec{w_k}$ in formula (3) are normalized with the function $u(\cdot)$, which turns formula (3) into formula (4):
$$sim(s_i, s_j) = \frac{u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right) \cdot u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)}{\left|u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right)\right| \times \left|u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)\right|} \tag{4}$$
where normalization with the function $u(\cdot)$ means converting a vector into a unit vector. This changes only the magnitude of the vector, not its direction, so it does not affect the cosine similarity calculation.
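The cosine similarity of formula (2) and the unit-vector normalization used in formula (4) can be sketched in Python as follows. The helper names `dot`, `norm`, `unit`, and `cosine` and the two-dimensional vectors are assumptions of this example. The sketch also checks numerically that normalizing both vectors first leaves the cosine similarity unchanged.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: List[float]) -> float:
    return math.sqrt(dot(a, a))


def unit(a: List[float]) -> List[float]:
    """The function u(.): scale a non-zero vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine(a: List[float], b: List[float]) -> float:
    """Formula (2): cosine similarity of two non-zero sentence vectors."""
    return dot(a, b) / (norm(a) * norm(b))


# Hypothetical sentence vectors.
a = [3.0, 4.0]
b = [4.0, 3.0]
raw = cosine(a, b)                 # formula (2): 24 / (5 * 5)
normed = cosine(unit(a), unit(b))  # formula (4)
print(math.isclose(raw, normed))   # True
```

Because the normalized vectors already have length 1, the denominator of formula (4) is 1 and the similarity reduces to a dot product of unit vectors.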
Step 4-2) calculating, from the sentence vector similarities of the English explanations and example sentences obtained in step 4-1), the comprehensive similarity between the word meaning to be mapped and a candidate Chinese word meaning, specifically:
The word meaning to be mapped in the English knowledge base is denoted bs and a candidate Chinese word meaning is denoted ds; their comprehensive similarity is calculated by formula (5):
$$score(bs, ds) = \alpha \, sim(bs_{gl}, ds_{gl}) + (1 - \alpha) \max_{bs_{ex} \in bs_{exs},\ ds_{ex} \in ds_{exs}} sim(bs_{ex}, ds_{ex}) \tag{5}$$
where $bs_{gl}$ is the English explanation of bs, $ds_{gl}$ is the English explanation of ds, $bs_{exs}$ is the set of English example sentences of bs, $ds_{exs}$ is the set of English example sentences of ds, $bs_{ex}$ is an example sentence in $bs_{exs}$, $ds_{ex}$ is an example sentence in $ds_{exs}$, $\alpha$ and $(1 - \alpha)$ are the weights of the explanation and the example sentences respectively, and $sim(bs_{gl}, ds_{gl})$ and $sim(bs_{ex}, ds_{ex})$ are calculated by formula (4).
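Formula (5) can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the function name `comprehensive_score`, the default `alpha=0.5`, and the fallback of 0.0 when either side has no example sentences are choices made for this example, not values given in the text.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors, as in formula (2)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def comprehensive_score(bs_gl: Vector, ds_gl: Vector,
                        bs_exs: List[Vector], ds_exs: List[Vector],
                        alpha: float = 0.5) -> float:
    """Formula (5): weighted sum of explanation similarity and best example similarity."""
    gloss_sim = cosine(bs_gl, ds_gl)
    # Maximum over all pairs of example sentences; 0.0 if either side has
    # none (an assumption of this sketch).
    example_sim = max((cosine(b, d) for b in bs_exs for d in ds_exs), default=0.0)
    return alpha * gloss_sim + (1 - alpha) * example_sim


# Hypothetical sentence vectors for one word meaning bs and one candidate ds.
bs_gl, ds_gl = [1.0, 0.0], [0.0, 1.0]
bs_exs = [[1.0, 0.0]]
ds_exs = [[1.0, 0.0], [0.0, 1.0]]
result = comprehensive_score(bs_gl, ds_gl, bs_exs, ds_exs, alpha=0.5)
print(result)  # 0.5
```

In this toy case the explanations are orthogonal (similarity 0) while the best pair of example sentences is identical (similarity 1), so the score equals the example weight 1 - alpha.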
Further, in step 5, selecting the candidate Chinese word meaning with the largest comprehensive similarity as the target word meaning of the word meaning to be mapped specifically comprises:
The word meaning to be mapped in the English knowledge base is denoted bs and a candidate Chinese word meaning is denoted ds; the target word meaning ts to which bs is mapped is then given by formula (6):
$$ts = \arg\max_{ds_i \in dss} score(bs, ds_i) \tag{6}$$
where dss is the set of candidate Chinese word meanings of bs, $ds_i$ is the i-th candidate Chinese word meaning in dss, and $score(bs, ds_i)$ is calculated by formula (5).
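The selection in formula (6) is an arg max over the candidates. A minimal Python sketch follows; the function name `select_target`, the candidate identifiers, and the scores are invented for illustration and are not results from this document.

```python
from typing import Dict


def select_target(scores: Dict[str, float]) -> str:
    """Formula (6): return the candidate with the largest comprehensive similarity."""
    if not scores:
        raise ValueError("no candidate Chinese word meanings to choose from")
    # max() keeps the first candidate encountered when scores tie.
    return max(scores, key=scores.get)


# Made-up scores score(bs, ds_i) for three hypothetical candidates.
toy_scores = {"candidate_a": 0.41, "candidate_b": 0.78, "candidate_c": 0.35}
print(select_target(toy_scores))  # candidate_b
```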
An English-Chinese word meaning mapping device based on word vectors, comprising:
a candidate word meaning query unit, configured to extract the synonym set of the word meaning to be mapped from the English knowledge base, and then query the candidate Chinese word meanings of each synonym in the synonym set according to the English-Chinese dictionary;
an explanation and example sentence extraction unit, configured to extract the English explanation and example sentences of the word meaning to be mapped from the English knowledge base, and to query, according to the English-Chinese dictionary, the English explanation and example sentences of each candidate Chinese word meaning obtained by the candidate word meaning query unit;
a sentence vector generation unit, configured to train word vectors on a large-scale English corpus, and then generate a sentence vector for each English explanation and example sentence obtained by the explanation and example sentence extraction unit;
a word meaning similarity calculation unit, configured to calculate the similarity between the sentence vectors of the English explanation and example sentences of the word meaning to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English explanations and example sentences of the candidate Chinese word meanings, and then calculate the comprehensive similarity between the word meaning to be mapped and each candidate Chinese word meaning;
a target word meaning selection unit, configured to select the candidate Chinese word meaning with the largest comprehensive similarity as the target word meaning of the word meaning to be mapped.
Further, the candidate word meaning query unit also includes:
a synonym set extraction unit, configured to extract the synonym set of the word meaning to be mapped;
a candidate Chinese word meaning query unit, configured to query the candidate Chinese word meanings of each synonym in the synonym set.
Further, the explanation and example sentence extraction unit also includes:
a to-be-mapped word meaning information extraction unit, configured to extract the English explanation and example sentences of the word meaning to be mapped;
a candidate word meaning information extraction unit, configured to extract the English explanation and example sentences of each candidate Chinese word meaning obtained by the candidate Chinese word meaning query unit.
Further, the sentence vector generation unit also includes:
a word vector training unit, configured to train word vectors on a large-scale English corpus;
a word meaning information preprocessing unit, configured to preprocess the English explanations and example sentences obtained by the explanation and example sentence extraction unit, including lemmatization and content word extraction;
a sentence vector generation unit, configured to use the word vectors obtained by the word vector training unit to generate a sentence vector for each English explanation and example sentence produced by the word meaning information preprocessing unit.
Further, the word meaning similarity calculation unit also includes:
a sentence vector similarity calculation unit, configured to calculate the similarity between the sentence vectors of the English explanation and example sentences of the word meaning to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English explanations and example sentences of the candidate Chinese word meanings;
a comprehensive similarity calculation unit, configured to calculate the comprehensive similarity between the word meaning to be mapped and each candidate Chinese word meaning from the sentence vector similarities of the English explanations and example sentences obtained by the sentence vector similarity calculation unit.
Beneficial effects of the present invention:
1. The English-Chinese word meaning mapping method and device based on word vectors proposed by the present invention provide a fully automatic word meaning mapping method that avoids the tedious manual labor of traditional manual mapping.
2. The English-Chinese word meaning mapping method and device based on word vectors proposed by the present invention make full use of the advantages of deep learning: sentence vectors are generated with word vector technology, so the target word meaning can be selected relatively accurately, avoiding the low accuracy of traditional machine-translation-based mapping.
3. The English-Chinese word meaning mapping method and device based on word vectors proposed by the present invention take into account both the explanation and the example sentence information of a word meaning. The similarities of the explanations and of the example sentences are calculated with deep-learning word vector technology, and their weighted sum gives the comprehensive similarity used to select the target word meaning, which yields higher mapping accuracy.
4. The English-Chinese word meaning mapping method and device based on word vectors proposed by the present invention keep only the content words of a sentence when calculating sentence similarity, which avoids interference from irrelevant function words and improves the accuracy of word meaning mapping.
Brief description of the drawings
Fig. 1 is a flow chart of the English-Chinese word meaning mapping method based on word vectors according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the English-Chinese word meaning mapping device based on word vectors according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the word meaning query unit according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the explanation and example sentence extraction unit according to an embodiment of the present invention;
Fig. 5 is a structural diagram of the sentence vector generation unit according to an embodiment of the present invention;
Fig. 6 is a structural diagram of the word meaning similarity calculation unit according to an embodiment of the present invention.
Detailed description of the embodiments:
To help those skilled in the art better understand the solution of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and implementations.
BabelNet is a multilingual knowledge base. It has built a fairly complete English word meaning knowledge base, but its Chinese word meaning knowledge base is incomplete, and an effective word meaning mapping method that automatically maps English word meanings to Chinese is still lacking. This patent seeks to provide an English-Chinese word meaning mapping method and device based on word vectors to solve English-Chinese word meaning mapping problems of this kind.
A word meaning to be mapped, "measure; mensurate; measure_out", is extracted from BabelNet; its semantic description is shown in Table 1. The specific embodiment of the present invention is described using this word meaning as an example.
Table 1
The flow of the English-Chinese word meaning mapping method based on word vectors according to the embodiment of the present invention is shown in Fig. 1 and includes the following steps.
Step 101: query the candidate word meanings.
The synonym set of the word meaning to be mapped is extracted from the English knowledge base, and the candidate Chinese word meanings of each synonym in the synonym set are then queried according to the English-Chinese dictionary, specifically:
Step 1-1) Extract the synonym set of the word meaning to be mapped from the English knowledge base, as follows:
This embodiment performs English-Chinese word meaning mapping for BabelNet, so the English knowledge base used is the BabelNet knowledge base. As in WordNet, word meanings in BabelNet are represented as synonym sets. As shown in Table 1, the synonym set of the current word meaning to be mapped is {measure, mensurate, measure_out}.
Step 1-2) Query the candidate Chinese word meanings of each synonym in the synonym set according to the English-Chinese dictionary, as follows:
In this embodiment, the English-Chinese dictionary is the Collins Advanced English-Chinese Dictionary. It gives a detailed English and Chinese description of every word meaning, with both explanations and example sentences in English and Chinese. In this dictionary, each English word has one or more corresponding Chinese word meanings, and each Chinese word meaning has one English explanation and one or more English-Chinese example sentences. This English information provides good resource support for implementing this patent.
In this embodiment, each synonym in the synonym set {measure, mensurate, measure_out} is looked up in the Collins Advanced English-Chinese Dictionary, giving the candidate Chinese word meanings shown in Table 2.
Table 2
No.   Chinese word meaning description
1     Weigh; estimate; assess; judge
2     Measure; gauge; meter
3     Be ... in distance (or length, width, quantity, etc.)
4     Measure out (the amount required)
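The candidate query of step 101 amounts to looking up every synonym in the dictionary and pooling the results. The Python sketch below illustrates this under stated assumptions: `candidate_meanings`, the in-memory `toy_dictionary`, and the placeholder identifiers `zh_1` to `zh_4` are hypothetical, and the assignment of identifiers to headwords is invented rather than taken from the Collins dictionary.

```python
from typing import Dict, List


def candidate_meanings(synset: List[str],
                       en_zh_dict: Dict[str, List[str]]) -> List[str]:
    """Pool the Chinese word meanings of every synonym, without duplicates."""
    candidates: List[str] = []
    for word in synset:
        # A synonym that is missing from the dictionary contributes nothing.
        for meaning in en_zh_dict.get(word, []):
            if meaning not in candidates:
                candidates.append(meaning)
    return candidates


# Hypothetical dictionary: headword -> identifiers of its Chinese word meanings.
toy_dictionary = {
    "measure": ["zh_1", "zh_2", "zh_3"],
    "measure_out": ["zh_4", "zh_2"],
}
synset = ["measure", "mensurate", "measure_out"]
print(candidate_meanings(synset, toy_dictionary))  # ['zh_1', 'zh_2', 'zh_3', 'zh_4']
```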
Step 102: extract the explanations and example sentences.
The English explanation and example sentences of the word meaning to be mapped are extracted from the English knowledge base, and the English explanation and example sentences of each candidate Chinese word meaning obtained in step 101 are queried according to the English-Chinese dictionary, specifically:
Step 2-1) Extract the English explanation and example sentences of the word meaning to be mapped from the English knowledge base, as follows:
In this embodiment, the English explanation and example sentences of the word meaning to be mapped are extracted from the English knowledge base BabelNet. Based on the information in Table 1, they are as shown in Table 3.
Table 3
Step 2-2) Query, according to the English-Chinese dictionary, the English explanation and example sentences of each candidate Chinese word meaning obtained in step 1-2), as follows:
In this embodiment, the English explanations and example sentences of the candidate Chinese word meanings numbered 1, 2, 3, and 4 in step 1-2) are extracted in turn from the Collins Advanced English-Chinese Dictionary, as shown in Table 4. For ease of understanding, Table 4 also gives the corresponding Chinese word meanings.
Table 4
Step 103: generate the sentence vectors.
Word vectors are trained on a large-scale English corpus, and a sentence vector is then generated for each English explanation and example sentence obtained in step 102, specifically:
Step 3-1) Train word vectors on a large-scale English corpus.
In this embodiment, word vectors are trained with Google's word2vec toolkit on the fifth edition of the English Gigaword data set provided by the University of Pennsylvania; the vector dimension is 200 and all other training parameters use their default values. English Gigaword is an archive of English newswire text covering seven different international English news sources, with 9,876,086 documents and 26,348 MB in total, collected and organized over several years by the Linguistic Data Consortium at the University of Pennsylvania.
Step 3-2) Preprocess the English explanations and example sentences obtained in step 102, including lemmatization and content word extraction;
In this embodiment, the Stanford CoreNLP toolkit from Stanford University is used to lemmatize the English sentences, after which the content words are extracted. The processing is illustrated with the explanation of the word meaning to be mapped.
First, the explanation of the word meaning to be mapped, "determine the measurements of something or somebody, take measurements of", is lemmatized, giving "determine the measurement of something or somebody, take measurement of"; then the content words are extracted, giving "determine measurement something somebody take measurement".
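The two preprocessing stages of step 3-2) can be imitated with a toy Python sketch. The embodiment uses Stanford CoreNLP for lemmatization; the hand-written `LEMMAS` table and `FUNCTION_WORDS` set below are stand-ins invented for this example and cover only the words of the sample explanation.

```python
import re
from typing import List

# Toy stand-ins for a real lemmatizer and function-word filter (assumptions).
LEMMAS = {"measurements": "measurement"}
FUNCTION_WORDS = {"the", "of", "or"}


def preprocess(sentence: str) -> List[str]:
    """Lemmatize each token, then keep only the content words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return [lemma for lemma in lemmas if lemma not in FUNCTION_WORDS]


gloss = "determine the measurements of something or somebody, take measurements of"
content_words = preprocess(gloss)
print(" ".join(content_words))
# determine measurement something somebody take measurement
```

The printed result matches the content word sequence given in the example above; a real pipeline would replace both lookup tables with the output of a lemmatizer and a part-of-speech based filter.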
Step 3-3) Use the word vectors obtained in step 3-1) to generate a sentence vector for each English explanation and example sentence processed in step 3-2), specifically:
An English explanation or example sentence is denoted s, and a content word in the sentence is denoted w; the sentence vector $\vec{s}$ of sentence s is then given by formula (1):
$$\vec{s} = \sum_{k=1}^{|s|} \vec{w_k} \tag{1}$$
where $|s|$ is the number of content words in sentence s and $\vec{w_k}$ is the word vector of content word $w_k$.
In this example, the generation of a sentence vector is illustrated with the explanation of the word meaning to be mapped obtained in step 3-2), "determine measurement something somebody take measurement".
First, by step 3-1) term vector trained, extracts the term vector of each notional word in sentence.Such as, determine Term vector be: [- 0.060966704, -0.06865787, -0.13976261,0.052583452,0.02309357, - 0.015850635,0.0057524024,0.004298664,0.07135361,-0.004907789,-0.0073844297,- 0.0660588,-0.09741554,-0.0826721,0.0020558392,0.0019447851,-0.044812344, 0.1433886,0.107519455,-0.013067925,0.055411655,0.098691314,-0.11813014, 0.028893137,-0.10136866,0.024213811,-0.021338113,-0.006830832,-0.01115726, 0.023671253,0.022735655,-0.106075086,-0.0060708467,-0.06795107,-0.024008093,- 0.10278628,0.110742025,0.06967174,-0.026281023,-0.1304829,-0.18443915,- 0.01603829,0.024118813,-0.02448944,0.08606661,0.04368876,-0.027071448, 0.06927168,-0.16086423,-0.09339183,0.048664782,-0.0037259995,-0.19597004,- 0.05804217,-0.042547442,-0.105807476,0.013699462,0.09974968,-0.038489617,- 0.0507417,0.08751733,0.03520148,0.062430475,0.011540262,-0.12392134, 0.10225074,-0.04389849,-0.053057443,-0.014595923,0.15838726,-0.036213677,- 0.022729969,0.12135271,0.053754877,0.0653142,-0.11217302,-0.032784045,- 0.02645095,-0.0058537563,-0.037233904,-0.091778874,-0.017529158,0.03335303,- 0.11941094,0.12519278,0.045954995,-0.07207713,-0.040876612,-0.093257025, 0.06504259,0.005461387,0.06069275,0.030098341,-0.007988872,-0.027645452,- 0.032660615,-0.062259212,-0.020880515,0.076618314,0.046356063,-0.07308063, 0.03509143,-0.08876938,-0.02635127,-0.012593604,0.14288785,0.045763995,- 0.024156947,0.04318199,-0.012540084,-0.10338905,-0.031343687,-0.04143757,- 0.024850031,0.12515464,0.13902804,0.045706462,0.094424434,0.06911446,- 0.042245053,-0.01119372,0.07074649,-0.06615113,0.059482194,0.06079544,- 0.0073646945,0.05371373,0.07749403,0.09774167,0.04614667,0.080500856, 0.06686461,-0.1371806,0.059351735,-0.11971834,-0.024769751,0.005559396,- 0.004569609,0.025109604,-0.010085186,0.06588754,-0.021475257,-0.12877394,- 0.011472024,0.019178912,-0.022502841,0.049072206,-0.07339941,-0.06519345,- 
0.023635125,0.05878342,-0.041036837,0.016565796,0.13539337,-0.024638291,- 0.08239346,-0.00374239,0.0033550384,0.01374094,0.0065936707,-0.030307738, 0.009063287,-0.021692682,-0.09899706,0.04887318,0.037609883,-0.045150857,- 0.09769283,-0.06568951,-0.13722141,0.018394174,0.03404645,-0.08603616,- 0.07023705,0.14471957,-0.059314273,0.0674724,-0.07376034,0.041695137,- 0.03897431,-0.12877795,-0.057006553,-0.018086433,0.022128537,-0.08181979,- 0.08615692,0.029183147,-0.090377316,0.069178686,-0.015696429,-0.0043464974, 0.0035500522,0.1526469,0.09442544,0.012619695,0.09376681,0.06574002, 0.032735877,-0.06054757,0.031108197].The term vector of measurement is: [- 0.030921048, 0.040468287,0.07367502,-0.036431145,0.09001577,-0.10851831,0.031571753,- 0.0076946556,-0.025466012,0.08239048,-0.033852145,0.023865981,-0.06640976, 0.09898748,-0.060916066,-0.12299272,-0.10123717,0.018511012,-0.017379025, 0.11183538,-0.032644443,0.061155915,-0.046167403,-0.02107625,-0.054799207,- 0.003215416,-0.022842003,-0.07484936,-0.016040549,-6.718859e-4,0.09849985, 0.10686533,-0.027949711,-0.014089485,0.08666428,-0.055681817,0.12596299,- 0.081768885,-0.023240687,-0.040215734,0.009278273,-0.072330184,0.011064145,- 0.046390835,0.009363516,0.07663736,-0.046891708,0.120461896,-0.024577046,- 0.065430254,-0.060996015,-0.031411856,-0.024597166,-0.022857357,- 0.019988738,-0.02650852,-0.046675686,-0.072701864,-0.06415478,-0.012159599,- 0.019452924,-0.007099012,-0.035306044,-0.046926122,-0.060533796,-0.069201075, 0.029004399,-0.024853425,-0.08013603,-0.040774312,0.10615162,0.036688466, 0.0055641048,-0.005188717,0.0027881414,0.061590068,-0.057311498,- 0.0018721737,0.032288115,-0.12578985,-0.1902009,-0.056136098,-4.728086e-4,- 0.061017197,0.04288104,0.01388723,-0.038211193,-0.043795947,-0.04814441, 0.1526314,0.033593766,0.078088604,0.005799715,0.03464157,-0.0035865682,- 0.20270306,-0.111725785,-0.09797781,-0.09489581,-0.054468293,-0.0015290832,- 
0.16072103,0.056969997,0.013535669,-0.17215633,0.20882045,0.04354922,- 0.0025980647,0.08676594,0.0429361,0.029175945,-0.039518964,0.03309713, 0.027989952,-0.029852066,0.028658131,0.037572138,-0.064470336,0.0275685,- 0.094821155,0.14544079,-0.049508303,0.05595343,0.04108511,0.022339016,- 0.007031241,0.06387787,-0.051717743,0.035961512,0.0034367307,0.073031195,- 0.097252965,-0.060861535,0.12593704,-0.024983672,0.07234978,-0.04727927,- 0.19234574,0.11479137,0.013784515,-0.012358148,0.02151782,0.014949858, 0.03911975,-0.01054792,-0.07922059,0.036444385,0.025766745,-0.12601435, 0.047032543,-0.02278641,-0.13189878,0.111353576,-0.06969082,0.020863937, 0.01676644,0.009361927,0.039854113,-0.060249478,0.027769696,-0.27008596, 0.05944734,0.039832402,-0.026858494,-0.020013094,0.025406713,-2.128433e-4,- 0.05612445,0.04703572,-0.024139712,0.06555838,0.07517604,0.09585466,- 0.005991909,-0.0397101,-0.042226095,0.06041255,0.02176508,-0.027269356,- 0.038427215,-0.09381253,0.22008736,0.105541155,0.071456574,-0.016034195, 0.02069451,0.017009461,-0.07982682,-0.010532036,0.08931265,0.042708967, 0.018712737,-0.07463705,0.052128073,0.06920637,0.022202944,0.022940483, 0.05133759,-0.038717363,-0.013162929].In the same manner, obtain the term vector of each notional word in sentence one by one.
Then, the word vectors of the content words in the sentence are added according to formula (1), giving the following sentence vector for the explanation of this word meaning:
[-0.12244331,0.23284505,-0.125848,-0.09857595,0.15176383,- 0.21165508,-0.06935414,0.17774323,-0.0481385,0.27167976,-0.23219745,- 0.31177434,-0.237795,0.20023781,-0.2208232,-0.25496095,-0.050965287, 0.19869018,0.14223932,0.054064974,0.14445543,0.3649017,-0.06972199,- 0.0942207,-0.4732177,-0.002447103,-0.11354132,-0.23180336,-0.032030072, 0.11646948,0.068802774,0.24477573,0.074090265,-0.30747676,0.28410295,- 0.3153889,0.48259473,0.0018074736,-0.2570166,-0.065705955,-0.29293522, 0.1187244,0.08923024,-0.023698367,0.078454815,0.2028578,-0.36501467, 0.40085053,-0.0051737167,-0.25175425,-0.11989543,-0.09693016,-0.095989406, 0.0065662824,0.01091335,-0.03598065,-0.12002948,-0.10372059,-0.28191066, 0.033649035,0.3604529,-0.047989205,-0.1641263,-0.21081169,-0.13621823, 0.33522972,-0.050793078,-0.0373758,-0.22907057,0.109199345,0.37030825,- 0.11889391,0.24283075,0.07673705,0.318008,-0.22766817,-0.42850304,- 0.071055345,0.1914971,-0.28046763,-0.6080315,-0.017843004,0.2313133,- 0.2477001,0.26103482,0.14874645,-0.09291037,-0.0409794,-0.23852225, 0.41014478,-0.17998967,0.31087965,0.11493398,-0.0023042597,-0.09591526,- 0.28730935,-0.49623907,-0.30990297,-0.22764425,-0.06879938,-6.009942e-4,- 0.25748277,0.00649539,0.21129256,-0.4945098,0.82365096,0.3147551, 0.0121324705,0.29460865,-0.13176502,-0.1077477,-0.19233456,0.08242655, 0.16084583,-0.13618916,0.11765827,0.23201033,-0.14476305,0.3566257,- 0.33154497,0.32010967,0.017003909,0.0983599,0.28363377,0.17411232,- 0.31067532,0.21472177,-0.18492793,0.09781431,0.060426474,0.3050918,- 0.12334619,-0.23786914,0.27095866,0.023499401,-0.07610657,-0.0463394,- 0.48189855,0.44204056,-0.030785767,0.046995677,-0.11442133,-0.32249418,- 0.13742244,-0.1368755,-0.21778521,0.061512135,-0.31345803,-0.19940937, 0.09265008,-0.02924196,-0.15277626,0.30612707,0.41078234,0.099931955,- 0.14431237,0.16773543,-0.14954714,-0.044322092,-0.020516273,-0.52509534, 0.10045516,0.13150021,-0.1684227,0.059403583,0.3293987,0.24298555,- 
0.3315874,-0.057996165,-0.34279677,0.24292094,0.2758336,-0.16648525,- 0.13480023,-0.18450123,-0.1112635,0.15073343,0.20073035,-0.097931616,- 0.2827055,-0.24364212,0.17794128,0.35367286,-0.012077071,-0.17940772, 0.08209381,0.08326046,-0.12982222,0.35156035,0.11034558,-0.0971424, 0.01952859,-0.070994884,0.22338426,0.10498668,-0.22422943,-0.04826733, 0.046616875,-0.326965,0.05593993].
Similarly, the sentence vector of every other English annotation and example sentence can be obtained.
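The summation in formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the 3-dimensional vectors, the word list, and the choice to skip out-of-vocabulary words are all assumptions made for the example.

```python
from typing import Dict, List

Vector = List[float]


def sentence_vector(content_words: List[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Formula (1): add up the word vectors of the content words of a sentence."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in content_words:
        vec = word_vectors.get(word)
        if vec is None:  # assumption: words missing from the vocabulary are skipped
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Toy 3-dimensional word vectors, invented for illustration only.
toy_vectors = {
    "determine": [0.5, -0.25, 0.0],
    "measurement": [0.25, 0.5, -0.5],
    "take": [-0.25, 0.25, 0.25],
}
# Hypothetical content-word list after lemmatization.
annotation_words = ["determine", "measurement", "take", "measurement"]
print(sentence_vector(annotation_words, toy_vectors))  # → [0.75, 1.0, -0.75]
```

A word that occurs twice contributes twice to the sum, which is what a plain summation over the content words of the sentence implies.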
Step 104: calculate word-sense similarity.
Calculate the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained in step 103, and the sentence vectors of the English annotation and example sentences of each candidate Chinese word sense; then calculate the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense. Specifically:
Step 4-1) Calculate the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained in step 103, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense. Specifically:
An English annotation or example sentence is denoted $s$;
The similarity of the sentence vectors of any two sentences $s_i$ and $s_j$ is given by formula (2);
$$ sim(s_i, s_j) = \frac{\vec{s_i} \cdot \vec{s_j}}{|\vec{s_i}| \times |\vec{s_j}|} \tag{2} $$
where $\vec{s_i}$ and $\vec{s_j}$ denote the sentence vectors of sentences $s_i$ and $s_j$, and $|\vec{s_i}|$ and $|\vec{s_j}|$ denote the moduli of those vectors. Substituting formula (1) into formula (2) gives formula (3).
$$ sim(s_i, s_j) = \frac{\sum_{k=1}^{|s_i|} \vec{w_k} \cdot \sum_{k=1}^{|s_j|} \vec{w_k}}{\left|\sum_{k=1}^{|s_i|} \vec{w_k}\right| \times \left|\sum_{k=1}^{|s_j|} \vec{w_k}\right|} \tag{3} $$
So that the similarity score lies between 0 and 1 and can be compared later, the sentence vectors in formula (3) are normalized with the function $u(\cdot)$, which turns formula (3) into formula (4);
$$ sim(s_i, s_j) = \frac{u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right) \cdot u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)}{\left|u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right)\right| \times \left|u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)\right|} \tag{4} $$
where the function $u(\cdot)$ performs normalization, i.e. converts a vector into a unit vector. This process changes only the magnitude of the vector, not its direction, and therefore does not affect the cosine similarity calculation.
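Formulas (2) and (4) can be checked against each other with a short Python sketch. The helper names (`dot`, `norm`, `unit`, `cosine`, `sim_normalized`) and the two sample vectors are assumptions chosen for illustration; the point shown is only that normalizing to unit vectors leaves the cosine similarity unchanged.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    """Modulus (Euclidean length) of a vector."""
    return math.sqrt(dot(a, a))


def unit(a: Vector) -> Vector:
    """u(.): rescale a vector to length 1 without changing its direction."""
    length = norm(a)
    if length == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in a]


def cosine(a: Vector, b: Vector) -> float:
    """Formula (2): cosine similarity of two sentence vectors."""
    return dot(a, b) / (norm(a) * norm(b))


def sim_normalized(a: Vector, b: Vector) -> float:
    """Formula (4): the same cosine, computed on the unit vectors."""
    ua, ub = unit(a), unit(b)
    return dot(ua, ub) / (norm(ua) * norm(ub))


a = [3.0, 4.0, 0.0]  # sample vectors, invented for illustration
b = [4.0, 3.0, 0.0]
print(cosine(a, b), sim_normalized(a, b))
```

Up to floating-point rounding the two printed values agree, so in practice the denominator of formula (4) is 1 and the similarity reduces to the dot product of the two unit vectors.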
In this embodiment, the calculation of the similarity of two sentence vectors is illustrated by the similarity between the sentence unit vector of the annotation of the word sense to be mapped, "determine the measurements of something or somebody, take measurements of", and that of the English annotation of the candidate Chinese word sense numbered 1 in Table 4, "if you measure the quality, value, or effect of something, you discover or judge how great it is."
First, the sentence vectors are normalized; the sentence vector of the annotation of the word sense to be mapped, obtained in step 103, is taken as an example.
The sentence vector $\vec{s}$ of the annotation "determine the measurements of something or somebody, take measurements of" of the word sense to be mapped, obtained in step 103, is converted into a unit vector, giving the vector $u(\vec{s})$. The unit vector is:
[-0.03826203,0.072761215,-0.03932595,-0.030803772,0.047424328,- 0.06613961,-0.021672316,0.055542573,-0.015042689,0.08489659,-0.07255885,- 0.09742565,-0.07430801,0.06257185,-0.069004536,-0.079672165,-0.015926026, 0.062088236,0.044448037,0.016894639,0.045140546,0.11402729,-0.021787263,- 0.02944281,-0.14787471,-7.646896e-4,-0.035480265,-0.0724357,-0.010009004, 0.03639528,0.021500021,0.076489404,0.023152297,-0.0960827,0.088778675,- 0.09855515,0.15080492,5.6481326e-4,-0.080314524,-0.020532303,-0.09153865, 0.037099913,0.027883353,-0.0074054482,0.024516165,0.06339057,-0.11406259, 0.12526086,-0.0016167228,-0.07867011,-0.037465848,-0.030289482,-0.029995508, 0.0020518824,0.0034102874,-0.01124351,-0.037507735,-0.032411408,-0.088093616, 0.010514909,0.112637095,-0.014996036,-0.051287454,-0.06587606,-0.042566523, 0.10475516,-0.015872212,-0.011679478,-0.071581736,0.03412345,0.11571677,- 0.037152883,0.07588162,0.023979386,0.099373594,-0.0711435,-0.13390192,- 0.022203922,0.05984049,-0.087642685,-0.19000237,-0.0055757193,0.07228256,- 0.07740323,0.08157017,0.046481434,-0.029033348,-0.012805559,-0.074535266, 0.1281652,-0.056244556,0.097146064,0.035915457,-7.200528e-4,-0.029972339,- 0.089780636,-0.1550686,-0.096840866,-0.07113603,-0.02149896,-1.878033e-4,- 0.0804602,0.0020297295,0.06602633,-0.15452823,0.25738078,0.098357104, 0.0037912477,0.09206158,-0.04117495,-0.03366983,-0.060102183,0.025757283, 0.050262343,-0.042557437,0.036766764,0.07250038,-0.045236673,0.11144115,- 0.10360372,0.10003033,0.0053135124,0.030736258,0.08863206,0.054407958,- 0.09708222,0.06709791,-0.0577877,0.030565768,0.01888253,0.095337436,- 0.038544167,-0.07433118,0.08467125,0.007343274,-0.023782367,-0.014480493,- 0.15058737,0.13813224,-0.009620174,0.014685571,-0.035755258,-0.100775465,- 0.042942822,-0.04277191,-0.0680552,0.019221786,-0.09795178,-0.062312976, 0.02895201,-0.009137753,-0.0477407,0.09566095,0.12836443,0.031227507,- 0.04509584,0.05241526,-0.046731643,-0.013850109,-0.006411083,-0.16408584, 
0.031391002,0.0410922,-0.052630022,0.018562889,0.10293304,0.07592999,- 0.10361698,-0.018123088,-0.10711978,0.0759098,0.08619461,-0.05202459,- 0.04212341,-0.057654366,-0.034768473,0.047102343,0.06272577,-0.030602425,- 0.08834199,-0.076135166,0.05560446,0.11051842,-0.003773936,-0.056062706, 0.025653306,0.02601787,-0.04056785,0.10985829,0.034481637,-0.030355806, 0.006102444,-0.022185028,0.06980483,0.03280705,-0.07006894,-0.015082947, 0.0145672,-0.10217254,0.017480541].
Similarly, the unit vectors of the sentence vectors of all other annotations and example sentences can be obtained.
The similarity between the annotation of the word sense to be mapped and the English annotation of the candidate Chinese word sense numbered 1 in Table 4 is given by formula (4); the calculated similarity is 0.3879761.
Similarly, the similarities between the English annotation of the word sense to be mapped and the English annotations of the candidate Chinese word senses numbered 2, 3 and 4 can be calculated in turn; they are 0.4196734, 0.3625376 and 0.41536587 respectively.
Similarly, the similarity between the example sentence of the word sense to be mapped and each example sentence of the candidate Chinese word senses can be calculated in turn, as shown in Table 5. In Table 5, the word sense to be mapped has only one example sentence, numbered ex; the first example sentence of the first candidate Chinese word sense is numbered 1_ex1, the second example sentence of that word sense is numbered 1_ex2, and the example sentences of the other word senses are numbered analogously.
Table 5
Example sentence ID (word sense to be mapped) | Example sentence ID (candidate word sense) | Example sentence similarity
ex | 1_ex1 | 0.33322173
ex | 1_ex2 | 0.3466332
ex | 1_ex3 | 0.34800234
ex | 2_ex1 | 0.7905501
ex | 2_ex2 | 0.40629613
ex | 3_ex1 | 0.5284378
ex | 3_ex2 | 0.5624604
ex | 3_ex2 | 0.5684977
ex | 4_ex1 | 0.35761255
ex | 4_ex2 | 0.3466332
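The largest example-sentence similarity per candidate word sense, which formula (5) relies on, can be read off Table 5 mechanically. The Python sketch below groups the rows by the sense number encoded at the start of the candidate example ID. The rows copy Table 5 as printed; the function name `best_example_similarity` is illustrative and does not come from the patent.

```python
from typing import Dict, List, Tuple

# Rows of Table 5, copied as printed: (ID of example to be mapped, candidate example ID, similarity).
table5: List[Tuple[str, str, float]] = [
    ("ex", "1_ex1", 0.33322173),
    ("ex", "1_ex2", 0.3466332),
    ("ex", "1_ex3", 0.34800234),
    ("ex", "2_ex1", 0.7905501),
    ("ex", "2_ex2", 0.40629613),
    ("ex", "3_ex1", 0.5284378),
    ("ex", "3_ex2", 0.5624604),
    ("ex", "3_ex2", 0.5684977),
    ("ex", "4_ex1", 0.35761255),
    ("ex", "4_ex2", 0.3466332),
]


def best_example_similarity(rows: List[Tuple[str, str, float]]) -> Dict[int, float]:
    """Return, for each candidate sense number, its largest example-sentence similarity."""
    best: Dict[int, float] = {}
    for _, candidate_id, similarity in rows:
        sense_no = int(candidate_id.split("_")[0])  # "2_ex1" -> 2
        best[sense_no] = max(best.get(sense_no, similarity), similarity)
    return best


print(best_example_similarity(table5))
```

Because the rows are kept as a list of tuples rather than a dictionary keyed by ID, rows that share an ID are both taken into account.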
Step 4-2) From the sentence-vector similarities of the English annotations and example sentences obtained in step 4-1), calculate the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense. Specifically:
The word sense to be mapped in the English knowledge base is denoted $bs$ and a candidate Chinese word sense is denoted $ds$; their comprehensive similarity is given by formula (5);
$$ score(bs, ds) = \alpha \, sim(bs_{gl}, ds_{gl}) + (1 - \alpha) \max_{bs_{ex} \in bs_{exs},\; ds_{ex} \in ds_{exs}} sim(bs_{ex}, ds_{ex}) \tag{5} $$
where $bs_{gl}$ is the English annotation of $bs$, $ds_{gl}$ is the English annotation of $ds$, $bs_{exs}$ is the set of English example sentences of $bs$, $ds_{exs}$ is the set of English example sentences of $ds$, $bs_{ex}$ is one example sentence in $bs_{exs}$, $ds_{ex}$ is one example sentence in $ds_{exs}$, $\alpha$ and $(1 - \alpha)$ are the weights of the annotation and of the example sentences respectively, and $sim(bs_{gl}, ds_{gl})$ and $sim(bs_{ex}, ds_{ex})$ are calculated by formula (4).
In this embodiment, the calculation of the comprehensive similarity between the word sense bs to be mapped and a candidate Chinese word sense ds is illustrated by the comprehensive similarity between the word sense to be mapped in Table 1 and the candidate Chinese word sense numbered 1 in Table 4.
From step 4-1), the similarity between the English annotation of the word sense to be mapped and the English annotation of the candidate Chinese word sense numbered 1 in Table 4 is $sim(bs_{gl}, ds_{gl}) = 0.3879761$. In formula (5), $\max_{bs_{ex} \in bs_{exs},\; ds_{ex} \in ds_{exs}} sim(bs_{ex}, ds_{ex})$ denotes the largest similarity between any $bs_{ex}$ and any $ds_{ex}$. From step 4-1), the similarities between the English example sentence of the word sense to be mapped and each example sentence of the candidate Chinese word sense numbered 1 are 0.33322173, 0.3466332 and 0.34800234, of which 0.34800234 is the largest; therefore $\max sim(bs_{ex}, ds_{ex}) = 0.34800234$. After extensive experimental validation, this embodiment sets the weight $\alpha$ in formula (5) to 0.4. By formula (5), the comprehensive similarity between the word sense bs to be mapped and the candidate Chinese word sense ds numbered 1 is score(bs, ds) = 0.4 × 0.3879761 + (1 − 0.4) × 0.34800234 ≈ 0.3639918.
Similarly, the comprehensive similarity between the word sense to be mapped and each of the other candidate Chinese word senses in Table 4 can be obtained, as shown in Table 6.
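Formula (5) can be evaluated for all four candidates with the similarities reported in step 4-1) and Table 5. The following Python sketch is illustrative only: the function name `combined_score` and the fallback of 0.0 for a candidate without example sentences are assumptions, not part of the patent.

```python
from typing import Dict, List

ALPHA = 0.4  # weight of the annotation similarity, as set in the embodiment


def combined_score(annotation_sim: float, example_sims: List[float], alpha: float = ALPHA) -> float:
    """Formula (5): weighted annotation similarity plus the best example-sentence similarity."""
    best_example = max(example_sims, default=0.0)  # assumption: 0.0 when no example pair exists
    return alpha * annotation_sim + (1.0 - alpha) * best_example


# Annotation similarities of candidates 1 to 4, as reported in step 4-1).
annotation_sims: Dict[int, float] = {1: 0.3879761, 2: 0.4196734, 3: 0.3625376, 4: 0.41536587}
# Example-sentence similarities per candidate, as listed in Table 5.
example_sims: Dict[int, List[float]] = {
    1: [0.33322173, 0.3466332, 0.34800234],
    2: [0.7905501, 0.40629613],
    3: [0.5284378, 0.5624604, 0.5684977],
    4: [0.35761255, 0.3466332],
}

scores = {k: combined_score(annotation_sims[k], example_sims[k]) for k in annotation_sims}
for sense_no, value in sorted(scores.items()):
    print(sense_no, round(value, 7))
```

With these inputs the candidate numbered 2 receives the highest score, mainly because of its first example sentence, whose similarity of 0.7905501 is well above every other entry in Table 5.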
Table 6
Step 105: select the target word sense according to the word-sense similarity.
Select the candidate Chinese word sense with the greatest comprehensive similarity as the target word sense of the word sense to be mapped. Specifically:
The word sense to be mapped in the English knowledge base is denoted $bs$ and a candidate Chinese word sense is denoted $ds$; the target word sense $ts$ to which $bs$ is mapped is then given by formula (6);
$$ ts = \arg\max_{ds_i \in dss} score(bs, ds_i) \tag{6} $$
where $dss$ denotes the set of candidate Chinese word senses of $bs$, $ds_i$ denotes the $i$-th candidate Chinese word sense in $dss$, and $score(bs, ds_i)$ is calculated by formula (5).
In the present example, as Table 6 shows, the candidate Chinese word sense numbered 2 has the highest comprehensive similarity score, so this word sense is taken as the target word sense, i.e. the mapping result, of the word sense to be mapped.
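The selection in formula (6) is an arg max over the candidate set. A minimal Python sketch follows; the function name, the sense identifiers and the score values are hypothetical placeholders rather than the contents of Table 6, and returning `None` for an empty candidate set is an assumption, since the patent does not discuss that case.

```python
from typing import Dict, Optional


def select_target_sense(scores: Dict[str, float]) -> Optional[str]:
    """Formula (6): return the candidate with the largest comprehensive similarity."""
    if not scores:
        return None  # assumption: no candidate means no mapping
    return max(scores, key=lambda sense_id: scores[sense_id])


# Hypothetical identifiers and scores, for illustration only.
illustrative_scores = {"ds1": 0.36, "ds2": 0.64, "ds3": 0.49, "ds4": 0.38}
print(select_target_sense(illustrative_scores))  # → ds2
```

If two candidates tie, Python's `max` keeps the first one encountered; the patent text does not specify a tie-breaking rule, so a real implementation would need to choose one.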
Through the above steps, the mapping of the word sense to be mapped is completed.
Correspondingly, an embodiment of the present invention also provides an English-Chinese word-sense mapping device based on word vectors, whose structure is shown schematically in Fig. 2.
In this embodiment, the device comprises:
A candidate word-sense query unit 201, for extracting the synonym set of the word sense to be mapped from the English knowledge base and then looking up, in the English-Chinese dictionary, the candidate Chinese word senses of each synonym in the synonym set;
An annotation and example sentence extraction unit 202, for extracting the English annotation and example sentences of the word sense to be mapped from the English knowledge base, and for looking up, in the English-Chinese dictionary, the English annotation and example sentences of each candidate Chinese word sense obtained by the candidate word-sense query unit;
A sentence vector generation unit 203, for training word vectors on a large-scale English corpus and then generating a sentence vector for each English annotation and example sentence obtained by the annotation and example sentence extraction unit;
A word-sense similarity calculation unit 204, for calculating the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense, and then calculating the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense;
A target word-sense selection unit 205, for selecting the candidate Chinese word sense with the greatest comprehensive similarity as the target word sense of the word sense to be mapped.
The structure of the candidate word-sense query unit 201 of the device shown in Fig. 2 is shown schematically in Fig. 3 and comprises:
A synonym set extraction unit 301, for extracting the synonym set of the word sense to be mapped;
A candidate Chinese word-sense query unit 302, for looking up the candidate Chinese word senses of each synonym in the synonym set.
The structure of the annotation and example sentence extraction unit 202 of the device shown in Fig. 2 is shown schematically in Fig. 4 and comprises:
A to-be-mapped word-sense information extraction unit 401, for extracting the English annotation and example sentences of the word sense to be mapped;
A candidate word-sense information extraction unit 402, for extracting the English annotation and example sentences of each candidate Chinese word sense obtained by the candidate Chinese word-sense query unit.
The structure of the sentence vector generation unit 203 of the device shown in Fig. 2 is shown schematically in Fig. 5 and comprises:
A word vector training unit 501, for training word vectors on a large-scale English corpus;
A word-sense information preprocessing unit 502, for preprocessing the English annotations and example sentences obtained by the annotation and example sentence extraction unit, including lemmatization and extraction of content words;
A sentence vector generation unit 503, for generating, from the word vectors obtained by the word vector training unit, a sentence vector for each English annotation and example sentence produced by the word-sense information preprocessing unit.
The structure of the word-sense similarity calculation unit 204 of the device shown in Fig. 2 is shown schematically in Fig. 6 and comprises:
A sentence vector similarity calculation unit 601, for calculating the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense;
A comprehensive similarity calculation unit 602, for calculating the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense from the sentence-vector similarities of the English annotations and example sentences obtained by the sentence vector similarity calculation unit.
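How units 203, 204 and 205 could be chained in software is sketched below in Python. This is a hypothetical arrangement under stated assumptions, not the patented device: the `Sense` record stands in for the output of units 201 and 202, the 2-dimensional word vectors are invented, and all function names are illustrative.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional

Vector = List[float]


@dataclass
class Sense:
    sense_id: str
    annotation: List[str]      # content words of the English annotation
    examples: List[List[str]]  # content words of each English example sentence


def embed(words: List[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Unit 503 (sketch): sum the word vectors of the content words."""
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(vectors.get(word, [0.0] * dim)):
            total[i] += value
    return total


def cosine(a: Vector, b: Vector) -> float:
    """Unit 601 (sketch): cosine similarity; 0.0 if either vector is all zeros."""
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def score(target: Sense, cand: Sense, vectors: Dict[str, Vector], dim: int, alpha: float = 0.4) -> float:
    """Unit 602 (sketch): weighted annotation similarity plus the best example pair."""
    ann = cosine(embed(target.annotation, vectors, dim), embed(cand.annotation, vectors, dim))
    best = max(
        (cosine(embed(t, vectors, dim), embed(c, vectors, dim))
         for t in target.examples for c in cand.examples),
        default=0.0,  # assumption: 0.0 when either side has no example sentence
    )
    return alpha * ann + (1.0 - alpha) * best


def select_target(target: Sense, candidates: List[Sense], vectors: Dict[str, Vector], dim: int) -> Optional[Sense]:
    """Unit 205 (sketch): pick the candidate with the highest comprehensive similarity."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(target, c, vectors, dim))


# Toy 2-dimensional word vectors and senses, invented for illustration only.
toy_vectors = {"measure": [1.0, 0.0], "size": [0.9, 0.1], "judge": [0.0, 1.0], "value": [0.1, 0.9]}
target = Sense("bs", ["measure", "size"], [["measure", "size"]])
cand_a = Sense("A", ["measure", "size"], [["size"]])
cand_b = Sense("B", ["judge", "value"], [["judge"]])
chosen = select_target(target, [cand_a, cand_b], toy_vectors, dim=2)
print(chosen.sense_id if chosen else None)
```

Keeping each unit as a separate function mirrors the unit boundaries of Figs. 2 to 6, so the dictionary lookup or the word-vector model can be replaced without touching the scoring logic.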
The English-Chinese word-sense mapping device based on word vectors shown in Figs. 2 to 6 can be integrated into various hardware devices, for example a PC, a smartphone or a workstation.
The English-Chinese word-sense mapping method based on word vectors proposed by the embodiments of the present invention can be stored on various storage media in the form of instructions or instruction sets. These storage media include, but are not limited to, optical discs, hard disks, memory and USB flash drives.
In summary, in the embodiments of the present invention, the synonym set of the word sense to be mapped is extracted from the English knowledge base, and the candidate Chinese word senses of each synonym in the synonym set are looked up in the English-Chinese dictionary; the English annotation and example sentences of the word sense to be mapped are extracted from the English knowledge base, and the English annotation and example sentences of each candidate Chinese word sense are looked up in the English-Chinese dictionary; word vectors are trained on a large-scale English corpus, and a sentence vector is generated for each English annotation and example sentence; the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped and those of the candidate Chinese word sense is calculated, followed by the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense; and the candidate Chinese word sense with the greatest comprehensive similarity is selected as the target word sense of the word sense to be mapped. Applying the embodiments of the present invention thus achieves English-Chinese word-sense mapping based on word vectors. The embodiments perform word-sense mapping with the word-vector technology of deep learning and can effectively take into account the semantic relations between the words of a sentence; in view of the characteristics of English sentences, the invention extracts content words, which removes the interference of the function words in a sentence; and a sentence similarity calculation method is proposed that effectively takes into account the annotation and example sentence information of both the word sense to be mapped and the candidate Chinese word senses. The method and device proposed by the present invention can automatically perform word-sense mapping for a knowledge base with relatively high accuracy. They constitute a fully automatic word-sense mapping method that avoids the tedious manual labor of traditional manual mapping. They make full use of the advantages of deep learning, generate sentence vectors with word-vector technology, and can select the target word sense relatively accurately, avoiding the low accuracy of traditional machine-translation-based mapping methods.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another. Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant points, refer to the description of the method embodiment.
The embodiments of the present invention have been described in detail above, and specific embodiments are used herein to illustrate the invention; the description of the above embodiments is intended only to help in understanding the method and device of the invention. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and to the scope of application; therefore, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. An English-Chinese word-sense mapping method based on word vectors, characterized in that the method comprises the following steps:
Step 1: extract the synonym set of the word sense to be mapped from the English knowledge base, then look up in the English-Chinese dictionary the candidate Chinese word senses of each synonym in the synonym set;
Step 2: extract the English annotation and example sentences of the word sense to be mapped from the English knowledge base, and look up in the English-Chinese dictionary the English annotation and example sentences of each candidate Chinese word sense obtained in step 1;
Step 3: train word vectors on a large-scale English corpus, then generate a sentence vector for each English annotation and example sentence obtained in step 2;
Step 4: calculate the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained in step 3, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense, then calculate the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense;
Step 5: select the candidate Chinese word sense with the greatest comprehensive similarity as the target word sense of the word sense to be mapped.
2. The English-Chinese word-sense mapping method based on word vectors according to claim 1, characterized in that, in said step 1, extracting the synonym set and querying the candidate Chinese word senses specifically comprises:
Step 1-1) extracting the synonym set of the word sense to be mapped from the English knowledge base;
Step 1-2) looking up in the English-Chinese dictionary the candidate Chinese word senses of each synonym in the synonym set.
3. The English-Chinese word-sense mapping method based on word vectors according to claim 1, characterized in that, in said step 2, extracting the English annotations and example sentences specifically comprises:
Step 2-1) extracting the English annotation and example sentences of the word sense to be mapped from the English knowledge base;
Step 2-2) looking up in the English-Chinese dictionary the English annotation and example sentences of each candidate Chinese word sense obtained in step 1-2).
4. The English-Chinese word-sense mapping method based on word vectors according to claim 1, characterized in that, in said step 3, training the word vectors and generating the sentence vectors specifically comprises:
Step 3-1) training word vectors on a large-scale English corpus;
Step 3-2) preprocessing the English annotations and example sentences obtained in step 2, including lemmatization and extraction of content words;
Step 3-3) generating, from the word vectors obtained in step 3-1), a sentence vector for each English annotation and example sentence produced by step 3-2), specifically:
An English annotation or example sentence is denoted $s$ and a content word in the sentence is denoted $w$; the sentence vector $\vec{s}$ of sentence $s$ is then given by formula (1);
$$ \vec{s} = \sum_{k=1}^{|s|} \vec{w_k} \tag{1} $$
where $|s|$ denotes the number of content words in sentence $s$, and $\vec{w_k}$ denotes the word vector of content word $w_k$.
5. The English-Chinese word-sense mapping method based on word vectors according to claim 1, characterized in that, in said step 4, calculating the word-sense similarity specifically comprises:
Step 4-1) calculating the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained in step 3, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense, specifically:
An English annotation or example sentence is denoted $s$;
The similarity of the sentence vectors of any two sentences $s_i$ and $s_j$ is given by formula (2);
$$ sim(s_i, s_j) = \frac{\vec{s_i} \cdot \vec{s_j}}{|\vec{s_i}| \times |\vec{s_j}|} \tag{2} $$
where $\vec{s_i}$ and $\vec{s_j}$ denote the sentence vectors of sentences $s_i$ and $s_j$, and $|\vec{s_i}|$ and $|\vec{s_j}|$ denote the moduli of those vectors; substituting formula (1) into formula (2) gives formula (3);
$$ sim(s_i, s_j) = \frac{\sum_{k=1}^{|s_i|} \vec{w_k} \cdot \sum_{k=1}^{|s_j|} \vec{w_k}}{\left|\sum_{k=1}^{|s_i|} \vec{w_k}\right| \times \left|\sum_{k=1}^{|s_j|} \vec{w_k}\right|} \tag{3} $$
So that the similarity score lies between 0 and 1 and can be compared later, the sentence vectors in formula (3) are normalized with the function $u(\cdot)$, which turns formula (3) into formula (4);
$$ sim(s_i, s_j) = \frac{u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right) \cdot u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)}{\left|u\left(\sum_{k=1}^{|s_i|} \vec{w_k}\right)\right| \times \left|u\left(\sum_{k=1}^{|s_j|} \vec{w_k}\right)\right|} \tag{4} $$
where the function $u(\cdot)$ performs normalization, i.e. converts a vector into a unit vector; this process changes only the magnitude of the vector, not its direction, and does not affect the cosine similarity calculation;
Step 4-2) calculating, from the sentence-vector similarities of the English annotations and example sentences obtained in step 4-1), the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense, specifically:
The word sense to be mapped in the English knowledge base is denoted $bs$ and a candidate Chinese word sense is denoted $ds$; their comprehensive similarity is given by formula (5);
$$ score(bs, ds) = \alpha \, sim(bs_{gl}, ds_{gl}) + (1 - \alpha) \max_{bs_{ex} \in bs_{exs},\; ds_{ex} \in ds_{exs}} sim(bs_{ex}, ds_{ex}) \tag{5} $$
where $bs_{gl}$ is the English annotation of $bs$, $ds_{gl}$ is the English annotation of $ds$, $bs_{exs}$ is the set of English example sentences of $bs$, $ds_{exs}$ is the set of English example sentences of $ds$, $bs_{ex}$ is one example sentence in $bs_{exs}$, $ds_{ex}$ is one example sentence in $ds_{exs}$, $\alpha$ and $(1 - \alpha)$ are the weights of the annotation and of the example sentences respectively, and $sim(bs_{gl}, ds_{gl})$ and $sim(bs_{ex}, ds_{ex})$ are calculated by formula (4).
6. An English-Chinese word-sense mapping device based on word vectors, characterized in that the device comprises a candidate word-sense query unit, an annotation and example sentence extraction unit, a sentence vector generation unit, a word-sense similarity calculation unit and a target word-sense selection unit, wherein:
The candidate word-sense query unit is for extracting the synonym set of the word sense to be mapped from the English knowledge base and then looking up in the English-Chinese dictionary the candidate Chinese word senses of each synonym in the synonym set;
The annotation and example sentence extraction unit is for extracting the English annotation and example sentences of the word sense to be mapped from the English knowledge base, and for looking up in the English-Chinese dictionary the English annotation and example sentences of each candidate Chinese word sense obtained by the candidate word-sense query unit;
The sentence vector generation unit is for training word vectors on a large-scale English corpus and then generating a sentence vector for each English annotation and example sentence obtained by the annotation and example sentence extraction unit;
The word-sense similarity calculation unit is for calculating the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense, and then calculating the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense;
The target word-sense selection unit is for selecting the candidate Chinese word sense with the greatest comprehensive similarity as the target word sense of the word sense to be mapped.
7. The English-Chinese word-sense mapping device based on word vectors according to claim 6, characterized in that said candidate word-sense query unit further comprises:
A synonym set extraction unit, for extracting the synonym set of the word sense to be mapped;
A candidate Chinese word-sense query unit, for looking up the candidate Chinese word senses of each synonym in the synonym set.
8. The English-Chinese word-sense mapping device based on word vectors according to claim 6, characterized in that said annotation and example sentence extraction unit further comprises:
A to-be-mapped word-sense information extraction unit, for extracting the English annotation and example sentences of the word sense to be mapped;
A candidate word-sense information extraction unit, for extracting the English annotation and example sentences of each candidate Chinese word sense obtained by the candidate Chinese word-sense query unit.
9. The English-Chinese word-sense mapping device based on word vectors according to claim 6, characterized in that said sentence vector generation unit further comprises:
A word vector training unit, for training word vectors on a large-scale English corpus;
A word-sense information preprocessing unit, for preprocessing the English annotations and example sentences obtained by the annotation and example sentence extraction unit, including lemmatization and extraction of content words;
A sentence vector generation unit, for generating, from the word vectors obtained by the word vector training unit, a sentence vector for each English annotation and example sentence produced by the word-sense information preprocessing unit.
10. The English-Chinese word-sense mapping device based on word vectors according to claim 6, characterized in that said word-sense similarity calculation unit further comprises:
A sentence vector similarity calculation unit, for calculating the similarity between the sentence vectors of the English annotation and example sentences of the word sense to be mapped, obtained by the sentence vector generation unit, and the sentence vectors of the English annotation and example sentences of the candidate Chinese word sense;
A comprehensive similarity calculation unit, for calculating the comprehensive similarity between the word sense to be mapped and the candidate Chinese word sense from the sentence-vector similarities of the English annotations and example sentences obtained by the sentence vector similarity calculation unit.
CN201610765658.0A 2016-08-30 2016-08-30 A kind of English-Chinese meaning of a word mapping method and device based on term vector Active CN106339371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610765658.0A CN106339371B (en) 2016-08-30 2016-08-30 A kind of English-Chinese meaning of a word mapping method and device based on term vector

Publications (2)

Publication Number Publication Date
CN106339371A true CN106339371A (en) 2017-01-18
CN106339371B CN106339371B (en) 2019-04-30

Family

ID=57823357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610765658.0A Active CN106339371B (en) 2016-08-30 2016-08-30 A kind of English-Chinese meaning of a word mapping method and device based on term vector

Country Status (1)

Country Link
CN (1) CN106339371B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108231062A (en) * 2018-01-12 2018-06-29 科大讯飞股份有限公司 A kind of voice translation method and device
CN109117471A (en) * 2017-06-23 2019-01-01 中国移动通信有限公司研究院 A kind of calculation method and terminal of the word degree of correlation
CN109902673A (en) * 2019-01-28 2019-06-18 北京明略软件系统有限公司 Table Header information identification and method for sorting, system, terminal and storage medium in table
WO2019205564A1 (en) * 2018-04-24 2019-10-31 中译语通科技股份有限公司 Machine translation system based on capsule neural network and information data processing terminal
CN111124141A (en) * 2018-10-12 2020-05-08 北京搜狗科技发展有限公司 Neural network model training method and device for determining candidate items

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6138087A (en) * 1994-09-30 2000-10-24 Budzinski; Robert L. Memory system for storing and retrieving experience and knowledge with natural language utilizing state representation data, word sense numbers, function codes and/or directed graphs
CN101847141A (en) * 2010-06-03 2010-09-29 Fudan University Method for measuring semantic similarity of Chinese words
CN103218444A (en) * 2013-04-22 2013-07-24 Minzu University of China Semantics-based method for classifying Tibetan web page text
CN104424279A (en) * 2013-08-30 2015-03-18 Tencent Technology (Shenzhen) Co., Ltd. Text relevance calculating method and device
CN103853710A (en) * 2013-11-21 2014-06-11 Beijing Institute of Technology Co-training-based bilingual named entity recognition method
CN103942192A (en) * 2013-11-21 2014-07-23 Beijing Institute of Technology Bilingual maximal noun phrase separating-fusing translation method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ERIC H. HUANG ET AL: "Improving Word Representations via Global Context and Multiple Word Prototypes", 《PROCEEDINGS OF THE 50TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *
XINXIONG CHEN ET AL: "A Unified Model for Word Sense Representation and Disambiguation", 《PROCEEDINGS OF THE 2014 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 *
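LU WENPENG ET AL: "Word sense disambiguation method with automatic knowledge acquisition based on dependency fitness", 《JOURNAL OF SOFTWARE》 *
LU WENPENG ET AL: "Graph model word sense disambiguation method based on domain knowledge", 《ACTA AUTOMATICA SINICA》 *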

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117471A (en) * 2017-06-23 2019-01-01 China Mobile Communication Co., Ltd. Research Institute Word relevancy calculation method and terminal
CN109117471B (en) * 2017-06-23 2021-08-10 China Mobile Communication Co., Ltd. Research Institute Word relevancy calculation method and terminal
CN108231062A (en) * 2018-01-12 2018-06-29 iFlytek Co., Ltd. Voice translation method and device
WO2019205564A1 (en) * 2018-04-24 2019-10-31 Global Tone Communication Technology Co., Ltd. Machine translation system based on capsule neural network and information data processing terminal
CN111124141A (en) * 2018-10-12 2020-05-08 Beijing Sogou Technology Development Co., Ltd. Neural network model training method and device for determining candidate items
CN109902673A (en) * 2019-01-28 2019-06-18 Beijing Mininglamp Software System Co., Ltd. Method, system, terminal and storage medium for identifying and sorting table header information in tables

Also Published As

Publication number Publication date
CN106339371B (en) 2019-04-30

Similar Documents

Publication Publication Date Title
CN106339371A (en) English and Chinese word meaning mapping method and device based on word vectors
CN104899304B (en) Named entity recognition method and device
CN107526799A (en) Knowledge graph construction method based on deep learning
CN102662931B (en) Semantic role labeling method based on synergetic neural network
CN102693279B (en) Method, device and system for fast calculating comment similarity
CN102737042B (en) Method and device for establishing question generation model, and question generation method and device
CN103678272B (en) Method for processing unregistered words in a Chinese dependency treebank
CN108959258A (en) Domain-specific collective entity linking method based on representation learning
CN105446958A (en) Word aligning method and device
CN101290616A (en) Statistical machine translation method and system
CN105677857B (en) method and device for accurately matching keywords with marketing landing pages
CN106227714A (en) Method and apparatus for obtaining keywords for poem generation based on artificial intelligence
CN111160041B (en) Semantic understanding method and device, electronic equipment and storage medium
CN105975475A (en) Chinese phrase string-based fine-grained thematic information extraction method
CN107491444A (en) Parallelization word alignment method based on bilingual word embedded technology
He et al. Image captioning with text-based visual attention
CN104731775A (en) Method and device for converting spoken languages to written languages
CN108491459B (en) Optimization method for software code abstract automatic generation model
Sánchez-Martínez et al. Inferring shallow-transfer machine translation rules from small parallel corpora
CN108664464A (en) Method and device for determining semantic relevancy
CN103810993A (en) Text phonetic notation method and device
CN112380834B (en) Method and system for detecting plagiarism of Tibetan paper
Mohnot et al. Hybrid approach for Part of Speech Tagger for Hindi language
Shi et al. The model of grey periodic incidence and their rehabilitation
CN103593334A (en) Method and system for judging emotional degree of text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200228

Address after: 250001 Room 1002, Block C, Yinhe Building, No. 2008 Xinluo Street, High-tech Zone, Jinan City, Shandong Province

Patentee after: Shandong Jingweishengrui Data Technology Co., Ltd.

Address before: 250353 Qilu University of Technology, 3501 University Road, Science Park, Xincheng University, Ji'nan, Shandong

Patentee before: Qilu University of Technology

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and device for mapping English and Chinese word meaning based on word vector

Effective date of registration: 20210803

Granted publication date: 20190430

Pledgee: Jinan Rural Commercial Bank Co., Ltd. Runfeng Sub-branch

Pledgor: Shandong Jingweishengrui Data Technology Co., Ltd.

Registration number: Y2021980007214

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20220721

Granted publication date: 20190430

Pledgee: Jinan Rural Commercial Bank Co., Ltd. Runfeng Sub-branch

Pledgor: Shandong Jingweishengrui Data Technology Co., Ltd.

Registration number: Y2021980007214

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and device for English-Chinese word meaning mapping based on word vector

Effective date of registration: 20220729

Granted publication date: 20190430

Pledgee: Jinan Rural Commercial Bank Co., Ltd. Runfeng Sub-branch

Pledgor: Shandong Jingweishengrui Data Technology Co., Ltd.

Registration number: Y2022980011557

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230724

Granted publication date: 20190430

Pledgee: Jinan Rural Commercial Bank Co., Ltd. Runfeng Sub-branch

Pledgor: Shandong Jingweishengrui Data Technology Co., Ltd.

Registration number: Y2022980011557
