CN109145289A - Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model - Google Patents


Info

Publication number
CN109145289A
Authority
CN
China
Legal status
Pending
Application number
CN201810808788.7A
Inventor
周兰江
李思卓
周枫
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/30 Semantic analysis

Abstract

The present invention relates to a Lao-Chinese bilingual sentence alignment method that combines similarity with bipartite graph matching, and belongs to the fields of natural language processing and machine learning. The invention first uses a constructed Lao-Chinese bilingual dictionary to compute a similarity value between a Lao sentence and a Chinese sentence. It then takes bilingual sentence-length information fully into account by computing the length ratio of the Lao and Chinese sentences, and combines the two values into an overall Lao-Chinese sentence similarity value. This makes the Lao-Chinese bilingual sentence similarity calculation more reliable, so that Lao and Chinese sentences with higher similarity can be aligned during the alignment process, which simplifies sentence alignment. The invention can effectively mine parallel sentence pairs from a bilingual corpus; by fully combining Lao-Chinese bilingual sentence similarity calculation with a best-matching algorithm on a bipartite graph, it can effectively improve the accuracy of sentence alignment, and it therefore has research value.

Description

Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model
Technical field
The present invention relates to a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, and belongs to the fields of natural language processing and machine learning.
Background art
Sentence similarity calculation is an important research topic in natural language processing and is widely applied. In question answering systems, a similarity method is needed to compare the question posed by a user with the questions in the system's knowledge base, find the best-matching question, and return the best answer. In automatic summarization, sentence similarity is used to exclude sentences with similar meanings so that the summary is not redundant. Across languages, Lao-Chinese bilingual sentence similarity calculation can be applied to searching Chinese and Lao trending news and to sharing educational resources between China and Laos, and it can promote cultural exchange and cooperation between the two countries in many respects.
Summary of the invention
The technical problem to be solved by the present invention is to provide a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model that effectively improves the accuracy of Lao-Chinese bilingual sentence similarity calculation and, in addition, can be used to expand Lao corpora; the invention therefore has research value.
The technical solution adopted by the present invention is a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, characterized by the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese sentence T_i and the Lao sentence T_j in the corpus, and select the keywords of the Chinese sentence and the Lao sentence from the result;
Step1.1: segment the Chinese sentence T_i and the Lao sentence T_j with a word segmentation system to obtain the segmented Chinese and Lao sentences;
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely nouns, pronouns, verbs, adjectives, and adverbs, as the keywords of the Chinese sentence and the Lao sentence; this preserves the semantic integrity of the sentence as far as possible;
Step2: translate the keywords of the Chinese sentence T_i and the Lao sentence T_j obtained in Step1 into a third-party language, English, to form the keyword vector representations of T_i and T_j;
Step2.1: Definition 1 (keyword vector representation): given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written T_i^v = {m_1, m_2, …, m_n};
Step3: after the keyword vector representations of the Chinese sentence T_i and the Lao sentence T_j have been formed, take the shorter keyword vector. Here it is assumed that Len(T_i) ≤ Len(T_j), i.e. the Chinese sentence vector is no longer than the Lao sentence vector. Compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i, and process each keyword m_i in T_i to compute the Lao-Chinese bilingual sentence similarity value;
Step3.1: since the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved, Definitions 2, 3, and 4 are given. Definition 2: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before keyword m_i is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i. Definition 3: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i with vector length Len(T_i) = n, assign each keyword m_i an initial weight value b_i; the weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n}. Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in T_i^v, if m_i also occurs in T_j, then m_i is said to exist in T_j; the vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i with respect to T_j, written E_{i,j} = {e_1, e_2, …, e_p}, and the vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i with respect to T_j, written TE_{i,j} = {v_1, v_2, …, v_p}. Then perform Step3.2 and Step3.3 respectively;
Step3.2: improve the precision of the third-party language by correspondingly increasing weights where keywords are near-synonyms, then perform Step3.4;
Step3.3: improve the precision of keyword positions by increasing the number of preceding and following keywords that are checked, then perform Step3.4;
Step3.4: from the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i and the existence value vector TE_{i,j} = {v_1, v_2, …, v_p} of T_i with respect to the Lao sentence T_j, compute the Lao-Chinese bilingual sentence similarity value by formula (1):
[Formula (1) is not reproduced in this text.]
Specifically, Step3.2 comprises the following steps:
Step3.2.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i;
Step3.2.2: for each keyword m_i in the Chinese sentence T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider the preceding keywords of m_i in T_i and in T_j. If the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i α-fold; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the following keyword of m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
Specifically, Step3.3 comprises the following steps:
Step3.3.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i;
Step3.3.2: for each keyword m_i in T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider a window of keywords preceding m_i in T_i and in T_j, whose size is derived from γ and rounded down, where γ is the number of keywords in T_j. If these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i α-fold; if they are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the corresponding window of keywords following m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
The beneficial effects of the present invention are:
1. The Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model uses a third-party language on top of the vector space model and proposes a relation vector model that considers both the structural and the semantic information of bilingual sentences. It improves on the traditional vector space model and, to a certain extent, raises the accuracy of Lao-Chinese bilingual sentence similarity calculation.
2. The model considers both the matching relations among the keywords that make up a sentence and the synonym information of those keywords, and it improves precision both in the third-party language and in keyword positions. It therefore reflects the structural and semantic information of sentences well and improves the accuracy of Lao-Chinese bilingual sentence similarity calculation.
3. The resulting cross-language sentence similarity calculation method can be applied to searching Chinese and Lao trending news, for example to find two headlines with similar meanings, and to excluding sentences with similar meanings when generating automatic summaries of Chinese and Lao trending online news. This avoids redundant summary sentences and promotes cultural exchange and cooperation between China and Laos.
Description of the drawings
Fig. 1 is the overall flowchart of the present invention.
Fig. 2 illustrates the improvement in third-party language precision in the present invention.
Fig. 3 illustrates the improvement in keyword-position precision in the present invention.
Detailed description of the embodiments
To describe the present invention in more detail and help those skilled in the art understand it, the invention is further described below with reference to the drawings and an embodiment. The embodiment in this section serves to illustrate the invention and make it easier to understand; it does not limit the invention.
Embodiment 1: as shown in Figs. 1-3, a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model includes the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese sentences and Lao sentences in the corpus, and select the keywords of the Chinese sentence and the Lao sentence from the result;
Step1.1: segment the Chinese sentence T_i and the Lao sentence T_j with a word segmentation system to obtain the segmented Chinese and Lao sentences.
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely nouns, pronouns, verbs, adjectives, and adverbs, as the keywords of the Chinese sentence and the Lao sentence; this preserves the semantic integrity of the sentence as far as possible.
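Step1.2 amounts to a filter over tagger output. The following is a minimal sketch, assuming the sentence has already been segmented and tagged by some external tool; the patent does not name the segmenter, tagger, or tag set, so the Universal POS labels used here (`NOUN`, `PRON`, `VERB`, `ADJ`, `ADV`), the tagger output, and the function name `extract_keywords` are illustrative assumptions.

```python
# Tags treated as the "main components" of a sentence in Step1.2.
# Assumption: Universal POS labels; the patent does not specify a tag set.
KEEP_TAGS = {"NOUN", "PRON", "VERB", "ADJ", "ADV"}


def extract_keywords(tagged_tokens):
    """Keep nouns, pronouns, verbs, adjectives and adverbs, in sentence order.

    tagged_tokens: list of (word, tag) pairs from an external segmenter
    and part-of-speech tagger (hypothetical; not specified by the patent).
    """
    return [word for word, tag in tagged_tokens if tag in KEEP_TAGS]


# Hypothetical tagger output for the Chinese sentence 他很喜欢新的电影.
tagged_zh = [("他", "PRON"), ("很", "ADV"), ("喜欢", "VERB"),
             ("新", "ADJ"), ("的", "PART"), ("电影", "NOUN")]
keywords_zh = extract_keywords(tagged_zh)  # ['他', '很', '喜欢', '新', '电影']
```

Sentence order is preserved deliberately, because the later steps depend on each keyword's preceding and following neighbours.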
Step2: from the segmentation results of Step1, extract the keywords of the Chinese sentence T_i and the Lao sentence T_j, translate these keywords into the third-party language, English, and form the keyword vector representations of T_i and T_j.
Step2.1: since the keyword vectors of the Chinese sentence T_i and the Lao sentence T_j are involved, Definition 1 is given here. Definition 1: given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written T_i^v = {m_1, m_2, …, m_n}.
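Step2 maps each keyword into the pivot language. A minimal sketch, assuming a hypothetical word-to-English lexicon for each source language: the patent does not say which dictionary or translation system it uses, nor how keywords without a translation are handled, so dropping them here is an assumption, as are the names `to_pivot`, `zh_to_en`, and `lo_to_en`.

```python
def to_pivot(keywords, lexicon):
    """Build a keyword vector in the pivot language (English).

    keywords: source-language keywords in sentence order.
    lexicon:  hypothetical dict from a Chinese or Lao word to an English word.
    Keywords missing from the lexicon are dropped (an assumption).
    """
    return [lexicon[word] for word in keywords if word in lexicon]


# Hypothetical toy lexicons.
zh_to_en = {"他": "he", "很": "very", "喜欢": "like", "电影": "film"}
lo_to_en = {"ລາວ": "he", "ມັກ": "like", "ຮູບເງົາ": "movie"}

t_i_v = to_pivot(["他", "很", "喜欢", "新", "电影"], zh_to_en)  # ['he', 'very', 'like', 'film']
t_j_v = to_pivot(["ລາວ", "ມັກ", "ຮູບເງົາ"], lo_to_en)  # ['he', 'like', 'movie']
```

In this toy example "film" and "movie" end up as different English words, which is why the later steps also consult synonym information rather than relying on exact matches alone.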
Step3: after the keyword vectors of the Chinese sentence T_i and the Lao sentence T_j have been formed, take the shorter keyword vector. Here it is assumed that Len(T_i) ≤ Len(T_j) (i.e. the Chinese sentence vector is no longer than the Lao sentence vector). Compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i. For each keyword m_i in T_i, a series of processing steps is performed in turn to compute the Lao-Chinese bilingual sentence similarity value; Figs. 2 and 3 illustrate how the proposed method improves on the relation vector model. The relation vector model considers not only whether a keyword of one sentence occurs in the other sentence, but also the influence of the two words closest to that keyword (its preceding and following keywords). In this way the structural relations among all keywords of a sentence are captured, which makes the analysis more comprehensive and accurate. The present invention makes several improvements to this model in order to raise the accuracy of Lao-Chinese bilingual sentence similarity calculation.
Step3.1: since the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved, Definitions 2, 3, and 4 are given here. Definition 2: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before keyword m_i is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i. Definition 3: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i with vector length Len(T_i) = n, assign each keyword m_i an initial weight value b_i; the weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n}. Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in T_i^v, if m_i also occurs in T_j, then m_i is said to exist in T_j; the vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i with respect to T_j, written E_{i,j} = {e_1, e_2, …, e_p}. The vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i with respect to T_j, written TE_{i,j} = {v_1, v_2, …, v_p}.
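Definitions 2 to 4 translate directly into small helpers. This is a sketch under stated assumptions: the patent's expression for the initial weight b_i does not survive in this text, so a uniform weight 1/n is assumed; keywords are compared as exact English strings; and the helper names are hypothetical.

```python
def order_by_length(kw_a, kw_b):
    """Return (shorter, longer) so that Len(T_i) <= Len(T_j) holds."""
    return (kw_a, kw_b) if len(kw_a) <= len(kw_b) else (kw_b, kw_a)


def neighbours(kw, idx):
    """Definition 2: preceding and following keyword of kw[idx] (None at the edges)."""
    before = kw[idx - 1] if idx > 0 else None
    after = kw[idx + 1] if idx + 1 < len(kw) else None
    return before, after


def initial_weights(kw_i):
    """Definition 3: TB_i. Assumption: uniform weight 1/n with n = Len(T_i)."""
    n = len(kw_i)
    return [1.0 / n] * n


def existence_vectors(kw_i, kw_j, tb_i):
    """Definition 4: E_ij (keywords of T_i also in T_j) and TE_ij (their weights)."""
    in_j = set(kw_j)
    e_ij = [m for m in kw_i if m in in_j]
    te_ij = [b for m, b in zip(kw_i, tb_i) if m in in_j]
    return e_ij, te_ij


kw_i, kw_j = order_by_length(["he", "very", "like", "film"], ["he", "like", "movie"])
tb_i = initial_weights(kw_i)  # [1/3, 1/3, 1/3]
e_ij, te_ij = existence_vectors(kw_i, kw_j, tb_i)
# kw_i == ['he', 'like', 'movie'], e_ij == ['he', 'like']
```

In this toy example the Lao-side vector happens to be the shorter one, so it plays the role of T_i; the sketch simply takes whichever vector is shorter instead of assuming the Chinese sentence is.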
Step3.2: because the invention computes sentence similarity by translating keywords into a third-party language, it is inevitably affected by that language, in particular by near-synonyms encountered during translation. The precision of the third-party language therefore needs to be improved, which the invention achieves by correspondingly increasing weights where keywords are near-synonyms. Fig. 2 illustrates this improvement in third-party language precision.
Step3.2.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i.
Step3.2.2: for each keyword m_i in the Chinese sentence T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider the preceding keywords of m_i in T_i and in T_j. If the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i α-fold; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the following keyword of m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
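The neighbour check of Step3.2.2 can be sketched as follows. None of these assumptions is fixed by the patent: synonym and near-synonym relations come from hypothetical sets of English word pairs; m_i is matched to the first keyword of T_j that equals it or is its synonym; "α-fold" is read as multiplying the weight by α (likewise for β); and initial weights are a uniform 1/n. The sketch returns E_{i,j} and TE_{i,j}, the outputs the step names, and does not attempt formula (1).

```python
def relation(a, b, synonyms, near_synonyms):
    """Classify a pair of English keywords as 'same', 'synonym', 'near' or None."""
    if a is None or b is None:
        return None
    if a == b:
        return "same"
    pair = frozenset((a, b))
    if pair in synonyms:
        return "synonym"
    if pair in near_synonyms:
        return "near"
    return None


def boost_by_neighbours(kw_i, kw_j, alpha, beta, synonyms, near_synonyms):
    """Sketch of Step3.2.2; assumes Len(kw_i) <= Len(kw_j). Returns (E_ij, TE_ij)."""
    if not 1 < beta < alpha:
        raise ValueError("the patent requires 1 < beta < alpha")
    n, strong = len(kw_i), ("same", "synonym")
    e_ij, te_ij = [], []
    for idx, m in enumerate(kw_i):
        # Position of m, or of a synonym of m, in T_j (first occurrence).
        pos = next((p for p, w in enumerate(kw_j)
                    if relation(m, w, synonyms, near_synonyms) in strong), None)
        if pos is None:
            continue  # m does not exist in T_j
        weight = 1.0 / n  # assumed initial weight b_i
        for step in (-1, 1):  # preceding keyword, then following keyword
            a = kw_i[idx + step] if 0 <= idx + step < n else None
            b = kw_j[pos + step] if 0 <= pos + step < len(kw_j) else None
            rel = relation(a, b, synonyms, near_synonyms)
            if rel in strong:
                weight *= alpha
            elif rel == "near":
                weight *= beta
        e_ij.append(m)
        te_ij.append(weight)
    return e_ij, te_ij


syn = {frozenset(("film", "movie"))}
e_ij, te_ij = boost_by_neighbours(["he", "like", "movie"], ["he", "very", "like", "film"],
                                  alpha=2.0, beta=1.5, synonyms=syn, near_synonyms=set())
# e_ij == ['he', 'like', 'movie'], te_ij ≈ [1/3, 2/3, 2/3]
```

The adverb "very", present only on one side, separates "he" from "like", so "he" receives no neighbour bonus and "like" receives only one. This is the kind of positional loss that Step3.3 targets.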
Step3.3: similarly, Chinese and Lao sentences are both built mainly on a subject + predicate + object structure, but they differ in subtle ways. These differences shift keyword positions, so the single preceding keyword and the single following keyword cannot fully determine whether a keyword's weight should be increased, and precision is lost because of keyword positions. The invention therefore improves keyword-position precision by increasing the number of preceding and following keywords that are checked. Fig. 3 illustrates this improvement in keyword-position precision.
Step3.3.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i.
Step3.3.2: for each keyword m_i in T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider a window of keywords preceding m_i in T_i and in T_j, whose size is derived from γ and rounded down, where γ is the number of keywords in T_j. If these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i α-fold; if they are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the corresponding window of keywords following m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
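Step3.3.2 widens the single-neighbour test of Step3.2.2 to a window of keywords on each side. A minimal sketch, with these assumptions: the window size k is passed in as a parameter (the patent derives it from γ, rounded down, but the exact expression does not survive in this text); "these keywords are the same words or synonyms" is read as "at least one keyword in the window of T_i is the same as, or a synonym of, one keyword in the corresponding window of T_j"; weights start at a uniform 1/n; "α-fold" means multiplication; and all names are hypothetical.

```python
def boost_by_window(kw_i, kw_j, k, alpha, beta, synonyms, near_synonyms):
    """Sketch of Step3.3.2 with a window of k keywords; returns (E_ij, TE_ij)."""

    def relation(a, b):
        if a == b:
            return "same"
        pair = frozenset((a, b))
        if pair in synonyms:
            return "synonym"
        return "near" if pair in near_synonyms else None

    def window(kw, pos, side):
        # Up to k keywords before (side < 0) or after (side > 0) position pos.
        return kw[max(0, pos - k):pos] if side < 0 else kw[pos + 1:pos + 1 + k]

    strong = {"same", "synonym"}
    e_ij, te_ij = [], []
    for idx, m in enumerate(kw_i):
        pos = next((p for p, w in enumerate(kw_j) if relation(m, w) in strong), None)
        if pos is None:
            continue  # neither m nor a synonym of m exists in T_j
        weight = 1.0 / len(kw_i)  # assumed initial weight b_i
        for side in (-1, 1):  # preceding window, then following window
            found = {relation(a, b) for a in window(kw_i, idx, side)
                     for b in window(kw_j, pos, side)}
            if found & strong:
                weight *= alpha
            elif "near" in found:
                weight *= beta
        e_ij.append(m)
        te_ij.append(weight)
    return e_ij, te_ij


e_ij, te_ij = boost_by_window(["he", "like", "movie"], ["he", "very", "like", "film"], k=2,
                              alpha=2.0, beta=1.5,
                              synonyms={frozenset(("film", "movie"))}, near_synonyms=set())
# e_ij == ['he', 'like', 'movie'], te_ij ≈ [2/3, 4/3, 2/3]
```

With k = 1 the windows contain only the immediate neighbours, so the function reproduces the single-neighbour rule of Step3.2.2. With k = 2 the neighbours displaced by "very" are found again, and every matched keyword receives at least one bonus.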
Experiments showed that γ = Len(T_i) affects the accuracy of the final similarity; that is, errors can arise when the window of preceding and following keywords (whose size is derived from γ and rounded down) is considered. Two situations occur. In the first, when there are few keywords, considering only the single preceding and following keyword has little effect on accuracy and keeps the calculation accurate; but as the number of keywords grows, the errors caused by grammatical differences between Chinese and Lao also grow, the single preceding and following keyword can no longer guarantee accurate results, and accuracy declines. In the second, when there are few keywords, the window has little effect on accuracy; but as the number of keywords grows, considering the preceding and following windows causes keywords to be counted repeatedly, which inflates the result. After a comprehensive analysis, Lao-Chinese bilingual sentence similarity calculation was found to be more accurate when the number of keywords is between 5 and 7.
The present invention can effectively perform Chinese-Lao bilingual sentence similarity calculation even when Lao corpora are scarce, and it can also be used to expand Lao corpora; it therefore has research value.
The embodiment of the present invention has been described in detail above with reference to the drawings, but the invention is not limited to that embodiment; within the knowledge of a person skilled in the art, various changes can be made without departing from the concept of the invention.

Claims (3)

1. A Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, characterized by comprising the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese sentence T_i and the Lao sentence T_j in the corpus, and select the keywords of the Chinese sentence and the Lao sentence from the result;
Step1.1: segment the Chinese sentence T_i and the Lao sentence T_j with a word segmentation system to obtain the segmented Chinese and Lao sentences;
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely nouns, pronouns, verbs, adjectives, and adverbs, as the keywords of the Chinese sentence and the Lao sentence; this preserves the semantic integrity of the sentence as far as possible;
Step2: translate the keywords of the Chinese sentence T_i and the Lao sentence T_j obtained in Step1 into a third-party language, English, to form the keyword vector representations of T_i and T_j;
Step2.1: Definition 1 (keyword vector representation): given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written T_i^v = {m_1, m_2, …, m_n}.
Step3: after the keyword vector representations of the Chinese sentence T_i and the Lao sentence T_j have been formed, take the shorter keyword vector. Here it is assumed that Len(T_i) ≤ Len(T_j), i.e. the Chinese sentence vector is no longer than the Lao sentence vector. Compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i, and process each keyword m_i in T_i to compute the Lao-Chinese bilingual sentence similarity value;
Step3.1: since the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved, Definitions 2, 3, and 4 are given. Definition 2: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before keyword m_i is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i. Definition 3: given the keyword vector representation T_i^v = {m_1, m_2, …, m_n} of a Chinese sentence T_i with vector length Len(T_i) = n, assign each keyword m_i an initial weight value b_i; the weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n}. Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in T_i^v, if m_i also occurs in T_j, then m_i is said to exist in T_j; the vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i with respect to T_j, written E_{i,j} = {e_1, e_2, …, e_p}, and the vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i with respect to T_j, written TE_{i,j} = {v_1, v_2, …, v_p}. Then perform Step3.2 and Step3.3 respectively;
Step3.2: improve the precision of the third-party language by correspondingly increasing weights where keywords are near-synonyms, then perform Step3.4;
Step3.3: improve the precision of keyword positions by increasing the number of preceding and following keywords that are checked, then perform Step3.4;
Step3.4: from the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i and the existence value vector TE_{i,j} = {v_1, v_2, …, v_p} of T_i with respect to the Lao sentence T_j, compute the Lao-Chinese bilingual sentence similarity value by formula (1):
[Formula (1) is not reproduced in this text.]
2. The Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model according to claim 1, characterized in that Step3.2 specifically comprises the following steps:
Step3.2.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i;
Step3.2.2: for each keyword m_i in the Chinese sentence T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider the preceding keywords of m_i in T_i and in T_j. If the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i α-fold; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the following keyword of m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
3. The Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model according to claim 1, characterized in that Step3.3 specifically comprises the following steps:
Step3.3.1: assuming Len(T_i) ≤ Len(T_j), compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i;
Step3.3.2: for each keyword m_i in T_i, if m_i, or a synonym of it, exists in the Lao sentence T_j, consider a window of keywords preceding m_i in T_i and in T_j, whose size is derived from γ and rounded down, where γ is the number of keywords in T_j. If these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i α-fold; if they are near-synonyms, increase the weight of m_i in TB_i β-fold (1 < β < α). Apply the same processing to the corresponding window of keywords following m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
CN201810808788.7A 2018-07-19 2018-07-19 Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model Pending CN109145289A (en)

Publications (1)

Publication Number Publication Date
CN109145289A 2019-01-04

Family

ID=64801258

Country Status (1)

Country Link
CN (1) CN109145289A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257453A (en) * 2020-09-23 2021-01-22 昆明理工大学 Chinese-Vietnamese text similarity calculation method fusing keywords and semantic features

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110106805A1 (en) * 2009-10-30 2011-05-05 International Business Machines Corporation Method and system for searching multilingual documents
CN102360372A (en) * 2011-10-09 2012-02-22 北京航空航天大学 Cross-language document similarity detection method
CN103034627A (en) * 2011-10-09 2013-04-10 北京百度网讯科技有限公司 Method and device for calculating sentence similarity and method and device for machine translation
CN105824797A (en) * 2015-01-04 2016-08-03 华为技术有限公司 Method, device and system evaluating semantic similarity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
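殷耀明 (Yin Yaoming) et al.: "Sentence similarity calculation based on a relation vector model", Computer Engineering and Applications (计算机工程与应用) *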

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190104