CN109145289A - Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model - Google Patents
Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model
- Publication number
- CN109145289A (application CN201810808788.7A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- keyword
- chinese
- vector
- laotian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a Lao-Chinese bilingual sentence alignment method that combines similarity with graph matching, and belongs to the technical field of natural language processing and machine learning. The invention first computes a similarity value between a Lao sentence and a Chinese sentence from a purpose-built Lao-Chinese bilingual dictionary, then takes the length of both sentences into account by computing the length ratio of the Lao and Chinese sentences. The two values are combined into a single Lao-Chinese sentence similarity value, which makes the similarity calculation more reliable, so that Lao and Chinese sentences with high similarity can be aligned during the alignment process and sentence alignment is simplified. The invention can effectively mine parallel sentence pairs from a bilingual corpus; by combining Lao-Chinese bilingual sentence similarity calculation with the optimal matching algorithm for bipartite graphs, it can effectively improve the accuracy of sentence alignment, and therefore has research value.
Description
Technical field
The present invention relates to a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, and belongs to the technical field of natural language processing and machine learning.
Background art
Sentence similarity calculation is an important and widely applied research topic in natural language processing. In a question answering system, a similarity method is needed to compare the user's question with the questions in the system knowledge base, find the best match, and return the best answer. In automatic summarization, sentence similarity is needed to exclude sentences with similar meaning and so avoid redundancy in the summary. In the cross-language setting, Chinese-Lao bilingual sentence similarity calculation can be applied to searching Chinese-Lao hot news and sharing Chinese-Lao educational resources, and can promote Chinese-Lao cultural exchange and the development of both sides in many respects.
Summary of the invention
The technical problem to be solved by the present invention is to provide a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model. The method can effectively improve the accuracy of Lao-Chinese bilingual sentence similarity calculation and can also be used to expand the Lao corpus, and therefore has research value.
The technical solution adopted by the present invention is a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, characterized by comprising the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese sentence T_i and the Lao sentence T_j in the corpus, and from the result select the keywords of the Chinese sentence and the Lao sentence;
Step1.1: first use a word segmentation system to segment the Chinese sentence T_i and the Lao sentence T_j respectively, obtaining the segmented Chinese and Lao sentences;
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely words whose part of speech is noun, pronoun, verb, adjective or adverb, as the keywords of the Chinese sentence and the Lao sentence; doing so preserves the semantic integrity of the sentence as far as possible;
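The keyword selection in Step1.2 can be sketched as a simple part-of-speech filter. This is only an illustration: the tag names in `KEYWORD_POS` and the tagged input are hypothetical, since the text does not name the word segmentation system or its tag set.

```python
# Hypothetical coarse POS labels; a real tagger would use its own tag set.
KEYWORD_POS = {"noun", "pronoun", "verb", "adjective", "adverb"}

def select_keywords(tagged_tokens):
    """Keep tokens whose POS is one of the five content-word classes, in order."""
    return [word for word, pos in tagged_tokens if pos in KEYWORD_POS]

# Toy segmented and tagged Chinese sentence (tags are illustrative).
tagged = [
    ("我", "pronoun"),
    ("在", "preposition"),
    ("学校", "noun"),
    ("学习", "verb"),
    ("老挝语", "noun"),
]
keywords = select_keywords(tagged)
```

The filter preserves word order, which matters later because the preceding and following keywords are defined by position in the keyword vector.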
Step2: convert the keywords of the Chinese sentence T_i and the Lao sentence T_j obtained in Step1 into a third-party language, English, to form the keyword vector representations of T_i and T_j;
Step2.1: Definition 1 (keyword vector representation): given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written Tiv = {m_1, m_2, …, m_n};
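As a rough sketch of Step2, the conversion to the third-party language can be pictured as a dictionary lookup per keyword. The function name `to_pivot_vector`, the toy dictionaries, and the choice to keep a word unchanged when it has no entry are all assumptions made for illustration; the text does not say how the conversion is carried out.

```python
def to_pivot_vector(keywords, pivot_dict):
    """Map each keyword to its English entry; keep the word itself if no entry exists
    (an assumption, the source does not describe out-of-dictionary handling)."""
    return [pivot_dict.get(word, word) for word in keywords]

# Toy dictionaries, for illustration only.
zh_en = {"我": "i", "学校": "school", "学习": "study"}
lo_en = {"ຂ້ອຍ": "i", "ໂຮງຮຽນ": "school", "ຮຽນ": "study"}

t_i = to_pivot_vector(["我", "学习"], zh_en)             # Chinese keyword vector in English
t_j = to_pivot_vector(["ຂ້ອຍ", "ຮຽນ", "ໂຮງຮຽນ"], lo_en)  # Lao keyword vector in English
```

Once both vectors are in the same language, keywords can be compared directly by string equality or through a synonym resource.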
Step3: after the keyword vector representations of the Chinese sentence T_i and the Lao sentence T_j have been formed, consider the keyword vector with the shorter length. Here it is assumed that Len(T_i) ≤ Len(T_j), i.e. that the Chinese sentence vector is no longer than the Lao sentence vector. Compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i, and for each keyword m_i in T_i compute the Lao-Chinese bilingual sentence similarity value;
Step3.1: because the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved here, Definitions 2, 3 and 4 are introduced.
Definition 2: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before a keyword m_i in the vector is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i;
Definition 3: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, with vector length Len(T_i) = n, each keyword m_i is assigned an initial weight value b_i [formula not reproduced in the source text]; the weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n};
Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in Tiv, if m_i also appears in T_j, then m_i is said to exist in T_j. The vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i based on T_j, written E_{i,j} = {e_1, e_2, …, e_p}; the vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i based on T_j, written TE_{i,j} = {v_1, v_2, …, v_p}. Step3.2 and Step3.3 are then carried out respectively;
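Definition 4 can be illustrated with a short sketch that builds the existence vector and the existence value vector from two keyword vectors. The function name `existence_vectors` is hypothetical, and the weights are passed in by the caller because the initial weight formula of Definition 3 is not available in this text; the values used below are placeholders, not the patent's weights.

```python
def existence_vectors(t_i, t_j, weights):
    """Return (E, TE): the keywords of t_i that also occur in t_j,
    and the weights attached to those keywords."""
    present = set(t_j)
    e, te = [], []
    for word, weight in zip(t_i, weights):
        if word in present:
            e.append(word)
            te.append(weight)
    return e, te

t_i = ["i", "study", "lao"]
t_j = ["i", "study", "language", "school"]
tb_i = [1.0, 1.0, 1.0]  # placeholder weights, NOT the patent's initial weight formula

e_ij, te_ij = existence_vectors(t_i, t_j, tb_i)
```

This sketch checks exact matches only; the synonym and near-synonym handling described in the later steps would need an external lexical resource.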
Step3.2: improve the precision of the third-party language by correspondingly increasing the weight when keywords are near-synonyms, then carry out Step3.4;
Step3.3: improve the precision of keyword position by increasing the number of preceding and following keywords that are examined, then carry out Step3.4;
Step3.4: from the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i and the existence value vector TE_{i,j} = {v_1, v_2, …, v_p} of T_i based on the Lao sentence T_j, the Lao-Chinese bilingual sentence similarity value is calculated as shown in formula (1):
[formula (1) is not reproduced in the source text]
Specifically, Step3.2 comprises the following steps:
Step3.2.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i;
Step3.2.2: for each keyword m_i of the Chinese sentence T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider the preceding keywords of m_i in T_i and in T_j. If the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i by a factor of α; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α). Apply the same processing to the following keyword of m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
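The neighbour-based weighting of Step3.2.2 might be sketched as follows. Several points are assumptions rather than statements from the text: "increase by a factor of α" is read as multiplication, the first matching position in T_j is used, synonym and near-synonym pairs are supplied as sets of `frozenset` pairs, and the starting weights are placeholders because the initial weight formula is not available here. All function and parameter names are hypothetical.

```python
def relation(a, b, synonyms, near_synonyms):
    """Classify a word pair as 'same' (identical or synonym), 'near', or None."""
    if a == b or frozenset((a, b)) in synonyms:
        return "same"
    if frozenset((a, b)) in near_synonyms:
        return "near"
    return None

def boost_by_neighbours(t_i, t_j, weights, alpha, beta,
                        synonyms=frozenset(), near_synonyms=frozenset()):
    """Return (E, TE) after boosting each existing keyword's weight according to
    its preceding and following keywords in both sentences."""
    assert 1 < beta < alpha
    boosted = list(weights)
    e, te = [], []
    for i, m in enumerate(t_i):
        # First position in t_j holding m or a synonym of m (assumption: first match).
        j = next((k for k, w in enumerate(t_j)
                  if relation(m, w, synonyms, near_synonyms) == "same"), None)
        if j is None:
            continue  # m does not exist in t_j
        for offset in (-1, 1):  # preceding keyword, then following keyword
            a, b = i + offset, j + offset
            if 0 <= a < len(t_i) and 0 <= b < len(t_j):
                rel = relation(t_i[a], t_j[b], synonyms, near_synonyms)
                if rel == "same":
                    boosted[i] *= alpha
                elif rel == "near":
                    boosted[i] *= beta
        e.append(m)
        te.append(boosted[i])
    return e, te

t_i = ["i", "like", "school"]
t_j = ["i", "love", "school", "much"]
near = {frozenset(("like", "love"))}
e_ij, te_ij = boost_by_neighbours(t_i, t_j, [1.0, 1.0, 1.0],
                                  alpha=2.0, beta=1.5, near_synonyms=near)
```

In this toy input, "like" is only a near-synonym of "love", so it is not counted as existing in T_j, but it still raises the weights of its neighbours "i" and "school" by β.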
Specifically, Step3.3 comprises the following steps:
Step3.3.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i;
Step3.3.2: for each keyword m_i of T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider a number of keywords preceding m_i in T_i and in T_j. That number is an expression in γ, rounded down, where γ is the number of keywords of T_j [the expression is not reproduced in the source text]. If these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i by a factor of α; if they are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α). Apply the same processing to the same number of keywords following m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
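The effect of examining more than one neighbour can be seen in a small sketch. The window size `k` is a free parameter here because the expression in γ is missing from this text, and the matching rule (any shared word between the two windows) is an assumed reading of the step, not the patent's stated rule. The helper names are hypothetical.

```python
def window(seq, pos, k, side):
    """Return up to k items before ('before') or after ('after') position pos."""
    if side == "before":
        return seq[max(0, pos - k):pos]
    return seq[pos + 1:pos + 1 + k]

def windows_share_word(t_i, i, t_j, j, k, side):
    """True if the k-word windows around t_i[i] and t_j[j] have a word in common."""
    return bool(set(window(t_i, i, k, side)) & set(window(t_j, j, k, side)))

# Same keywords, but the first two are swapped in the second sentence.
t_i = ["i", "today", "go", "school"]
t_j = ["today", "i", "go", "school"]

narrow = windows_share_word(t_i, 2, t_j, 2, k=1, side="before")  # only one neighbour
wide = windows_share_word(t_i, 2, t_j, 2, k=2, side="before")    # two neighbours
```

With `k=1` the shifted word order hides the match before "go", while `k=2` recovers it, which is the kind of positional tolerance the step aims for.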
The beneficial effects of the present invention are:
1. The Lao-Chinese bilingual sentence similarity calculation method of the invention proposes a relation vector model that builds on the vector space model and uses a third-party language to consider both the structural and the semantic information of bilingual sentences. It improves on the traditional vector space model and, to a certain extent, improves the accuracy of Lao-Chinese bilingual sentence similarity calculation.
2. The model considers the collocation relations between the keywords that make up a sentence as well as the synonym information of the keywords, and improves precision with respect to both the third-party language and keyword position. It reflects the structural and semantic information of a sentence well and improves the accuracy of Lao-Chinese bilingual sentence similarity calculation.
3. The method realizes cross-language sentence similarity calculation. It can be applied to searching Chinese-Lao hot news, for example finding two headlines with similar meaning, and to excluding sentences with similar meaning when generating automatic summaries of Chinese-Lao online hot news, which avoids redundant summary sentences and promotes Chinese-Lao cultural exchange and the development of both sides.
Brief description of the drawings
Fig. 1 is the overall flow chart of the present invention.
Fig. 2 illustrates the improvement of third-party language precision in the present invention.
Fig. 3 illustrates the improvement of keyword position precision in the present invention.
Specific embodiment
To describe the present invention in more detail and to make it easier for those skilled in the art to understand, the invention is further described below with reference to the accompanying drawings and an embodiment. The embodiment in this section serves to illustrate the invention and aid understanding, and is not intended to limit the invention.
Embodiment 1: as shown in Figs. 1-3, a Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model comprises the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese and Lao sentences in the corpus, and from the result select the keywords of the Chinese sentence and the Lao sentence.
Step1.1: first use a word segmentation system to segment the Chinese sentence T_i and the Lao sentence T_j respectively, obtaining the segmented Chinese and Lao sentences.
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely words whose part of speech is noun, pronoun, verb, adjective or adverb, as the keywords of the Chinese sentence and the Lao sentence. Doing so preserves the semantic integrity of the sentence as far as possible.
Step2: from the segmentation result of Step1, extract the keywords of the Chinese sentence T_i and the Lao sentence T_j and convert them into a third-party language, English, to form the keyword vector representations of T_i and T_j.
Step2.1: because the keyword vectors of the Chinese sentence T_i and the Lao sentence T_j are involved here, Definition 1 is introduced. Definition 1: given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written Tiv = {m_1, m_2, …, m_n}.
Step3: after the keyword vectors of the Chinese sentence T_i and the Lao sentence T_j have been formed, consider the shorter keyword vector. Here it is assumed that Len(T_i) ≤ Len(T_j) (i.e. the Chinese sentence vector is no longer than the Lao sentence vector), and the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i is computed. Each keyword m_i in T_i is then processed in turn to compute the Lao-Chinese bilingual sentence similarity value; Figs. 2 and 3 help to explain how the proposed method improves on the relation vector model. The relation vector model considers not only whether a keyword of one sentence appears in the other sentence, but also the influence of the two words closest to that keyword (its preceding keyword and its following keyword). In this way the structural relations among all keywords in the sentence are reflected, which makes the analysis more comprehensive and accurate. The present invention makes several improvements to this model in order to improve the accuracy of Lao-Chinese bilingual sentence similarity calculation.
Step3.1: because the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved here, Definitions 2, 3 and 4 are introduced.
Definition 2: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before a keyword m_i in the vector is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i.
Definition 3: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, with vector length Len(T_i) = n, each keyword m_i is assigned an initial weight value b_i [formula not reproduced in the source text]. The weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n}.
Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in Tiv, if m_i also appears in T_j, then m_i is said to exist in T_j. The vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i based on T_j, written E_{i,j} = {e_1, e_2, …, e_p}. The vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i based on T_j, written TE_{i,j} = {v_1, v_2, …, v_p}.
Step3.2: because the invention computes sentence similarity by converting keywords into a third-party language, the result is inevitably affected by that language, in particular by near-synonyms encountered during conversion. The precision of the third-party language therefore needs to be improved, which the invention achieves by correspondingly increasing the weight when keywords are near-synonyms. Fig. 2 helps to explain this improvement in third-party language precision.
Step3.2.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i.
Step3.2.2: for each keyword m_i of the Chinese sentence T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider the preceding keywords of m_i in T_i and in T_j. If the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i by a factor of α; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α). Apply the same processing to the following keyword of m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
Step3.3: similarly, Chinese and Lao sentences share a broadly similar subject + predicate + object structure, but there are some subtle differences. These differences can shift keyword positions, so that the single preceding keyword and the single following keyword cannot fully determine whether a keyword's weight should be increased, which causes a loss of precision due to keyword position. The invention therefore improves the precision of keyword position by increasing the number of preceding and following keywords that are examined. Fig. 3 helps to explain this improvement in keyword position precision.
Step3.3.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i.
Step3.3.2: for each keyword m_i of T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider a number of keywords preceding m_i in T_i and in T_j. That number is an expression in γ, rounded down, where γ is the number of keywords of T_j [the expression is not reproduced in the source text]. If these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i by a factor of α; if they are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α). Apply the same processing to the same number of keywords following m_i. This finally yields E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
In experiments it was found that γ = Len(T_i) affects the final similarity accuracy, i.e. that considering the rounded-down number of preceding and following keywords [the expression is not reproduced in the source text] introduces error. Two situations arise. In the first, when the number of keywords is small, considering only the one preceding and one following keyword has little effect on accuracy and the calculation remains accurate; but as the number of keywords grows, the error caused by grammatical differences between Chinese and Lao grows as well, a single preceding and following keyword can no longer guarantee accuracy, and accuracy falls. In the second, when the number of keywords is small, accuracy is again little affected; but as the number of keywords grows, considering the wider window of preceding and following keywords causes keywords to be counted repeatedly, which inflates the measured accuracy. After comprehensive analysis, it was found that Lao-Chinese bilingual sentence similarity calculation is most accurate when the number of keywords is between 5 and 7.
The present invention can effectively perform Chinese-Lao bilingual sentence similarity calculation even when the Lao corpus is small, and can also be used to expand the Lao corpus; it therefore has research value.
The embodiments of the present invention have been explained in detail above with reference to the drawings, but the invention is not limited to the above embodiments. Within the knowledge of a person skilled in the art, various changes can be made without departing from the concept of the invention.
Claims (3)
1. A Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model, characterized by comprising the following steps:
Step1: first perform word segmentation and part-of-speech tagging on the Chinese sentence T_i and the Lao sentence T_j in the corpus, and from the result select the keywords of the Chinese sentence and the Lao sentence;
Step1.1: first use a word segmentation system to segment the Chinese sentence T_i and the Lao sentence T_j respectively, obtaining the segmented Chinese and Lao sentences;
Step1.2: after segmentation, perform part-of-speech tagging and select the main components of each sentence, namely words whose part of speech is noun, pronoun, verb, adjective or adverb, as the keywords of the Chinese sentence and the Lao sentence; doing so preserves the semantic integrity of the sentence as far as possible;
Step2: convert the keywords of the Chinese sentence T_i and the Lao sentence T_j obtained in Step1 into a third-party language, English, to form the keyword vector representations of T_i and T_j;
Step2.1: Definition 1 (keyword vector representation): given a Chinese sentence T_i, the vector formed by the keywords m_i obtained after segmentation by the word segmentation system is called the keyword vector representation of T_i, written Tiv = {m_1, m_2, …, m_n};
Step3: after the keyword vector representations of the Chinese sentence T_i and the Lao sentence T_j have been formed, consider the keyword vector with the shorter length; here it is assumed that Len(T_i) ≤ Len(T_j), i.e. that the Chinese sentence vector is no longer than the Lao sentence vector; compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i, and for each keyword m_i in T_i compute the Lao-Chinese bilingual sentence similarity value;
Step3.1: because the keyword representations and weight value vectors of the Chinese sentence T_i and the Lao sentence T_j are involved here, Definitions 2, 3 and 4 are introduced;
Definition 2: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, the keyword m_{i-1} immediately before a keyword m_i in the vector is called the preceding keyword of m_i, and the keyword m_{i+1} immediately after m_i is called the following keyword of m_i;
Definition 3: given the keyword vector representation Tiv = {m_1, m_2, …, m_n} of a Chinese sentence T_i, with vector length Len(T_i) = n, each keyword m_i is assigned an initial weight value b_i [formula not reproduced in the source text]; the weight values of all keywords form a vector called the initial weight value vector of T_i, written TB_i = {b_1, b_2, …, b_n};
Definition 4: given the keyword vector representations of a Chinese sentence T_i and a Lao sentence T_j, for any keyword m_i in Tiv, if m_i also appears in T_j, then m_i is said to exist in T_j; the vector formed by all keywords of T_i that exist in T_j is called the existence vector of T_i based on T_j, written E_{i,j} = {e_1, e_2, …, e_p}; the vector formed by the weight values of the keywords in the existence vector is called the existence value vector of T_i based on T_j, written TE_{i,j} = {v_1, v_2, …, v_p}; Step3.2 and Step3.3 are then carried out respectively;
Step3.2: improve the precision of the third-party language by correspondingly increasing the weight when keywords are near-synonyms, then carry out Step3.4;
Step3.3: improve the precision of keyword position by increasing the number of preceding and following keywords that are examined, then carry out Step3.4;
Step3.4: from the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i and the existence value vector TE_{i,j} = {v_1, v_2, …, v_p} of T_i based on the Lao sentence T_j, the Lao-Chinese bilingual sentence similarity value is calculated as shown in formula (1):
[formula (1) is not reproduced in the source text]
2. The Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model according to claim 1, characterized in that Step3.2 comprises the following steps:
Step3.2.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of T_i;
Step3.2.2: for each keyword m_i of the Chinese sentence T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider the preceding keywords of m_i in T_i and in T_j; if the two preceding keywords are the same word or synonyms, increase the weight of m_i in TB_i by a factor of α; if the two preceding keywords are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α); apply the same processing to the following keyword of m_i, finally obtaining E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
3. The Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model according to claim 1, characterized in that Step3.3 comprises the following steps:
Step3.3.1: assume Len(T_i) ≤ Len(T_j) and compute the initial weight value vector TB_i = {b_1, b_2, …, b_n} of the Chinese sentence T_i;
Step3.3.2: for each keyword m_i of T_i, if m_i exists in the Lao sentence T_j or has a synonym there, consider a number of keywords preceding m_i in T_i and in T_j, that number being an expression in γ, rounded down, where γ is the number of keywords of T_j [the expression is not reproduced in the source text]; if these preceding keywords are the same words or synonyms, increase the weight of m_i in TB_i by a factor of α; if they are near-synonyms, increase the weight of m_i in TB_i by a factor of β (1 < β < α); apply the same processing to the same number of keywords following m_i, finally obtaining E_{i,j} = {e_1, e_2, …, e_p} and TE_{i,j} = {v_1, v_2, …, v_p}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810808788.7A CN109145289A (en) | 2018-07-19 | 2018-07-19 | Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810808788.7A CN109145289A (en) | 2018-07-19 | 2018-07-19 | Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145289A true CN109145289A (en) | 2019-01-04 |
Family
ID=64801258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810808788.7A Pending CN109145289A (en) | 2018-07-19 | 2018-07-19 | Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145289A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257453A (en) * | 2020-09-23 | 2021-01-22 | 昆明理工大学 | Chinese-Yue text similarity calculation method fusing keywords and semantic features |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110106805A1 (en) * | 2009-10-30 | 2011-05-05 | International Business Machines Corporation | Method and system for searching multilingual documents |
CN102360372A (en) * | 2011-10-09 | 2012-02-22 | 北京航空航天大学 | Cross-language document similarity detection method |
CN103034627A (en) * | 2011-10-09 | 2013-04-10 | 北京百度网讯科技有限公司 | Method and device for calculating sentence similarity and method and device for machine translation |
CN105824797A (en) * | 2015-01-04 | 2016-08-03 | 华为技术有限公司 | Method, device and system evaluating semantic similarity |
-
2018
- 2018-07-19 CN CN201810808788.7A patent/CN109145289A/en active Pending
Non-Patent Citations (1)
Title |
---|
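Yin Yaoming et al.: "Sentence similarity calculation based on a relation vector model" (基于关系向量模型的句子相似度计算), Computer Engineering and Applications (《计算机工程与应用》) * |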
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106202042B (en) | A kind of keyword abstraction method based on figure | |
Bod | An all-subtrees approach to unsupervised parsing | |
CN109325229B (en) | Method for calculating text similarity by utilizing semantic information | |
Banerjee et al. | Meaningless yet meaningful: Morphology grounded subword-level NMT | |
CN105068997B (en) | The construction method and device of parallel corpora | |
CN110378409A (en) | It is a kind of based on element association attention mechanism the Chinese get over news documents abstraction generating method | |
Costa-Jussá et al. | Statistical machine translation enhancements through linguistic levels: A survey | |
Hasan et al. | Neural clinical paraphrase generation with attention | |
CN107092605A (en) | A kind of entity link method and device | |
CN113901208A (en) | Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics | |
Das et al. | A survey of the model transfer approaches to cross-lingual dependency parsing | |
Liu et al. | Language model augmented relevance score | |
Wang et al. | Hierarchical phrase-based sequence-to-sequence learning | |
Zhu et al. | Concept transfer learning for adaptive language understanding | |
Casacuberta et al. | Architectures for speech-to-speech translation using finite-state models | |
CN109145289A (en) | Lao-Chinese bilingual sentence similarity calculation method based on an improved relation vector model | |
CN111274826B (en) | Semantic information fusion-based low-frequency word translation method | |
Dologlou et al. | Using monolingual corpora for statistical machine translation: the METIS system | |
Lee et al. | Probabilistic modeling of Korean morphology | |
Zhang et al. | Keyword-driven image captioning via Context-dependent Bilateral LSTM | |
Peter et al. | The qt21/himl combined machine translation system | |
Harada et al. | Neural machine translation with synchronous latent phrase structure | |
Velldal et al. | Paraphrasing treebanks for stochastic realization ranking | |
Hu et al. | An approach to automatic acquisition of translation templates based on phrase structure extraction and alignment | |
Satpathy et al. | Analysis of Learning Approaches for Machine Translation Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190104 |