CN107609095B - Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback - Google Patents
- Publication number: CN107609095B
- Application number: CN201710807540.4A
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. A source-language query is first translated into a target-language query with a machine translation tool, and the target-language documents are retrieved to obtain the initial results; the top-ranked initial documents are judged for relevance by the user to construct the target-language initial relevant document set. A weighted positive and negative association pattern mining technique oriented to cross-language query expansion is then applied to that document set to mine weighted positive and negative association rule patterns among the feature words containing the query terms, building a weighted positive and negative association rule library of feature words. From the rule library, the weighted positive and negative association rules whose consequent is a query term are extracted; the antecedent feature words of the positive rules serve as positive expansion words and the antecedents of the negative rules as negative expansion words, and after the negative expansion words are removed from the positive expansion words the final antecedent expansion words are obtained, realizing post-translation antecedent expansion of the cross-language query. The invention can improve cross-language information retrieval performance and has good application value and promotion prospects.
Description
Technical Field
The invention belongs to the field of Internet information retrieval, and in particular relates to a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, suitable for the field of cross-language information retrieval.
Background
Cross-Language Information Retrieval (CLIR) is the technique of retrieving information resources in other languages with a query expressed in one language: the language expressing the user query is called the source language, and the language of the documents to be retrieved is called the target language. Cross-language query expansion is one of the core technologies for improving cross-language retrieval performance; it aims at the problems of serious query topic drift and word mismatch that have long plagued the field of cross-language information retrieval. According to the stage of the retrieval process at which expansion occurs, cross-language query expansion is divided into pre-translation query expansion, post-translation query expansion and hybrid query expansion (i.e., expansion occurring both before and after translation). With the rise of cross-language information retrieval research, cross-language query expansion has drawn increasing attention from scholars at home and abroad and has become a research hotspot.
Cross-language information retrieval combines information retrieval with machine translation; it is more complex than monolingual retrieval and its problems are more serious, mainly query topic drift, word mismatch, and ambiguity in query term translation. These problems have been the bottleneck restricting the development of cross-language information retrieval technology and are common problems that urgently need to be solved internationally. Cross-language query expansion is one of the core technologies for solving them. Over the last decade, cross-language query expansion models and algorithms have received wide attention and deep study, and rich theoretical results have been obtained, but the above problems have not been completely solved.
Disclosure of Invention
The invention applies weighted positive and negative association pattern mining to post-translation expansion of cross-language queries and provides a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. Applied to the field of cross-language information retrieval, it can mitigate the long-standing problems of query topic drift and word mismatch, improve cross-language information retrieval performance, and can also be applied to cross-language search engines to improve retrieval measures such as recall and precision.
The technical scheme adopted by the invention is as follows:
1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterized by comprising the following steps:
1.1 translating a source language query into a target language query using a machine translation system;
1.2 retrieving the original target-language document collection with the target-language query to obtain the initial target-language documents;
1.3 constructing the target-language initial relevant document set: the user performs relevance judgment on the top n initial target-language documents to obtain the initially relevant documents, from which the target-language initial relevant document set is constructed;
1.4 mining weighted frequent itemsets and negative itemsets containing the original query terms from the target-language initial relevant document set;
the method comprises the following specific steps:
1.4.1 preprocessing the target-language initial relevant document set, and constructing a document index library and a total feature word library;
1.4.2 mining the frequent 1-itemsets L1:
Obtain the candidate 1-itemsets C1 from the total feature word library and compute the support awSup(C1) of each C1; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS; awSup(C1) is computed by formula (1):
where n and W are, respectively, the total number of documents in the target-language initial relevant document set and the sum of the weights of all feature words, nC1 is the frequency of C1 in that document set, wC1 is the itemset weight of C1 in that document set, and β ∈ (0,1) is an adjustment coefficient whose value can be neither 0 nor 1;
1.4.3 mining the frequent k-itemsets Lk containing query terms and the negative k-itemsets Nk, k ≥ 2
The specific steps are as follows:
(1) mining the candidate k-itemsets Ck: obtained by the Apriori join of the frequent (k−1)-itemsets Lk−1;
(2) when k = 2, pruning the candidate 2-itemsets C2 containing no query term and keeping the candidate 2-itemsets C2 containing query terms;
(3) computing the support awSup(Ck) of each candidate k-itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relevance awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relevance threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relevance awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relevance threshold minNR, Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS; awSup(Ck) is computed by formula (2):
where nCk is the frequency of Ck in the target-language initial relevant document set, wCk is the itemset weight of Ck in that document set, and k is the number of items in Ck;
awPIR(Ck) is computed in two cases, m = 2 and m > 2, i.e., formulas (3) and (4),
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the greatest support, and Iq is the subitemset of Ck with the greatest support among all 2-subitemsets through (m−1)-subitemsets;
awNIR(Ck) is computed in two cases, r = 2 and r > 2, i.e., formulas (5) and (6),
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the greatest support, and Ip is the subitemset of Ck with the greatest support among all 2-subitemsets through (r−1)-subitemsets;
(4) if the frequent k-itemset Lk is empty, itemset mining ends and the process proceeds to step 1.5; otherwise, return to step (1) and continue mining;
1.5 mining weighted strong positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mine the association rules I → qt of Lk whose antecedent is the expansion term set I and whose consequent is the query term set qt, where the union of qt and I is Lk and the intersection of qt and I is empty. The specific mining steps are as follows:
(1) find all proper subsets of the positive itemset Lk to obtain the proper subset set of Lk;
(2) take any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Lk, where qt is the query term set and I is the expansion term set;
(3) compute the confidence awARConf(I → qt) of the weighted association rule I → qt and its lift awARL(I → qt); if awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the weighted strong association rule I → qt is obtained and added to the weighted strong positive association rule set PAR; awARConf(I → qt) and awARL(I → qt) are computed by formulas (7) and (8):
(4) return to step (2) and repeat until each proper subset in the proper subset set of Lk has been taken exactly once; then take a new positive itemset Lk from the PIS set and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken, at which point proceed to step 1.6;
1.6 mining weighted strong negative association rules from the negative itemset set NIS: for each negative itemset Nk (k > 2) in NIS, mine the weighted negative association rules I → ﹁qt and ﹁I → qt between the query term set qt and the negative expansion term set I, where the union of qt and I is Nk and the intersection of qt and I is empty. The specific mining steps are as follows:
(1) find all proper subsets of the negative itemset Nk to obtain the proper subset set of Nk;
(2) take any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Nk, where qt is the query term set;
(3) compute the lift awARL(I → qt); if awARL(I → qt) < 1:
compute the confidence awARConf(I → ﹁qt) of the negative association rule I → ﹁qt; if awARConf(I → ﹁qt) ≥ the minimum weighted confidence threshold mc, the weighted strong negative association rule I → ﹁qt is obtained and added to the weighted strong negative association rule set NAR;
compute the confidence awARConf(﹁I → qt) of the negative association rule ﹁I → qt; if awARConf(﹁I → qt) ≥ mc, the weighted strong negative association rule ﹁I → qt is obtained and added to NAR; awARConf(I → ﹁qt) and awARConf(﹁I → qt) are computed by formulas (9) and (10):
awARConf(I → ﹁qt) = 1 − awARConf(I → qt)    (9)
awARConf(﹁I → qt) = (awSup(qt) − awSup(I ∪ qt)) / (1 − awSup(I))    (10)
(4) return to step (2) and repeat until each proper subset in the proper subset set of Nk has been taken exactly once; then go to step (5);
(5) take a new negative itemset Nk from the NIS set and go to step (1) for a new round of weighted negative association rule mining; when every negative itemset in the NIS set has been taken exactly once, weighted strong negative association rule mining ends and the process proceeds to step 1.7;
1.7 extracting from the weighted strong positive association rule set PAR the weighted positive association rules I → qt whose consequent is a query term, and constructing the candidate antecedent expansion word library with the feature words of the positive rule antecedents as candidate expansion words;
1.8 extracting from the weighted strong negative association rule set NAR the weighted negative association rules I → ﹁qt and ﹁I → qt whose consequent involves a query term, and constructing the antecedent negative expansion word library with the negative rule antecedents I as antecedent negative expansion words;
1.9 comparing each candidate antecedent expansion word in the candidate antecedent expansion word library with the negative expansion words in the antecedent negative expansion word library, and deleting the candidate expansion words identical to negative expansion words; the candidate antecedent expansion words remaining in the library are the final antecedent expansion words;
2.0 combining the final antecedent expansion words with the original target-language query terms for a new retrieval, realizing post-translation antecedent expansion of the cross-language query.
In the strong negative association rules I → ﹁qt and ﹁I → qt above, '﹁' is the negation symbol: it means that the itemset it precedes does not occur in the target-language initial relevant documents, i.e., the negatively related case.
'I → ﹁qt' means that the expansion term set I and the query term set qt exhibit negative relevance: the occurrence of I in the target-language initial relevant document set causes qt not to occur.
'﹁I → qt' means that the expansion term set I and the query term set qt exhibit negative relevance: the absence of I in the target-language initial relevant document set causes qt to occur.
The weighted strong positive association rule I → qt means that the occurrence of the expansion term set I in the target-language initial relevant document set causes the query term set qt to occur as well.
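Taken together, steps 1.1 to 2.0 can be summarized in the minimal Python sketch below. All helper names (translate, retrieve, judge_relevance, mine_rules) are hypothetical placeholders for components the patent assumes but does not define here (the machine translation system, the vector-space retrieval, the user judgment, and the rule mining of steps 1.4 to 1.8).

```python
# Hypothetical end-to-end sketch of steps 1.1-2.0; every helper passed in is
# an assumed component, not an interface defined by the patent.
def cross_language_expand(source_query, translate, retrieve, judge_relevance,
                          mine_rules, n=50):
    # 1.1 translate the source-language query into target-language query terms
    query_terms = translate(source_query)              # e.g. a list of terms
    # 1.2 initial retrieval over the target-language collection
    initial_docs = retrieve(query_terms)
    # 1.3 user relevance judgment on the top-n initial documents
    feedback_docs = [d for d in initial_docs[:n] if judge_relevance(d)]
    # 1.4-1.8 mine weighted positive/negative rules whose consequent is a
    # query term; assumed to return lists of (antecedent, consequent) pairs
    positive_rules, negative_rules = mine_rules(feedback_docs, query_terms)
    candidates = {w for I, qt in positive_rules for w in I}
    negatives = {w for I, qt in negative_rules for w in I}
    # 1.9 remove the negative expansion words from the candidates
    final_expansion = candidates - negatives
    # 2.0 re-retrieve with the original query terms plus the expansion words
    return retrieve(list(query_terms) + sorted(final_expansion))
```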
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. The method mines weighted positive and negative association rule patterns from the cross-language initial relevant document set with a positive and negative pattern mining technique built on a weighted support-relevance-lift-confidence evaluation framework, and extracts the antecedents of those rules as antecedent expansion words related to the original query terms, realizing post-translation antecedent expansion of the cross-language query and improving cross-language information retrieval performance.
(2) The English text collection of NTCIR-5 CLIR, the standard cross-language information retrieval test corpus of the international evaluation conference on multilingual processing sponsored by the National Institute of Informatics of Japan, is selected as the experimental corpus, and experiments with the method are carried out with Vietnamese and English as the language pair. The baselines for comparison are the Vietnamese-English Cross-Language Retrieval (VECLR) benchmark and the query post-translation expansion method based on pseudo-relevance feedback (QPTE_PRF; see the pseudo-relevance-feedback-based cross-language query expansion method, Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-). The experimental results show that, compared with the baselines VECLR and QPTE_PRF, the R-Prec and P@5 values of the TITLE-query Vietnamese-English retrieval results of the method are greatly improved, by up to 91.28% over VECLR and up to 265.88% over QPTE_PRF; the R-Prec and P@5 values of the DESC-query Vietnamese-English retrieval results are likewise greatly improved over VECLR and QPTE_PRF, with maximum improvements of 137.38% and 238.75% respectively.
(3) The experimental results show that the method is effective and can improve cross-language information retrieval performance. The main reason is that the method alleviates the word mismatch problem and the influence of query translation quality in cross-language information retrieval, which often cause serious query topic drift in the initial retrieval.
Drawings
FIG. 1 is a block diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
FIG. 2 is a general flow diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
Detailed Description
To better illustrate the technical solution of the invention, the related concepts are introduced as follows:
1. Post-translation antecedent expansion of cross-language queries
Post-translation antecedent expansion of a cross-language query means: in cross-language query expansion, after association rule patterns are mined from the target-language initial relevant documents, the antecedents of the rules related to the original target-language query are extracted as expansion words, and the expansion words are combined with the original target-language query terms to form a new query.
2. Weighted support
Let DS = {d1, d2, …, dn} be the cross-language target-language initial relevant document set (DS), where di (1 ≤ i ≤ n) is the i-th document in DS, di = {t1, t2, …, tm, …, tp}, and tm (m = 1, 2, …, p) is a document feature term (feature item for short), generally a word or phrase. The feature-item weight set corresponding to di is Wi = {wi1, wi2, …, wim, …, wip}, where wim is the weight of the m-th feature item tm of the i-th document di. TS = {t1, t2, …, tk} denotes the set of all feature items in DS, and every subset of TS is called a feature itemset, itemset for short.
Aiming at the shortcomings of the prior art, and fully considering both the frequency and the weight of feature terms, the invention gives a new calculation method for the weighted support (All-weighted Support, awSup) awSup(I) of an itemset I, shown in formula (11).
where wI is the sum of the weights of itemset I in the cross-language target-language initial relevant document set DS, nI is the frequency with which itemset I occurs in DS, n is the total number of documents, W is the sum of the weights of all feature words in DS, k is the number of items in itemset I (i.e., the length of the itemset), and β ∈ (0,1) is an adjustment coefficient whose value can be neither 0 nor 1; its main function is to adjust the influence of the combined frequency and weight of the items on the weighted support.
Assuming that the minimum weighted support threshold is ms: if awSup(I1 ∪ I2) ≥ ms, the weighted itemset (I1 ∪ I2) is a positive itemset (i.e., a frequent itemset); otherwise (I1 ∪ I2) is a negative itemset.
The method focuses only on the following three types of weighted negative itemsets: ﹁I, (I1 ∪ ﹁I2) and (﹁I1 ∪ I2), with weighted supports awSup(﹁I), awSup(I1 ∪ ﹁I2) and awSup(﹁I1 ∪ I2) computed by formulas (12) to (14).
awSup(﹁I) = 1 − awSup(I)    (12)
awSup(I1 ∪ ﹁I2) = awSup(I1) − awSup(I1 ∪ I2)    (13)
awSup(﹁I1 ∪ I2) = awSup(I2) − awSup(I1 ∪ I2)    (14)
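A minimal Python sketch of these support calculations follows. The three negative-itemset formulas come directly from (12) to (14); since the image of formula (11) is not reproduced in this text, the body of awsup() is only an assumed reading that blends the itemset frequency nI/n with the normalized itemset weight wI/(k·W) through the coefficient β.

```python
def awsup(n_I, w_I, n, W, k, beta=0.5):
    """Assumed form of formula (11); beta must lie strictly in (0, 1)."""
    return beta * (n_I / n) + (1 - beta) * (w_I / (k * W))

def awsup_not(sup_I):
    """Formula (12): awSup(-I) = 1 - awSup(I)."""
    return 1 - sup_I

def awsup_I1_not_I2(sup_I1, sup_I1_union_I2):
    """Formula (13): awSup(I1 U -I2) = awSup(I1) - awSup(I1 U I2)."""
    return sup_I1 - sup_I1_union_I2

def awsup_not_I1_I2(sup_I2, sup_I1_union_I2):
    """Formula (14): awSup(-I1 U I2) = awSup(I2) - awSup(I1 U I2)."""
    return sup_I2 - sup_I1_union_I2
```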
The method focuses only on the following two types of weighted negative association rules: (I1 → ﹁I2) and (﹁I1 → I2). The weighted positive and negative association rule confidences (All-weighted Association Rule Confidence, awARConf) awARConf(I1 → I2), awARConf(I1 → ﹁I2) and awARConf(﹁I1 → I2) are computed by formulas (15) to (17):
awARConf(I1 → I2) = awSup(I1 ∪ I2) / awSup(I1)    (15)
awARConf(I1 → ﹁I2) = 1 − awARConf(I1 → I2)    (16)
awARConf(﹁I1 → I2) = (awSup(I2) − awSup(I1 ∪ I2)) / (1 − awSup(I1))    (17)
3. Weighted positive and negative itemset relevance
The weighted itemset relevance measures the strength of the relation between any two single items and between subitemsets within a weighted itemset. The higher the relevance of an itemset, the more closely its subitemsets are related and the more attention it deserves. The invention improves the existing relevance measures and gives a relevance calculation method for weighted positive and negative itemsets that considers both the degree of relation between any two single items in the itemset and the relation between two subitemsets of the itemset.
Weighted positive itemset relevance (All-weighted Positive Itemset Relevance, awPIR): for a weighted feature-word positive itemset Ck = (t1, t2, …, tm), where m ≥ 2 is the length of Ck, let tmax (1 ≤ max ≤ m) be the single item of Ck with the greatest support, and let Iq be the subitemset of Ck with the greatest support among all 2-subitemsets through (m−1)-subitemsets. The weighted positive itemset relevance awPIR(Ck) is computed by formulas (18) and (19):
awPIR(Ck) = awSup(Ck) / awSup(tmax), for m = 2    (18)
awPIR(Ck) = (awSup(Ck)/awSup(tmax) + awSup(Ck)/awSup(Iq)) / 2, for m > 2    (19)
Formulas (18) and (19) express that the relevance of the weighted positive itemset Ck equals the average of the conditional probabilities that the positive itemset occurs when tmax occurs and when Iq occurs, respectively.
Weighted negative itemset relevance (All-weighted Negative Itemset Relevance, awNIR): for a weighted feature-word negative itemset Ck = (t1, t2, …, tr), where r ≥ 2 is the length of Ck, let tmax (1 ≤ max ≤ r) be the single item of Ck with the greatest support, and let Ip be the subitemset of Ck with the greatest support among all 2-subitemsets through (r−1)-subitemsets. The weighted negative itemset relevance awNIR(Ck) is computed by formulas (20) and (21).
Formulas (20) and (21) express that the relevance of the weighted negative itemset Ck equals the average of the conditional probabilities that the negative itemset occurs when tmax does not occur and when Ip does not occur, respectively.
Example: let Ck = (t1 ∪ t2 ∪ t3 ∪ t4) with support 0.65, let its single items t1, t2, t3 and t4 have supports 0.82, 0.45, 0.76 and 0.75 respectively, and let its 2-subitemsets and 3-subitemsets (t1∪t2), (t1∪t3), (t1∪t4), (t2∪t3), (t2∪t4), (t1∪t2∪t3), (t1∪t2∪t4), (t2∪t3∪t4) have supports 0.64, 0.78, 0.75, 0.74, 0.67, 0.66, 0.56 and 0.43 respectively. The single item with the greatest support (0.82) is t1, and the subitemset among the 2-subitemsets and 3-subitemsets with the greatest support (0.78) is (t1 ∪ t3). Then, using formula (19), the relevance of the positive itemset (t1 ∪ t2 ∪ t3 ∪ t4) is (0.65/0.82 + 0.65/0.78)/2 ≈ 0.81.
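The example can be verified with a few lines of Python applying formula (19):

```python
# Values from the example above.
sup_ck = 0.65     # support of the positive itemset (t1 U t2 U t3 U t4)
sup_tmax = 0.82   # support of t1, the most-supported single item
sup_iq = 0.78     # support of (t1 U t3), the most-supported 2-/3-subitemset

awpir = (sup_ck / sup_tmax + sup_ck / sup_iq) / 2   # formula (19)
print(round(awpir, 2))                              # 0.81, as in the example
```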
4. Weighted association rule lift
The traditional association rule evaluation framework (support-confidence) neglects the support of the itemset appearing in the rule consequent, so a rule with high confidence can sometimes be misleading. Lift is an effective correlation measure that addresses this problem. The lift Lift(X → Y) of an association rule X → Y is the ratio of the probability that Y occurs given X to the probability that Y occurs overall, i.e., the ratio of the confidence of the rule (X → Y) to the support of the consequent Y, sup(Y). Based on the traditional lift concept, the weighted association rule lift (All-weighted Association Rule Lift, awARL) awARL(I1 → I2) is given by formula (22):
awARL(I1 → I2) = awARConf(I1 → I2) / awSup(I2)    (22)
According to correlation theory, lift evaluates the correlation between the antecedent and the consequent of an association rule, i.e., the degree to which the occurrence of one promotes (or suppresses) the occurrence of the other. When awARL(I1 → I2) > 1, I1 → I2 is a positive association rule: the occurrence of one of the itemsets I1 and I2 increases the probability that the other occurs. When awARL(I1 → I2) < 1, it is a negative association rule: the occurrence of one decreases the probability that the other occurs. When awARL(I1 → I2) = 1, the itemsets I1 and I2 are independent and unrelated, and the association rule I1 → I2 is spurious. It is easy to prove that awARL(I1 → I2) has the following Property 1.
Property 1: when awARL(I1 → I2) > 1, ① awARL(I1 → ﹁I2) < 1, ② awARL(﹁I1 → I2) < 1, ③ awARL(﹁I1 → ﹁I2) > 1; when awARL(I1 → I2) < 1, ④ awARL(I1 → ﹁I2) > 1, ⑤ awARL(﹁I1 → I2) > 1, ⑥ awARL(﹁I1 → ﹁I2) < 1.
According to Property 1, when awARL(I1 → I2) > 1, the weighted positive association rule I1 → I2 can be mined; when awARL(I1 → I2) < 1, the weighted negative association rules I1 → ﹁I2 and ﹁I1 → I2 can be mined.
Assuming the minimum weighted confidence threshold is mc, and combining Property 1, the weighted strong positive and negative association rules are given as follows:
For a weighted positive itemset (I1 ∪ I2): if awARL(I1 → I2) > 1 and awARConf(I1 → I2) ≥ mc, the weighted association rule I1 → I2 is a strong positive association rule.
For a negative itemset (I1 ∪ I2): if awARL(I1 → I2) < 1, awARConf(I1 → ﹁I2) ≥ mc and awARConf(﹁I1 → I2) ≥ mc, then I1 → ﹁I2 and ﹁I1 → I2 are strong negative association rules.
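A compact Python sketch of the lift test and the strong-rule conditions, following formulas (15) to (17) and (22); as in the mining steps 1.5 and 1.6, the two negative rules are tested independently against mc:

```python
def awarl(conf_i1_i2, sup_i2):
    """Formula (22): awARL(I1 -> I2) = awARConf(I1 -> I2) / awSup(I2)."""
    return conf_i1_i2 / sup_i2

def classify_rules(sup_i1, sup_i2, sup_union, mc):
    """Strong rules derivable from the itemset (I1 U I2)."""
    conf = sup_union / sup_i1                 # awARConf(I1 -> I2), formula (15)
    lift = awarl(conf, sup_i2)
    rules = []
    if lift > 1 and conf >= mc:
        rules.append("I1 -> I2")              # weighted strong positive rule
    elif lift < 1:
        if 1 - conf >= mc:                    # awARConf(I1 -> -I2), formula (16)
            rules.append("I1 -> -I2")
        if (sup_i2 - sup_union) / (1 - sup_i1) >= mc:   # formula (17)
            rules.append("-I1 -> I2")
    return rules
```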
The cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback of the invention comprises the following steps:
1.1 translating a source language query into a target language query using a machine translation system;
The machine translation system may be, for example, the Microsoft Translator API or the Google machine translation interface.
1.2 The target-language query retrieves the original target-language document collection to obtain the initial target-language documents; the retrieval model is the classical vector space model.
1.3 Constructing the target-language initial relevant document set: the user performs relevance judgment on the top n initial target-language documents to obtain the initially relevant documents, from which the target-language initial relevant document set is constructed;
1.4 mining weighted frequent itemsets and negative itemsets containing the original query terms from the target-language initial relevant document set;
The specific steps are as follows:
1.4.1 preprocessing the target-language initial relevant document set, and constructing a document index library and a total feature word library;
The preprocessing steps are:
(1) when the target language is Chinese, perform Chinese word segmentation, remove stop words and extract Chinese feature words; Chinese word segmentation uses the Chinese lexical analysis system ICTCLAS developed by the Institute of Computing Technology, Chinese Academy of Sciences; when the target language is English, use the Porter stemmer (see http://tartarus.org/martin/PorterStemmer) to stem words and remove English stop words;
(2) compute the feature word weights; the weight of a feature word indicates its importance to the document containing it, and the invention adopts the classical and popular tf-idf weighting wij, computed by formula (23) (a sketch follows this list):
where wij is the weight of feature word tj in document di, tfj,i is the frequency of tj in document di, dfj is the number of documents containing tj, and N is the total number of documents in the document set;
(3) construct the document index library and the total feature word library.
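As noted in step (2), a sketch of the weighting follows. The exact normalization of formula (23) is not reproduced in this text, so the code below assumes the standard tf-idf form that the variable definitions describe:

```python
import math

def tfidf(tf_ji, df_j, N):
    """Assumed classical tf-idf: weight w_ij of feature word t_j in document d_i."""
    return tf_ji * math.log(N / df_j)

# e.g. a word occurring 3 times in d_i and appearing in 5 of N = 20 documents:
w = tfidf(3, 5, 20)   # 3 * ln(4) ~ 4.16
```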
1.4.2 mining the frequent 1-itemsets L1: obtain the candidate 1-itemsets C1 from the total feature word library and compute the support awSup(C1) of each C1; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS; awSup(C1) is computed by formula (24):
where n and W are, respectively, the total number of documents in the target-language initial relevant document set and the sum of the weights of all feature words, nC1 is the frequency of C1 in that document set, wC1 is the itemset weight of C1 in that document set, and β ∈ (0,1) is an adjustment coefficient whose value can be neither 0 nor 1.
1.4.3 mining the weighted frequent k-itemsets Lk containing query terms and the negative k-itemsets Nk, k ≥ 2.
The specific steps are as follows:
(1) mining the candidate k-itemsets Ck: obtained by the Apriori join of the frequent (k−1)-itemsets Lk−1 (a sketch of this join follows this list);
The Apriori join method is described in: Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases [C]// Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington D C, USA, 1993: 207-216.
(2) when k = 2, pruning the candidate 2-itemsets C2 containing no query term and keeping the candidate 2-itemsets C2 containing query terms.
(3) computing the support awSup(Ck) of each candidate k-itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relevance awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relevance threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relevance awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relevance threshold minNR, Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS. awSup(Ck) is computed by formula (25):
where nCk is the frequency of Ck in the target-language initial relevant document set, wCk is the itemset weight of Ck in that document set, and k is the number of items in Ck.
awPIR(Ck) is computed in two cases, m = 2 and m > 2, i.e., formulas (26) and (27),
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the greatest support, and Iq is the subitemset of Ck with the greatest support among all 2-subitemsets through (m−1)-subitemsets.
awNIR(Ck) is computed in two cases, r = 2 and r > 2, i.e., formulas (28) and (29),
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the greatest support, and Ip is the subitemset of Ck with the greatest support among all 2-subitemsets through (r−1)-subitemsets.
(4) if the frequent k-itemset Lk is empty, itemset mining ends and the process proceeds to step 1.5; otherwise, return to step (1) and continue mining.
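As referenced in step (1), the following Python sketch illustrates the Apriori join and the k = 2 pruning; itemsets are modeled as frozensets, and the subset check is the classic Apriori pruning refinement:

```python
from itertools import combinations

def apriori_join(freq_prev, k):
    """Join frequent (k-1)-itemsets into candidate k-itemsets (Apriori style)."""
    candidates = set()
    for a, b in combinations(freq_prev, 2):
        union = a | b
        if len(union) == k:
            # classic Apriori pruning: every (k-1)-subset must itself be frequent
            if all(frozenset(s) in freq_prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

def prune_without_query_terms(candidates, query_terms):
    """Step (2) for k = 2: keep only the candidates containing a query term."""
    return {c for c in candidates if c & frozenset(query_terms)}
```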
1.5 mining weighted strong positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mine the association rules I → qt of Lk whose antecedent is the expansion term set I and whose consequent is the query term set qt, where the union of qt and I is Lk and the intersection of qt and I is empty. The specific mining steps (sketched in code after this list) are as follows:
(1) find all proper subsets of the positive itemset Lk to obtain the proper subset set of Lk;
(2) take any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Lk;
(3) compute the confidence awARConf(I → qt) of the weighted association rule I → qt and its lift awARL(I → qt). If awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the weighted strong association rule I → qt is obtained and added to the weighted strong positive association rule set PAR. awARConf(I → qt) and awARL(I → qt) are computed by formulas (30) and (31):
(4) return to step (2) and repeat until each proper subset in the proper subset set of Lk has been taken exactly once; then take a new positive itemset Lk from the PIS set and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken, at which point step 1.6 is performed.
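A Python sketch of the positive-rule mining loop above. Here awsup is assumed to be a lookup mapping an itemset (frozenset) to its weighted support, computed during itemset mining, and the confidence and lift are assumed to coincide with the definitions of formulas (15) and (22):

```python
from itertools import combinations

def mine_positive_rules(Lk, query_terms, awsup, mc):
    """Steps (1)-(4) of 1.5 for one frequent itemset Lk (a frozenset)."""
    rules = []
    items = sorted(Lk)
    for r in range(1, len(items)):                  # proper, non-empty consequents
        for qt_items in combinations(items, r):
            qt = frozenset(qt_items)
            if not qt <= frozenset(query_terms):    # consequent must be query terms
                continue
            I = frozenset(items) - qt               # antecedent: expansion term set
            conf = awsup(Lk) / awsup(I)             # awARConf(I -> qt)
            lift = conf / awsup(qt)                 # awARL(I -> qt)
            if lift > 1 and conf >= mc:
                rules.append((I, qt))               # strong positive rule I -> qt
    return rules
```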
1.6 mining weighted strong negative association rules from the negative itemset set NIS: for each negative itemset Nk (k > 2) in NIS, mine the weighted negative association rules I → ﹁qt and ﹁I → qt between the query term set qt and the negative expansion term set I, where the union of qt and I is Nk and the intersection of qt and I is empty. The specific mining steps (sketched in code after this list) are as follows:
(1) find all proper subsets of the negative itemset Nk to obtain the proper subset set of Nk.
(2) take any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Nk.
(3) compute the lift awARL(I → qt); if awARL(I → qt) < 1:
compute the confidence awARConf(I → ﹁qt) of the negative association rule I → ﹁qt; if awARConf(I → ﹁qt) ≥ the minimum weighted confidence threshold mc, the weighted strong negative association rule I → ﹁qt is obtained and added to the weighted strong negative association rule set NAR;
compute the confidence awARConf(﹁I → qt) of the negative association rule ﹁I → qt; if awARConf(﹁I → qt) ≥ mc, the weighted strong negative association rule ﹁I → qt is obtained and added to NAR. awARConf(I → ﹁qt) and awARConf(﹁I → qt) are computed by formulas (32) and (33):
awARConf(I → ﹁qt) = 1 − awARConf(I → qt)    (32)
awARConf(﹁I → qt) = (awSup(qt) − awSup(I ∪ qt)) / (1 − awSup(I))    (33)
(4) return to step (2) and repeat until each proper subset in the proper subset set of Nk has been taken exactly once; then go to step (5);
(5) take a new negative itemset Nk from the NIS set and go to step (1) for a new round of weighted negative association rule mining; when every negative itemset in the NIS set has been taken exactly once, weighted strong negative association rule mining ends and step 1.7 is performed.
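The corresponding sketch for the negative-rule mining loop, using formula (32) and the derived form of formula (33); awsup is the same assumed itemset-to-support lookup:

```python
from itertools import combinations

def mine_negative_rules(Nk, query_terms, awsup, mc):
    """Steps (1)-(5) of 1.6 for one negative itemset Nk (a frozenset)."""
    rules = []
    items = sorted(Nk)
    for r in range(1, len(items)):
        for qt_items in combinations(items, r):
            qt = frozenset(qt_items)
            if not qt <= frozenset(query_terms):
                continue
            I = frozenset(items) - qt
            conf = awsup(Nk) / awsup(I)             # awARConf(I -> qt)
            if conf / awsup(qt) >= 1:               # require awARL(I -> qt) < 1
                continue
            if 1 - conf >= mc:                      # awARConf(I -> -qt), formula (32)
                rules.append(("I -> -qt", I, qt))
            conf_neg = (awsup(qt) - awsup(Nk)) / (1 - awsup(I))   # formula (33)
            if conf_neg >= mc:                      # awARConf(-I -> qt)
                rules.append(("-I -> qt", I, qt))
    return rules
```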
1.7 extracting from the weighted strong positive association rule set PAR the weighted positive association rules I → qt whose consequent is a query term, and constructing the candidate antecedent expansion word library with the feature words of the positive rule antecedents as candidate expansion words.
1.8 extracting from the weighted strong negative association rule set NAR the weighted negative association rules I → ﹁qt and ﹁I → qt whose consequent involves a query term, and constructing the antecedent negative expansion word library with the negative rule antecedents I as antecedent negative expansion words.
1.9 comparing each candidate antecedent expansion word in the candidate antecedent expansion word library with the negative expansion words in the antecedent negative expansion word library, and deleting the candidate expansion words identical to negative expansion words; the candidate antecedent expansion words remaining in the library are the final antecedent expansion words.
2.0 combining the final antecedent expansion words with the original target-language query terms for a new retrieval, realizing post-translation antecedent expansion of the cross-language query.
Experimental design and results:
To illustrate the effectiveness of the method, Vietnamese-English cross-language information retrieval experiments were carried out with the method and the comparison methods, taking Vietnamese and English as the language pair.
Experimental data set:
The English text collection of NTCIR-5 CLIR is selected as the experimental corpus. The corpus is the standard cross-language information retrieval test collection of the international evaluation conference on multilingual processing sponsored by the National Institute of Informatics of Japan, drawn from the news texts of Mainichi Daily News 2000 and 2001 (mdn00, mdn01) and Korea Times 2001 (ktn01), 26224 English texts in total (6608 in mdn00, 5547 in mdn01 and 14069 in ktn01). The data set comprises a document test set, result sets and a query set. The result sets are of two types: the Rigid standard (highly relevant and relevant to the query) and the Relax standard (highly relevant, relevant and partially relevant to the query). The query set contains 50 query topics in four language versions (Japanese, Korean, Chinese and English) and four types (TITLE, DESC, NARR and CONC); the TITLE type describes a query topic briefly with nouns and noun phrases and is a short query, while the DESC type describes it briefly with sentences and is a long query. The retrieval experiments use the TITLE and DESC query types.
Because the NTCIR-5 CLIR corpus provides no Vietnamese query version, professional translators of ASEAN languages at a translation agency were engaged to manually translate the 50 Chinese-version query topics of NTCIR-5 CLIR into Vietnamese as the source-language queries of the experiments.
The baseline comparison methods are as follows:
(1) The Vietnamese-English Cross-Language Retrieval (VECLR) benchmark: the result of the first Vietnamese-English cross-language retrieval, i.e., the retrieval result obtained by machine-translating the source-language Vietnamese query into English and retrieving the English documents, without any query expansion in the retrieval process.
(2) The Vietnamese-English retrieval method with query post-translation expansion based on pseudo-relevance feedback (QPTE_PRF): the QPTE_PRF baseline is the Vietnamese-English post-translation query expansion retrieval result implemented with the pseudo-relevance-feedback-based cross-language query expansion method (Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-). The experimental method and parameters are: the source-language Vietnamese query is machine-translated into an English query to retrieve the English documents; the top-ranked 20 English documents of the cross-language initial retrieval are extracted to construct the initial English relevant document set; English feature terms are extracted and their weights computed; and the 20 feature terms ranked highest by descending weight are taken as the English expansion words, realizing Vietnamese-English post-translation query expansion.
R-precision (R-Prec) and P@5 are adopted as the cross-language retrieval evaluation measures. R-precision is the precision computed when R documents have been retrieved, where R is the number of relevant documents in the collection for the query; it does not emphasize the ranking of documents within the result set.
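A small Python sketch of the two measures as defined above; ranked is the retrieved document list in rank order, and relevant is the set of documents judged relevant for the query:

```python
def r_precision(ranked, relevant):
    """Precision after retrieving R documents, where R = |relevant|."""
    R = len(relevant)
    if R == 0:
        return 0.0
    return sum(1 for d in ranked[:R] if d in relevant) / R

def precision_at_5(ranked, relevant):
    """P@5: fraction of relevant documents among the top five retrieved."""
    return sum(1 for d in ranked[:5] if d in relevant) / 5
```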
The experimental results are as follows:
the method comprises the steps of writing a source program of the method and the reference method, analyzing and comparing the over-English and over-language information retrieval performance of the method and the reference method through experiments, performing over-English and over-language information retrieval on 50 Vietnamese TITLE and DESC queries, performing user relevance judgment on 50 English documents in the front of cross-language initial examination to obtain relevant feedback documents of initial examination users (for simplicity, relevant documents in the front of initial examination 50 documents containing known result sets are regarded as relevant documents of the initial examination), performing experiments to obtain average values of R-Prec and P @5 of the over-English and over-language retrieval results, wherein the average values are respectively shown in tables 1 to 2, common experiment parameters are set as α ═ 0.3, minPR ═ 0.1, minNR ═ 0.01, and a 3_ item set is mined.
Table 1 Retrieval performance comparison of the method of the invention with the baseline methods (TITLE queries)
The experimental parameters of the table are mc = 0.8, ms ∈ {0.2, 0.25, 0.3, 0.35, 0.4, 0.45} (mdn00) and ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3} (mdn01 and ktn01).
The experimental results in Table 1 show that, compared with the baselines VECLR and QPTE_PRF, the R-Prec and P@5 values of the TITLE-query Vietnamese-English retrieval results of the method are greatly improved, by up to 91.28% over VECLR and up to 265.88% over QPTE_PRF.
Table 2 Retrieval performance comparison of the method of the invention with the baseline methods (DESC queries)
The experimental parameters of the table are mc = 0.8, ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3}.
The experimental results in Table 2 show that the R-Prec and P@5 values of the DESC-query Vietnamese-English retrieval results of the method are greatly improved over the baselines VECLR and QPTE_PRF, with maximum improvements of 137.38% and 238.75% respectively.
The experimental results show that the method is effective and can indeed improve cross-language information retrieval performance.
Claims (1)
1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterized by comprising the following steps:
1.1 translating a source language query into a target language query using a machine translation system;
1.2 retrieving the original target-language document collection with the target-language query to obtain the initial target-language documents;
1.3 constructing the target-language initial relevant document set: the user performs relevance judgment on the top n initial target-language documents to obtain the initially relevant documents, from which the target-language initial relevant document set is constructed;
1.4 mining weighted frequent itemsets and negative itemsets containing the original query terms from the target-language initial relevant document set;
the specific steps are as follows:
1.4.1 preprocessing the target-language initial relevant document set, and constructing a document index library and a total feature word library;
1.4.2 mining the frequent 1-itemsets L1:
obtaining the candidate 1-itemsets C1 from the total feature word library and computing the support awSup(C1) of each C1; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS; awSup(C1) is computed as follows:
where n and W are, respectively, the total number of documents in the target-language initial relevant document set and the sum of the weights of all feature words, nC1 is the frequency of C1 in that document set, wC1 is the itemset weight of C1 in that document set, and β ∈ (0,1) is an adjustment coefficient whose value can be neither 0 nor 1;
1.4.3 mining the frequent k-itemsets Lk containing query terms and the negative k-itemsets Nk, k ≥ 2
the specific steps are as follows:
(1) mining the candidate k-itemsets Ck: obtained by the Apriori join of the frequent (k−1)-itemsets Lk−1;
(2) when k = 2, pruning the candidate 2-itemsets C2 containing no query term and keeping the candidate 2-itemsets C2 containing query terms;
(3) computing the support awSup(Ck) of each candidate k-itemset Ck:
if awSup(Ck) ≥ the support threshold ms, computing the weighted frequent itemset relevance awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relevance threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
if awSup(Ck) < ms, computing the weighted negative itemset relevance awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relevance threshold minNR, Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS; awSup(Ck) is computed as follows:
where nCk is the frequency of Ck in the target-language initial relevant document set, wCk is the itemset weight of Ck in that document set, and k is the number of items in Ck;
awPIR(Ck) is computed in two cases, m = 2 and m > 2, that is,
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the greatest support, and Iq is the subitemset of Ck with the greatest support among all 2-subitemsets through (m−1)-subitemsets;
awNIR(Ck) is computed in two cases, r = 2 and r > 2, that is,
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the greatest support, and Ip is the subitemset of Ck with the greatest support among all 2-subitemsets through (r−1)-subitemsets;
(4) if the frequent k-itemset Lk is empty, ending itemset mining and proceeding to step 1.5; otherwise returning to step (1) and continuing mining;
1.5 mining weighted strong positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mining the association rules I → qt of Lk whose antecedent is the expansion term set I and whose consequent is the query term set qt, where the union of qt and I is Lk and the intersection of qt and I is empty, qt is the query term set and I is the expansion term set; the specific mining steps are as follows:
(1) finding all proper subsets of the positive itemset Lk to obtain the proper subset set of Lk;
(2) taking any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Lk;
(3) computing the confidence awARConf(I → qt) of the weighted association rule I → qt and its lift awARL(I → qt); if awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the weighted strong association rule I → qt is obtained and added to the weighted strong positive association rule set PAR; awARConf(I → qt) and awARL(I → qt) are computed as follows:
(4) returning to step (2) and repeating until each proper subset in the proper subset set of Lk has been taken exactly once; then taking a new positive itemset Lk from the PIS set and going to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken, and then proceeding to step 1.6;
1.6 mining weighted strong negative association rules from the negative itemset set NIS: for each negative itemset Nk (k > 2) in NIS, mining the weighted negative association rules I → ﹁qt and ﹁I → qt between the query term set qt and the negative expansion term set I, where the union of qt and I is Nk and the intersection of qt and I is empty; the specific mining steps are as follows:
(1) finding all proper subsets of the negative itemset Nk to obtain the proper subset set of Nk;
(2) taking any two subitemsets qt and I from the proper subset set such that qt ∩ I = ∅ and qt ∪ I = Nk, where qt is the query term set;
(3) computing the lift awARL(I → qt); if awARL(I → qt) < 1:
computing the confidence awARConf(I → ﹁qt) of the negative association rule I → ﹁qt; if awARConf(I → ﹁qt) ≥ the minimum weighted confidence threshold mc, the weighted strong negative association rule I → ﹁qt is obtained and added to the weighted strong negative association rule set NAR;
computing the confidence awARConf(﹁I → qt) of the negative association rule ﹁I → qt; if awARConf(﹁I → qt) ≥ mc, the weighted strong negative association rule ﹁I → qt is obtained and added to NAR; awARConf(I → ﹁qt) and awARConf(﹁I → qt) are computed as follows:
awARConf(I → ﹁qt) = 1 − awARConf(I → qt)
awARConf(﹁I → qt) = (awSup(qt) − awSup(I ∪ qt)) / (1 − awSup(I))
(4) returning to step (2) and repeating until each proper subset in the proper subset set of Nk has been taken exactly once; then going to step (5);
(5) taking a new negative itemset Nk from the NIS set and going to step (1) for a new round of weighted negative association rule mining; when every negative itemset in the NIS set has been taken exactly once, weighted strong negative association rule mining ends and step 1.7 is performed;
1.7, extracting a weighted positive association rule mode I → qt of which the rule back part is a query term from a weighted strong positive association rule set PAR, and constructing a candidate front part expansion word bank by taking the characteristic words of the front part of the positive association rule as candidate expansion words;
1.8 extracting weighted negative association rule mode with rule back part being query term from weighted strong negative association rule set NAR Andtaking the front part I of the negative association rule as a front part negative expansion word, and constructing a front part negative expansion word bank;
1.9 comparing each candidate front part expansion word in the candidate front part expansion word library with a negative expansion word of the front part negative expansion word library, deleting the candidate expansion words same as the negative expansion words in the candidate front part expansion word library, wherein the rest candidate front part expansion words in the candidate front part expansion word library are final front part expansion words;
2.0 Combine the final front-part expansion words with the original target-language query terms into a new query and retrieve again, realizing front-part expansion after cross-language query translation (a sketch of steps 1.7-2.0 follows).
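A short sketch of steps 1.7-2.0, consuming the PAR and NAR outputs of the two mining sketches above (the tuple layouts are those illustrative sketches' assumptions):

```python
def front_part_expansion(PAR, NAR, original_query):
    """Candidate front-part words from positive rules, minus negative front-part words."""
    candidates = set()
    for I, qt, conf, lift in PAR:         # step 1.7: positive-rule front parts
        candidates |= I
    negatives = set()
    for kind, I, qt, conf in NAR:         # step 1.8: negative-rule front parts
        negatives |= I
    final_words = candidates - negatives  # step 1.9: drop words matching negative expansion words
    return set(original_query) | final_words  # step 2.0: expanded query for re-retrieval
```

The set difference in step 1.9 is what distinguishes this method from purely positive expansion: words that positive rules suggest but negative rules contradict never reach the final query.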
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710807540.4A CN107609095B (en) | 2017-09-08 | 2017-09-08 | Based on across the language inquiry extended method for weighting positive and negative regular former piece and relevant feedback |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107609095A CN107609095A (en) | 2018-01-19 |
CN107609095B true CN107609095B (en) | 2019-07-09 |
Family
ID=61062737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710807540.4A Expired - Fee Related CN107609095B (en) | 2017-09-08 | 2017-09-08 | Based on across the language inquiry extended method for weighting positive and negative regular former piece and relevant feedback |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609095B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299278B (en) * | 2018-11-26 | 2022-02-15 | 广西财经学院 | Text retrieval method based on confidence coefficient-correlation coefficient framework mining rule antecedent |
CN109299292B (en) * | 2018-11-26 | 2022-02-15 | 广西财经学院 | Text retrieval method based on matrix weighted association rule front and back part mixed expansion |
CN109684464B (en) * | 2018-12-30 | 2021-06-04 | 广西财经学院 | Cross-language query expansion method for realizing rule back-part mining through weight comparison |
CN112925978A (en) * | 2021-02-26 | 2021-06-08 | 北京百度网讯科技有限公司 | Recommendation system evaluation method and device, electronic equipment and storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216874A (en) * | 2014-09-22 | 2014-12-17 | 广西教育学院 | Chinese interword weighing positive and negative mode excavation method and system based on relevant coefficients |
CN105095512A (en) * | 2015-09-09 | 2015-11-25 | 四川省科技交流中心 | Cross-language private data retrieval system and method based on bridge language |
CN106557478A (en) * | 2015-09-25 | 2017-04-05 | 四川省科技交流中心 | Distributed across languages searching systems and its search method based on bridge language |
Non-Patent Citations (2)
Title |
---|
Fully weighted positive and negative association rule mining and its application in educational data; Yu Ru et al.; Journal of Chinese Information Processing; 20141231; Vol. 28, No. 4; full text
MWARM-SRCCCI: an effective matrix-weighted positive and negative association rule mining algorithm; Zhou Xiumei et al.; Journal of Computer Applications; 20141231; Vol. 34, No. 10; full text
Also Published As
Publication number | Publication date |
---|---|
CN107609095A (en) | 2018-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609095B (en) | Based on across the language inquiry extended method for weighting positive and negative regular former piece and relevant feedback | |
WO2015196909A1 (en) | Word segmentation method and device | |
US20160041986A1 (en) | Smart Search Engine | |
CN107102983B (en) | Word vector representation method of Chinese concept based on network knowledge source | |
CN102662936A (en) | Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning | |
CN109299278B (en) | Text retrieval method based on confidence coefficient-correlation coefficient framework mining rule antecedent | |
CN105760366A (en) | New word finding method aiming at specific field | |
US20170337179A1 (en) | Construction of a lexicon for a selected context | |
CN107526839B (en) | Consequent extended method is translated across language inquiry based on weight positive negative mode completely | |
CN106484781B (en) | Merge the Indonesia's Chinese cross-language retrieval method and system of association mode and user feedback | |
CN109684463B (en) | Cross-language post-translation and front-part extension method based on weight comparison and mining | |
CN109726263B (en) | Cross-language post-translation hybrid expansion method based on feature word weighted association pattern mining | |
CN109739953B (en) | Text retrieval method based on chi-square analysis-confidence framework and back-part expansion | |
Rasheed et al. | Query expansion in information retrieval for Urdu language | |
CN109299292B (en) | Text retrieval method based on matrix weighted association rule front and back part mixed expansion | |
CN109684465B (en) | Text retrieval method based on pattern mining and mixed expansion of item set weight value comparison | |
CN109739952A (en) | Merge the mode excavation of the degree of association and chi-square value and the cross-language retrieval method of extension | |
CN109684464B (en) | Cross-language query expansion method for realizing rule back-part mining through weight comparison | |
Alper | Auto-generating Bilingual Dictionaries: Results of the TIAD-2017 Shared Task Baseline Algorithm. | |
CN108170778B (en) | Chinese-English cross-language query post-translation expansion method based on fully weighted rule post-piece | |
CN107562904B (en) | Positive and negative association mode method for digging is weighted between fusion item weight and the English words of frequency | |
CN108416442B (en) | Chinese word matrix weighting association rule mining method based on item frequency and weight | |
Rahimi et al. | Creating a Wikipedia-based Persian-English word association dictionary | |
Tomás et al. | Mining wikipedia as a parallel and comparable corpus | |
Wloka | Identifying bilingual topics in Wikipedia for efficient parallel corpus extraction and building domain-specific glossaries for the Japanese-English language pair |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20190709 | Termination date: 20200908