CN107609095B - Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback - Google Patents


Info

Publication number
CN107609095B
CN107609095B (granted publication of application CN201710807540.4A)
Authority
CN
China
Prior art keywords
negative
item
weighting
itemset
antecedent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710807540.4A
Other languages
Chinese (zh)
Other versions
CN107609095A (en)
Inventor
黄名选
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University of Finance and Economics
Original Assignee
Guangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Finance and Economics filed Critical Guangxi University of Finance and Economics
Priority to CN201710807540.4A priority Critical patent/CN107609095B/en
Publication of CN107609095A publication Critical patent/CN107609095A/en
Application granted granted Critical
Publication of CN107609095B publication Critical patent/CN107609095B/en


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. A source-language query is first translated into a target-language query with a translation tool, the target document collection is searched to obtain the initially retrieved documents, and the top-ranked initially retrieved documents, after user relevance judgment, are used to construct the target-language initial relevant document set. Weighted positive and negative association rule patterns whose feature terms contain query terms are then mined from the initial relevant document set with an all-weighted positive and negative association pattern mining technique oriented to post-translation cross-language query expansion, building a feature-term positive and negative association rule base. From the rule base, weighted positive and negative association rule patterns whose consequent is a query term are extracted: the antecedent feature terms of the positive association rules serve as positive expansion terms and the antecedents of the negative association rules as negative expansion terms; removing the negative expansion terms from the positive expansion terms yields the final antecedent expansion terms and realizes post-translation antecedent expansion of the cross-language query. The present invention can improve cross-language information retrieval performance and has good application value and promotion prospects.

Description

Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback
Technical field
The invention belongs to the field of Internet information retrieval, and specifically relates to a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, suitable for the field of cross-language information retrieval.
Background technique
Cross-Language Information Retrieval (CLIR) refers to the technique of retrieving information resources in other languages with a query formulated in one language; the language expressing the user query is called the source language (Source Language), and the language of the documents being retrieved is called the target language (Target Language). Cross-language query expansion is one of the core techniques for improving cross-language retrieval performance, aiming to solve problems that have long troubled the CLIR field, such as serious query topic drift and word mismatch. According to the stage of the retrieval process at which the expansion occurs, cross-language query expansion is divided into three kinds: pre-translation query expansion, post-translation query expansion, and combined query expansion (i.e., expansion occurring both before and after translation). With the rise of CLIR research, cross-language query expansion has received increasing attention and discussion from scholars at home and abroad, and has become a research hotspot.
Cross-language information retrieval combines information retrieval with machine translation; it faces problems more complex, and more serious, than those of monolingual retrieval. These problems have long been the bottleneck restricting the development of CLIR technology and remain generally unsolved, urgent problems in CLIR worldwide, mainly manifested as serious query topic drift, word mismatch, and ambiguity in translating query terms. Cross-language query expansion is one of the core techniques for solving these problems. Over the past decade, cross-language query expansion models and algorithms have received wide attention and in-depth study, and rich theoretical results have been achieved, but the problems above have not been finally and fully solved.
Summary of the invention
The present invention applies weighted positive and negative association pattern mining to post-translation cross-language query expansion and proposes a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. Applied to the field of cross-language information retrieval, it can alleviate the long-standing query topic drift and word mismatch problems in CLIR and improve retrieval performance; it can also be applied to cross-language search engines to improve retrieval measures of the search engine such as recall and precision.
The technical solution adopted by the present invention is that:
1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterized by comprising the following steps:
1.1 Translate the source-language query into a target-language query with a machine translation system;
1.2 Search the target-language original document collection with the target-language query to obtain the initially retrieved target-language documents;
1.3 Build the target-language initial relevant document set: submit the top-ranked n initially retrieved target-language documents to user relevance judgment to obtain the initially retrieved relevant documents, and construct the target-language initial relevant document set from them;
1.4 Mine weighted frequent itemsets and negative itemsets containing original query terms from the target-language initial relevant document set;
Specific steps:
1.4.1 Preprocess the target-language initial relevant document set, and build the document index library and the overall feature-term dictionary;
1.4.2 Mine frequent 1-itemsets L1:
Obtain candidate 1-itemsets C1 of feature terms from the overall feature-term dictionary and compute the support awSup(C1) of each C1. If awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS. The calculation formula of awSup(C1) is shown in formula (1):
where n and W are, respectively, the total number of document records and the sum of all feature-term weights in the target-language initial relevant document set, nC1 is the frequency with which C1 occurs in the target-language initial relevant document set, wC1 is the itemset weight of C1 in that set, and β ∈ (0,1) is an adjustment coefficient whose value cannot be 0 or 1;
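Formula (1) itself appears only as an image in the source and does not survive in this text. As an illustration only, the sketch below mines frequent 1-itemsets under an assumed instantiation of the all-weighted support, a convex combination β·(frequency ratio) + (1 − β)·(weight ratio) built from the quantities n, W, nC1, wC1 and β described above; the function names and the exact combination are assumptions, not the patent's formula.

```python
# Hypothetical sketch of step 1.4.2: mining frequent 1-itemsets.
# The support below is an ASSUMED stand-in for the patent's formula (1):
# awSup = beta * (n_I / n) + (1 - beta) * (w_I / W).

def aw_sup_1(term, docs, beta=0.5):
    """docs: list of {term: weight} dicts for the initial relevant document set."""
    n = len(docs)                                  # total document records
    W = sum(w for d in docs for w in d.values())   # sum of all feature-term weights
    n_i = sum(1 for d in docs if term in d)        # documents containing the term
    w_i = sum(d.get(term, 0.0) for d in docs)      # weight mass of the term
    return beta * (n_i / n) + (1 - beta) * (w_i / W)

def mine_frequent_1_itemsets(docs, ms, beta=0.5):
    vocabulary = {t for d in docs for t in d}      # the overall feature dictionary
    return {t for t in vocabulary if aw_sup_1(t, docs, beta) >= ms}

docs = [{"economy": 0.8, "trade": 0.4},
        {"economy": 0.6, "policy": 0.5},
        {"trade": 0.3, "policy": 0.2}]
L1 = mine_frequent_1_itemsets(docs, ms=0.5)
```

With these toy weights only "economy" clears the threshold ms = 0.5; the other terms score about 0.46.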
1.4.3 Mine frequent k-itemsets Lk containing query terms and negative k-itemsets Nk, k ≥ 2
Specific steps:
(1) Mine candidate k-itemsets Ck: obtained by an Apriori join over the frequent (k−1)-itemsets Lk−1;
(2) When k = 2, prune the candidate 2-itemsets C2 that contain no query term, and keep the candidate 2-itemsets C2 that contain a query term;
(3) Compute the support awSup(Ck) of the candidate k-itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent-itemset relevancy awPIR(Ck); if awPIR(Ck) ≥ the frequent-itemset relevancy threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative-itemset relevancy awNIR(Ck); if awNIR(Ck) ≥ the negative-itemset relevancy threshold minNR, Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS. The calculation formula of awSup(Ck) is shown in formula (2):
where nCk is the frequency with which Ck occurs in the target-language initial relevant document set, wCk is the itemset weight of Ck in that set, and k is the number of items of Ck;
The calculation formula of awPIR(Ck) distinguishes two cases, m = 2 and m > 2, as shown in formula (3) and formula (4),
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item with the maximum support among all items of Ck, and Iq is the sub-itemset with the maximum support among all 2-item to (m−1)-item sub-itemsets of Ck;
The calculation formula of awNIR(Ck) distinguishes two cases, r = 2 and r > 2, as shown in formula (5) and formula (6),
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item with the maximum support among all items of Ck, and Ip is the sub-itemset with the maximum support among all 2-item to (r−1)-item sub-itemsets of Ck;
(4) If the k-itemset Lk is empty, itemset mining ends and the procedure goes to step 1.5; otherwise it returns to step (1) and mining continues;
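The candidate-generation step (1) above follows the standard Apriori join: two frequent (k−1)-itemsets that share all but one item are united into a candidate k-itemset whose (k−1)-subsets must all be frequent. A minimal generic sketch, ignoring the patent's query-term pruning at k = 2 and the support/relevancy tests of steps (2)–(3):

```python
from itertools import combinations

def apriori_join(prev_frequent):
    """Join frequent (k-1)-itemsets (as frozensets) into candidate k-itemsets."""
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        if len(union) == len(a) + 1:          # the two itemsets differ in one item
            # Apriori pruning: every (k-1)-subset must itself be frequent
            if all(union - {x} in prev_frequent for x in union):
                candidates.add(union)
    return candidates

L2 = {frozenset({"q", "a"}), frozenset({"q", "b"}), frozenset({"a", "b"})}
C3 = apriori_join(L2)
```

Here the three frequent 2-itemsets join into the single candidate 3-itemset {q, a, b}, all of whose 2-subsets are frequent.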
1.5 Mine strong weighted positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-term weighted frequent itemset set PIS, mine the association rules I → qt in Lk whose antecedent is an expansion-term itemset I and whose consequent is a query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is the empty set. The specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk to obtain the set of proper-subset itemsets of Lk;
(2) Arbitrarily take two sub-itemsets qt and I from the set of proper subsets of Lk such that qt ∩ I = ∅ and qt ∪ I = Lk;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt; if awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, obtain the strong weighted association rule I → qt and add it to the strong weighted positive association rule set PAR. The calculation formulas of awARConf(I → qt) and awARL(I → qt) are shown in formula (7) and formula (8):
(4) Return to step (2) in sequence until every proper subset in the set of proper-subset itemsets of Lk has been taken out exactly once; then retrieve a new positive itemset Lk from the set PIS and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out, at which point the procedure goes to step 1.6;
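Step 1.5 can be sketched as follows, assuming a support table over itemsets and the classical definitions awARConf(I → qt) = awSup(I ∪ qt)/awSup(I) and awARL(I → qt) = awARConf(I → qt)/awSup(qt); the formula images for (7) and (8) are not in this text, so treat these two expressions as an assumed reading consistent with the lift definition given in section 4 below.

```python
from itertools import combinations

def mine_positive_rules(lk, sup, query_terms, mc):
    """Enumerate splits of frequent itemset lk into (I, qt) and keep strong rules.
    sup: dict mapping frozenset itemsets to their weighted support."""
    rules = []
    items = frozenset(lk)
    for r in range(1, len(items)):
        for qt in map(frozenset, combinations(items, r)):
            if not qt <= query_terms:        # consequent must be query terms
                continue
            I = items - qt                   # antecedent = expansion-term itemset
            conf = sup[items] / sup[I]       # assumed awARConf(I -> qt)
            lift = conf / sup[qt]            # assumed awARL(I -> qt)
            if lift > 1 and conf >= mc:      # strong weighted positive rule
                rules.append((I, qt, round(conf, 3)))
    return rules

sup = {frozenset({"q"}): 0.4, frozenset({"e"}): 0.5, frozenset({"q", "e"}): 0.3}
rules = mine_positive_rules({"q", "e"}, sup, query_terms=frozenset({"q"}), mc=0.5)
```

For the toy supports above, the only admissible split yields the strong rule {e} → {q} with confidence 0.6 and lift 1.5.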
1.6 Mine strong weighted negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative association rules I → ¬qt and ¬I → qt in Nk, where qt is a query-term itemset, I is a negative expansion-term itemset, the union of qt and I is Nk, and the intersection of qt and I is the empty set. The specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk to obtain the set of proper subsets of Nk;
(2) Arbitrarily take two sub-itemsets qt and I from the set of proper subsets of Nk such that qt ∩ I = ∅ and qt ∪ I = Nk, where qt is a query-term itemset;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
Compute the confidence awARConf(I → ¬qt) of the negative association rule I → ¬qt; if awARConf(I → ¬qt) ≥ the minimum weighted confidence threshold mc, obtain the strong weighted negative association rule I → ¬qt and add it to the strong weighted negative association rule set NAR;
Compute the confidence awARConf(¬I → qt) of the negative association rule ¬I → qt; if awARConf(¬I → qt) ≥ mc, obtain the strong weighted negative association rule ¬I → qt and add it to NAR. The calculation formulas of awARConf(I → ¬qt) and awARConf(¬I → qt) are shown in formula (9) and formula (10):
awARConf(I → ¬qt) = 1 − awARConf(I → qt)   (9)
(4) Return to step (2) in sequence until every proper subset in the set of proper subsets of Nk has been taken out exactly once; then go to step (5);
(5) Retrieve a new negative itemset Nk from the set NIS and go to step (1) for a new round of weighted negative association rule mining; when every negative itemset in NIS has been taken out exactly once, strong weighted negative association rule mining ends and the procedure goes to step 1.7;
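Step 1.6 admits a similar sketch. Consistently with formula (9) above and formulas (12)–(14) below, the two negative confidences can be derived from positive supports as awARConf(I → ¬qt) = 1 − awARConf(I → qt) and awARConf(¬I → qt) = (awSup(qt) − awSup(I ∪ qt)) / (1 − awSup(I)); the second expression is an assumption inferred from formula (14), since the image of formula (10) is not in this text.

```python
def mine_negative_rules(sup, I, qt, mc):
    """Test the two negative rules I -> NOT qt and NOT I -> qt for one split
    (I, qt) of a negative itemset; sup maps frozensets to weighted support."""
    union = I | qt
    conf_pos = sup[union] / sup[I]                      # awARConf(I -> qt)
    lift = conf_pos / sup[qt]                           # awARL(I -> qt)
    rules = []
    if lift < 1:                                        # negatively correlated
        conf_neg1 = 1 - conf_pos                        # formula (9)
        if conf_neg1 >= mc:
            rules.append(("I -> not qt", round(conf_neg1, 3)))
        # ASSUMED reading of formula (10), derived from formula (14):
        conf_neg2 = (sup[qt] - sup[union]) / (1 - sup[I])
        if conf_neg2 >= mc:
            rules.append(("not I -> qt", round(conf_neg2, 3)))
    return rules

sup = {frozenset({"x"}): 0.5, frozenset({"q"}): 0.6, frozenset({"x", "q"}): 0.2}
rules = mine_negative_rules(sup, frozenset({"x"}), frozenset({"q"}), mc=0.5)
```

Here awARL = 0.4/0.6 < 1, so both negative rules fire, with confidences 0.6 and 0.8.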
1.7 From the strong weighted positive association rule set PAR, extract the weighted positive association rule patterns I → qt whose consequent is a query term, take the antecedent feature terms of these positive rules as candidate expansion terms, and build the candidate antecedent expansion dictionary;
1.8 From the strong weighted negative association rule set NAR, extract the weighted negative association rule patterns I → ¬qt and ¬I → qt whose consequent is a query term, take the negative-rule antecedent I as the antecedent negative expansion terms, and build the antecedent negative expansion dictionary;
1.9 Compare each candidate antecedent expansion term in the candidate antecedent expansion dictionary with the negative expansion terms in the antecedent negative expansion dictionary, and delete from the candidate antecedent expansion dictionary every candidate expansion term identical to a negative expansion term; the candidate antecedent expansion terms remaining in the dictionary are the final antecedent expansion terms;
2.0 Combine the final antecedent expansion terms with the original target-language query terms into a new query and retrieve again, realizing post-translation antecedent expansion of the cross-language query.
In the above, the symbol "¬" in the strong weighted negative association rules I → ¬qt and ¬I → qt denotes negative correlation; "¬I" denotes the case in which the feature-term itemset I does not occur in the target-language initial relevant document set, i.e., a negatively correlated situation.
"I → ¬qt" indicates that the expansion-term itemset I and the query-term itemset qt are negatively correlated: in the target-language initial relevant document set, the occurrence of the expansion-term itemset I makes the query-term itemset qt tend not to occur.
"¬I → qt" indicates that the expansion-term itemset I and the query-term itemset qt are negatively correlated: in the target-language initial relevant document set, the non-occurrence of the expansion-term itemset I makes the query-term itemset qt tend to occur.
The strong weighted positive association rule I → qt means that, in the target-language initial relevant document set, the occurrence of the expansion-term itemset I promotes the occurrence of the query-term itemset qt.
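Steps 1.7–2.0 above reduce to a set difference followed by query reformulation. A minimal sketch with hypothetical term names:

```python
def build_expanded_query(original_terms, candidate_expansion, negative_expansion):
    """Remove negative expansion terms from the candidates (step 1.9) and
    append the survivors to the original query terms (step 2.0)."""
    final_expansion = candidate_expansion - negative_expansion
    return original_terms | final_expansion, final_expansion

query = {"economy"}                           # original target-language query
candidates = {"trade", "policy", "bank"}      # antecedents of strong positive rules
negatives = {"bank"}                          # antecedents of strong negative rules
new_query, final = build_expanded_query(query, candidates, negatives)
```

"bank" is eliminated as a negative expansion term, and the remaining candidates join the original query for the second retrieval pass.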
Compared with the prior art, the present invention has the following beneficial effects:
(1) The present invention proposes a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. The method mines weighted positive and negative association rule patterns from the cross-language initial relevant document set with a positive and negative pattern mining technique built on the weighted support–relevancy–lift–confidence evaluation framework, and extracts the antecedents of the weighted positive and negative association rule patterns as antecedent expansion terms related to the original query terms, realizing post-translation antecedent expansion of the cross-language query, so that cross-language information retrieval performance is well improved.
(2) The English text data set of the cross-language retrieval standard test corpus NTCIR-5 CLIR, from the international evaluation workshop on multilingual processing sponsored by the National Institute of Informatics, Japan, is selected as the experimental corpus, with Vietnamese and English as the language pair, to carry out experiments on the method of the present invention. The experimental baselines are: the Vietnamese-English Cross-Language Retrieval (VECLR) baseline without query expansion, and the Query Post-Translation Expansion Based on Pseudo Relevance Feedback (QPTE_PRF) Vietnamese-English cross-language retrieval method based on the literature (Wu Dan, He Daqing, Wang Huilin. Cross-language query expansion based on pseudo relevance [J]. Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-239.). The experimental results show that, compared with the baselines VECLR and QPTE_PRF, the R-Prec and P@5 values of the Vietnamese-English retrieval results for TITLE queries of the present method are greatly improved, with a maximum improvement of 91.28% over the VECLR method and up to 265.88% over the QPTE_PRF baseline; for DESC queries, the R-Prec and P@5 values of the Vietnamese-English retrieval results are also considerably higher than those of VECLR and QPTE_PRF, with maximum improvements of 137.38% and 238.75%, respectively.
(3) The experimental results show that the method of the present invention is effective in improving cross-language information retrieval performance. The main reasons are analyzed as follows: under the double influence of word mismatch and query translation quality, cross-language retrieval often suffers from serious topic drift in the initially retrieved query results. The present invention applies weighted positive and negative association pattern mining to Vietnamese-English cross-language query expansion and proposes a post-translation antecedent expansion method based on weighted positive and negative association patterns and user relevance feedback; obtaining antecedent expansion terms related to the original query and realizing post-translation antecedent expansion can effectively reduce the long-standing query topic drift and word mismatch problems in cross-language information retrieval and improve cross-language retrieval performance, which has important application value and broad promotion prospects.
Brief description of the drawings
Fig. 1 is a framework diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
Fig. 2 is an overall flow diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
Specific embodiments
In order to better illustrate the technical solution of the present invention, related concepts of the present invention are first described below:
1. Post-translation antecedent expansion of cross-language queries
Post-translation antecedent expansion of a cross-language query refers to the following: in cross-language query expansion, after the association rule patterns are mined from the target-language initially retrieved relevant documents, the antecedents of the association rule patterns related to the original target-language query are extracted as expansion terms, and the expansion terms are combined with the original target-language query terms as the new query.
2. Weighted support
Suppose DS = {d1, d2, …, dn} is the cross-language target-language initial relevant document set (Document Set, DS), where di (1 ≤ i ≤ n) is the i-th document in the document set DS, di = {t1, t2, …, tm, …, tp}, and tm (m = 1, 2, …, p) is a document feature-term item, called a feature item for short, usually consisting of a word, a term, or a phrase. The feature-item weight set corresponding to di is Wi = {wi1, wi2, …, wim, …, wip}, where wim is the weight of the m-th feature item tm in the i-th document di. TS = {t1, t2, …, tk} denotes the set of all feature items in DS; each subset of TS is called a feature-item itemset, or itemset for short.
In view of the drawbacks of the prior art, the present invention fully considers both feature-term item frequency and item weight, and gives a new all-weighted support (All-weighted Support, awSup) calculation method awSup(I). The calculation formula of awSup(I) is shown in formula (11).
where wI is the sum of the itemset weights of the weighted itemset I in the cross-language target-language initial relevant document set DS, nI is the item frequency with which the weighted itemset I occurs in DS, n is the total number of document records in DS, W is the sum of all feature-term weights in DS, k is the number of items (i.e., the itemset length) of itemset I, and β ∈ (0,1) is an adjustment coefficient whose value cannot be 0 or 1 and whose main role is to reconcile the combined influence of item frequency and item weight on the weighted support.
Suppose the minimum weighted support threshold is ms. If awSup(I1 ∪ I2) ≥ ms, the weighted itemset (I1 ∪ I2) is a positive itemset (i.e., a frequent itemset); otherwise, (I1 ∪ I2) is a negative itemset.
The method of the present invention focuses only on the following three types of weighted negative itemsets: (¬I), (I1 ∪ ¬I2) and (¬I1 ∪ I2). The calculation formulas of the weighted negative-itemset supports awSup(¬I), awSup(I1 ∪ ¬I2) and awSup(¬I1 ∪ I2) are given in formula (12) to formula (14).
awSup(¬I) = 1 − awSup(I)   (12)
awSup(I1 ∪ ¬I2) = awSup(I1) − awSup(I1 ∪ I2)   (13)
awSup(¬I1 ∪ I2) = awSup(I2) − awSup(I1 ∪ I2)   (14)
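Formulas (12)–(14) are fully specified above and can be checked directly on toy support values:

```python
def neg_supports(sup_i1, sup_i2, sup_union):
    """Weighted negative-itemset supports per formulas (12)-(14)."""
    return {
        "not_I1": 1 - sup_i1,                  # (12) awSup(¬I1) = 1 - awSup(I1)
        "I1_and_not_I2": sup_i1 - sup_union,   # (13) awSup(I1 ∪ ¬I2)
        "not_I1_and_I2": sup_i2 - sup_union,   # (14) awSup(¬I1 ∪ I2)
    }

s = neg_supports(sup_i1=0.6, sup_i2=0.5, sup_union=0.3)
```

With awSup(I1) = 0.6, awSup(I2) = 0.5 and awSup(I1 ∪ I2) = 0.3, this gives 0.4, 0.3 and 0.2, respectively.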
The method of the present invention focuses only on the following two classes of weighted negative association rules: (I1 → ¬I2) and (¬I1 → I2). The calculation formulas of the weighted positive and negative association rule confidences (All-weighted Association Rule Confidence, awARConf) awARConf(I1 → I2), awARConf(I1 → ¬I2) and awARConf(¬I1 → I2) are shown in formula (15) to formula (17).
3. Weighted positive and negative itemset relevancy
The weighted itemset relevancy is a measure of the strength of association between any two single items, and between sub-itemsets, within a weighted itemset. The higher the itemset relevancy, the closer the relationship between the sub-itemsets of the itemset and the more attention it deserves. The present invention improves the existing relevancy and gives a relevancy calculation method for weighted positive and negative itemsets that considers both the degree of correlation between any two single items in an itemset and the relevance existing between two sub-itemsets of the itemset.
Weighted positive itemset relevancy (All-weighted Positive Itemset Relevancy, awPIR): for a weighted feature-term positive itemset Ck = (t1, t2, …, tm), where m is the length of Ck and m ≥ 2, let tmax (1 ≤ max ≤ m) be the single item with the maximum support among all items of Ck, and Iq be the sub-itemset with the maximum support among all 2-item to (m−1)-item sub-itemsets of Ck. The calculation formulas of the weighted positive itemset relevancy awPIR(Ck) are given in formula (18) and formula (19).
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item with the maximum support among all items of Ck, and Iq is the sub-itemset with the maximum support among all 2-item to (m−1)-item sub-itemsets of Ck.
Formula (18) and formula (19) show that the relevancy of the weighted positive itemset Ck combines the conditional probabilities that the positive itemset occurs given, respectively, that the maximum-support single item tmax occurs and that the sub-itemset Iq (i.e., Iq is one of the 2-item to (m−1)-item sub-itemsets) occurs.
Weighted negative itemset relevancy (All-weighted Negative Itemset Relevancy, awNIR): for a weighted feature-term negative itemset Ck = (t1, t2, …, tr), where r is the length of Ck and r ≥ 2, let tmax (1 ≤ max ≤ r) be the single item with the maximum support among all items of the negative itemset Ck, and Ip be the sub-itemset with the maximum support among all 2-item to (r−1)-item sub-itemsets of the negative itemset Ck. The calculation formulas of the weighted negative itemset relevancy awNIR(Ck) are given in formula (20) and formula (21).
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item with the maximum support among all items of Ck, and Ip is the sub-itemset with the maximum support among all 2-item to (r−1)-item sub-itemsets of Ck.
Formula (20) and formula (21) show that the relevancy of the weighted negative itemset Ck combines the conditional probabilities that the negative itemset occurs given, respectively, that the maximum-support single item tmax does not occur and that the sub-itemset Ip (i.e., Ip is one of the 2-item to (r−1)-item sub-itemsets) does not occur.
Example: let Ck = (t1 ∪ t2 ∪ t3 ∪ t4) with support 0.65; the supports of the single items t1, t2, t3 and t4 are 0.82, 0.45, 0.76 and 0.75, respectively, and the supports of the 2-item and 3-item sub-itemsets (t1 ∪ t2), (t1 ∪ t3), (t1 ∪ t4), (t2 ∪ t3), (t2 ∪ t4), (t1 ∪ t2 ∪ t3), (t1 ∪ t2 ∪ t4), (t2 ∪ t3 ∪ t4) are 0.64, 0.78, 0.75, 0.74, 0.67, 0.66, 0.56 and 0.43, respectively. The single item with the maximum support (0.82) is t1, and the sub-itemset with the maximum support (0.78) among the 2-item and 3-item sub-itemsets is (t1 ∪ t3). The relevancy of the positive itemset (t1 ∪ t2 ∪ t3 ∪ t4) computed with formula (19) is then 0.81.
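The worked example above can be reproduced numerically. The image of formula (19) is not in this text, but the stated result (0.81) is matched by taking the mean of the two conditional probabilities P(Ck | tmax) = awSup(Ck)/awSup(tmax) and P(Ck | Iq) = awSup(Ck)/awSup(Iq); treat this averaged form as an assumption inferred from the example, not as the patent's exact formula.

```python
def aw_pir(sup_ck, sup_tmax, sup_iq):
    """ASSUMED reading of formula (19): mean of the two conditional
    probabilities that Ck occurs given tmax and given Iq."""
    return (sup_ck / sup_tmax + sup_ck / sup_iq) / 2

# Values from the example: awSup(Ck)=0.65, awSup(t1)=0.82, awSup(t1 ∪ t3)=0.78
relevancy = round(aw_pir(0.65, 0.82, 0.78), 2)
```

The computation gives (0.65/0.82 + 0.65/0.78)/2 ≈ 0.813, which rounds to the 0.81 stated in the example.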
4. Weighted association rule lift
The traditional association rule evaluation framework (support–confidence) has the limitation of ignoring the support of the itemset appearing in the consequent, so high-confidence rules may sometimes mislead. Lift is an effective correlation measure for solving this problem. The lift Lift(X → Y) of an association rule X → Y is the ratio of the probability of containing Y under the condition of containing X to the overall probability of Y occurring, i.e., the ratio of the rule confidence Confidence(X → Y) to the support sup(Y) of the consequent Y. Based on the traditional lift concept, the calculation formula of the weighted association rule lift (All-weighted Association Rule Lift, awARL) awARL(I1 → I2) is given in formula (22).
According to correlation theory, lift can assess the correlation of the antecedent and consequent of an association rule, i.e., the degree to which the appearance of one side promotes (or reduces) the appearance of the other. That is, when awARL(I1 → I2) > 1, I1 → I2 is a positive association rule, and of the itemsets I1 and I2 the appearance of one promotes the possibility that the other appears; when awARL(I1 → I2) < 1, I1 → I2 is a negative association rule, and the appearance of one reduces the possibility that the other appears; when awARL(I1 → I2) = 1, the itemsets I1 and I2 are mutually independent and uncorrelated, and the association rule I1 → I2 is a spurious rule. It can easily be proven that awARL(I1 → I2) has the following Property 1.
Property 1: when awARL(I1 → I2) > 1, ① awARL(I1 → ¬I2) < 1, ② awARL(¬I1 → I2) < 1, ③ awARL(¬I1 → ¬I2) > 1; when awARL(I1 → I2) < 1, ④ awARL(I1 → ¬I2) > 1, ⑤ awARL(¬I1 → I2) > 1, ⑥ awARL(¬I1 → ¬I2) < 1.
According to Property 1, when awARL(I1 → I2) > 1, the weighted positive association rule I1 → I2 can be mined; when awARL(I1 → I2) < 1, the weighted negative association rules I1 → ¬I2 and ¬I1 → I2 can be mined.
Suppose the minimum weighted confidence threshold is mc. Combining Property 1, the strong weighted positive and negative association rules are defined as follows:
For a weighted positive itemset (I1 ∪ I2), if awARL(I1 → I2) > 1 and awARConf(I1 → I2) ≥ mc, the weighted association rule I1 → I2 is a strong association rule.
For a negative itemset (I1 ∪ I2), if awARL(I1 → I2) < 1, awARConf(I1 → ¬I2) ≥ mc and awARConf(¬I1 → I2) ≥ mc, then I1 → ¬I2 and ¬I1 → I2 are strong negative association rules.
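The strong-rule definitions just given can be combined into one decision procedure, computing lift as awARL(I1 → I2) = awARConf(I1 → I2)/awSup(I2) per the definition in section 4 and the negative confidence via formula (9). For brevity this sketch checks only the I1 → ¬I2 side of the negative-rule condition:

```python
def classify_rule(sup1, sup2, sup_union, mc):
    """Classify I1 -> I2 from the supports of I1, I2 and I1 ∪ I2:
    returns 'strong positive', 'strong negative', or None."""
    conf = sup_union / sup1          # awARConf(I1 -> I2)
    lift = conf / sup2               # awARL(I1 -> I2)
    if lift > 1 and conf >= mc:
        return "strong positive"
    if lift < 1 and (1 - conf) >= mc:   # awARConf(I1 -> ¬I2), formula (9)
        return "strong negative"
    return None

a = classify_rule(0.5, 0.4, 0.3, mc=0.5)   # confidence 0.6, lift 1.5
b = classify_rule(0.5, 0.6, 0.1, mc=0.5)   # confidence 0.2, lift about 0.33
```

Case a satisfies lift > 1 with high confidence; case b has lift < 1 and a high negative confidence 1 − 0.2 = 0.8.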
A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention comprises the following steps:
1.1 Translate the source-language query into a target-language query with a machine translation system;
The machine translation system may be, for example, the Microsoft Bing machine translation interface (Microsoft Translator API), the Google machine translation interface, etc.
1.2 Search the target-language original document collection with the target-language query to obtain the initially retrieved target-language documents; the retrieval model specifically used is the classical retrieval model based on the vector space model.
1.3 Build the target-language initial relevant document set: submit the top-ranked n initially retrieved target-language documents to user relevance judgment to obtain the initially retrieved relevant documents, and construct the target-language initial relevant document set from them;
1.4 Mine weighted frequent itemsets and negative itemsets containing original query terms from the target-language initial relevant document set;
Specific steps:
1.4.1 Preprocess the target-language initial-retrieval relevant document set, and construct the document index library and the overall feature-word dictionary;
The preprocessing steps are:
(1) If the target language is Chinese, perform Chinese word segmentation, remove stop words, and extract the Chinese feature words; the Chinese word segmentation program is developed with the Chinese lexical analysis system ICTCLAS written by the Institute of Computing Technology, Chinese Academy of Sciences. If the target language is English, use the Porter program (see http://tartarus.org/~martin/PorterStemmer for details) to perform stemming and remove the English stop words;
(2) Compute the feature-word weights. A feature-word weight indicates the importance of the given word to the document in which it occurs; the present invention uses the classic and widely used tf-idf feature-word weight wij, whose calculation formula is shown in formula (23):
where wij denotes the weight of feature word tj in document di, tfj,i denotes the number of occurrences of tj in di, dfj denotes the number of documents containing tj, and N denotes the total number of documents in the document collection.
(3) Construct the document index library and the overall feature-word dictionary.
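The preprocessing steps above end with the tf-idf weighting. The body of formula (23) is not legible in this text; the common tf × log(N/df) variant is assumed in this sketch, so the exact formula in the patent may differ:

```python
import math

def tfidf(tf_ji: int, df_j: int, N: int) -> float:
    """w_ij = tf_{j,i} * log(N / df_j): weight of feature word t_j in
    document d_i. tf_ji: occurrences of t_j in d_i; df_j: number of
    documents containing t_j; N: total number of documents.
    Assumed variant; formula (23) in the patent may differ slightly."""
    return tf_ji * math.log(N / df_j)

# A term occurring 3 times, present in 10 of 1000 documents, outweighs
# the same term frequency for a word present in 100 documents:
assert tfidf(3, 10, 1000) > tfidf(3, 100, 1000)
```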
1.4.2 Mine frequent 1_itemsets L1: obtain candidate 1_itemsets C1 of feature words from the overall feature dictionary and compute the support awSup(C1) of each 1_itemset C1. If awSup(C1) ≥ the support threshold ms, the candidate 1_itemset C1 is a frequent 1_itemset L1, and L1 is added to the weighted frequent itemset set PIS. The calculation formula of awSup(C1) is shown in formula (24):
where n and W are, respectively, the total number of document records and the sum of all feature-word weights in the target-language initial-retrieval relevant document set; the remaining quantities in formula (24) are the frequency with which C1 occurs in that set and the itemset weight of C1 in that set; β ∈ (0,1) is an adjustment coefficient whose value cannot be 0 or 1.
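The body of formula (24) is not legible in this text. A plausible sketch follows, under the assumption (mine, not the patent's stated formula) that awSup linearly interpolates between the frequency ratio and the weight ratio via the adjustment coefficient β:

```python
def aw_sup_1(freq_c1: int, weight_c1: float, n: int, W: float, beta: float) -> float:
    """Hypothetical weighted support of a 1_itemset C1.
    freq_c1: occurrences of C1 in the relevant document set;
    weight_c1: itemset weight of C1; n: total document records;
    W: sum of all feature-word weights; beta in (0, 1), never 0 or 1.
    This convex combination is an assumption, not the patent's formula."""
    assert 0 < beta < 1, "beta must lie strictly inside (0, 1)"
    return beta * (freq_c1 / n) + (1 - beta) * (weight_c1 / W)

ms = 0.2  # example support threshold from the experiments
sup = aw_sup_1(freq_c1=12, weight_c1=30.0, n=50, W=120.0, beta=0.5)
assert 0.0 < sup < 1.0
```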
1.4.3 Mine the weighted frequent k_itemsets Lk containing query terms and the negative k_itemsets Nk, k ≥ 2.
Specific steps:
(1) Mine candidate k_itemsets Ck: obtained by performing an Apriori join over the frequent (k-1)_itemsets Lk-1;
The Apriori join method is detailed in: Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington D C, USA, 1993: 207-216.
(2) When k = 2, prune the candidate 2_itemsets C2 containing no query term and retain the candidate 2_itemsets C2 containing a query term;
(3) Compute the support awSup(Ck) of each candidate k_itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relatedness awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relatedness threshold minPR, the candidate k_itemset Ck is a weighted frequent k_itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relatedness awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relatedness threshold minNR, then Ck is a weighted negative k_itemset Nk and is added to the weighted negative itemset set NIS. The calculation formula of awSup(Ck) is shown in formula (25):
where the two quantities in formula (25) are the frequency with which Ck occurs in the target-language initial-retrieval relevant document set and the itemset weight of Ck in that set, and k is the number of items in Ck.
The calculation formula of awPIR(Ck) has two cases, m = 2 and m > 2, as shown in formulas (26) and (27),
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the maximum support, and Iq is the sub-itemset with the maximum support among all 2_sub-itemsets through (m-1)_sub-itemsets of Ck.
The calculation formula of awNIR(Ck) has two cases, r = 2 and r > 2, as shown in formulas (28) and (29),
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the maximum support, and Ip is the sub-itemset with the maximum support among all 2_sub-itemsets through (r-1)_sub-itemsets of Ck.
(4) If the k_itemsets Lk are empty, itemset mining terminates and the procedure goes to step 1.5; otherwise go to step (1) and continue mining.
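The candidate generation of step (1) is the classic Apriori join (Agrawal et al., 1993) cited above: frequent (k-1)_itemsets sharing a common (k-2)-prefix are joined, and candidates with an infrequent (k-1)-subset are pruned. A self-contained sketch:

```python
from itertools import combinations

def apriori_join(frequent_prev, k):
    """Join frequent (k-1)_itemsets (sorted tuples) into candidate
    k_itemsets, pruning any candidate that has an infrequent
    (k-1)-subset, per the Apriori candidate-generation procedure."""
    prev = set(frequent_prev)
    candidates = []
    for a in frequent_prev:
        for b in frequent_prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:   # shared (k-2)-prefix
                cand = a + (b[-1],)
                # prune: every (k-1)-subset must itself be frequent
                if all(sub in prev for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

L2 = [("a", "b"), ("a", "c"), ("b", "c")]
assert apriori_join(L2, 3) == [("a", "b", "c")]
```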
1.5 Mine strong weighted positive association rules from the weighted frequent itemset set PIS: for each frequent k_itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mine the association rules I → qt in Lk whose antecedent is the expansion-term itemset I and whose consequent is the query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is empty; qt is the query-term itemset and I is the expansion-term itemset. The specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk, obtaining the proper-subset itemset set of Lk;
(2) Arbitrarily take two sub-itemsets qt and I from the proper-subset set of Lk such that qt ∪ I = Lk and qt ∩ I = ∅;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt. If awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the strong weighted association rule I → qt is obtained and added to the strong weighted positive association rule set PAR. The calculation formulas of awARConf(I → qt) and awARL(I → qt) are shown in formulas (30) and (31):
(4) Return to step (2) and proceed in sequence until every proper subset in the proper-subset itemset set of Lk has been taken out exactly once; then retrieve a new positive itemset Lk from the PIS set and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out; then go to step 1.6.
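The loop of step 1.5 enumerates, for each frequent itemset Lk, the antecedent/consequent splits (I, qt) and keeps rules passing the lift and confidence tests. A sketch with awARConf and awARL supplied as callables, since formulas (30) and (31) are not legible in this text (all names below are illustrative):

```python
from itertools import combinations

def mine_positive_rules(Lk, query_terms, conf_fn, lift_fn, mc):
    """Split the frequent itemset Lk into (I, qt): qt holds query terms,
    I the expansion terms; keep I -> qt when lift > 1 and conf >= mc.
    conf_fn / lift_fn stand in for awARConf / awARL (formulas 30-31)."""
    items = set(Lk)
    rules = []
    for r in range(1, len(items)):
        for qt in combinations(sorted(items), r):
            qt_set = set(qt)
            I = items - qt_set
            # consequent must consist only of query terms; antecedent nonempty
            if not qt_set <= set(query_terms) or not I:
                continue
            if lift_fn(I, qt_set) > 1 and conf_fn(I, qt_set) >= mc:
                rules.append((frozenset(I), frozenset(qt_set)))
    return rules

# Toy measures: fixed lift/confidence, just to exercise the control flow.
rules = mine_positive_rules(
    Lk={"t1", "q"}, query_terms={"q"},
    conf_fn=lambda I, qt: 0.9, lift_fn=lambda I, qt: 1.5, mc=0.8)
assert rules == [(frozenset({"t1"}), frozenset({"q"}))]
```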
1.6 Mine strong weighted negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative association rules I → ¬qt and ¬I → qt in Nk, where qt is the query-term itemset and I is the negative expansion-term itemset, the union of qt and I is Nk, and the intersection of qt and I is empty. The specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk, obtaining the proper-subset set of Nk.
(2) Arbitrarily take two sub-itemsets qt and I from the proper-subset set of Nk such that qt ∪ I = Nk and qt ∩ I = ∅;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
Compute the confidence awARConf(I → ¬qt) of the negative association rule I → ¬qt; if awARConf(I → ¬qt) ≥ the minimum weighted confidence threshold mc, the strong weighted negative association rule I → ¬qt is obtained and added to the strong weighted negative association rule set NAR;
Compute the confidence awARConf(¬I → qt) of the negative association rule ¬I → qt; if awARConf(¬I → qt) ≥ mc, the strong weighted negative association rule ¬I → qt is obtained and added to NAR. The calculation formulas of awARConf(I → ¬qt) and awARConf(¬I → qt) are shown in formulas (32) and (33):
awARConf(I → ¬qt) = 1 - awARConf(I → qt)   (32)
(4) Return to step (2) and proceed in sequence until every proper subset in the proper-subset set of Nk has been taken out exactly once; then go to step (5);
(5) Retrieve a new negative itemset Nk from the NIS set and go to step (1) for a new round of weighted negative association rule mining; once every negative itemset in the NIS set has been taken out exactly once, strong weighted negative association rule mining ends, and the procedure goes to step 1.7.
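The negative-rule confidence of step 1.6 follows from the positive confidence via the complement in formula (32); formula (33) is not legible in this text, so only the I → ¬qt case is sketched (function names are illustrative):

```python
def neg_conf(conf_pos: float) -> float:
    """awARConf(I -> not-qt) = 1 - awARConf(I -> qt), per formula (32)."""
    return 1.0 - conf_pos

def is_strong_negative(lift: float, conf_pos: float, mc: float) -> bool:
    """Keep I -> not-qt as a strong weighted negative rule when
    awARL(I -> qt) < 1 and the complement confidence reaches mc."""
    return lift < 1 and neg_conf(conf_pos) >= mc

assert abs(neg_conf(0.15) - 0.85) < 1e-9
assert is_strong_negative(lift=0.6, conf_pos=0.15, mc=0.8)
assert not is_strong_negative(lift=1.2, conf_pos=0.15, mc=0.8)
```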
1.7 Extract from the strong weighted positive association rule set PAR the weighted positive association rule patterns I → qt whose consequent is a query term; take the antecedent feature words of these positive association rules as candidate expansion words and construct the candidate antecedent expansion dictionary.
1.8 Extract from the strong weighted negative association rule set NAR the weighted negative association rule patterns I → ¬qt and ¬I → qt whose consequent is a query term; take the negative association rule antecedent I as the antecedent negative expansion words and construct the antecedent negative expansion dictionary.
1.9 Compare each candidate antecedent expansion word in the candidate antecedent expansion dictionary with the negative expansion words in the antecedent negative expansion dictionary, and delete from the candidate antecedent expansion dictionary those candidate expansion words identical to a negative expansion word; the candidate antecedent expansion words remaining in the dictionary are the final antecedent expansion words.
2.0 The final antecedent expansion words are combined with the original target-language query terms into a new query for a second retrieval, realizing cross-language query post-translation antecedent expansion.
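Steps 1.7 through 2.0 amount to a set difference against the negative dictionary followed by a union with the original query terms. A minimal sketch (the function and term names are illustrative):

```python
def build_expanded_query(original_terms, candidate_expansion, negative_expansion):
    """Delete candidates that also appear as negative expansion words
    (step 1.9), then combine the surviving final expansion words with
    the original target-language query terms (step 2.0)."""
    final_expansion = set(candidate_expansion) - set(negative_expansion)
    return sorted(set(original_terms) | final_expansion)

query = build_expanded_query(
    original_terms={"economy"},
    candidate_expansion={"growth", "inflation", "football"},
    negative_expansion={"football"})
assert query == ["economy", "growth", "inflation"]
```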
Experimental design and result:
To demonstrate the validity of the method of the present invention, Vietnamese-English cross-language information retrieval experiments based on the present method and on comparison methods were carried out with Vietnamese and English as the language pair.
Experimental data set:
The English text data set of NTCIR-5 CLIR is selected as the test corpus. This corpus is the standard cross-language information retrieval test collection of the multilingual-processing international evaluation conference sponsored by the National Institute of Informatics, Japan. It is drawn from the news texts of Mainichi Daily News 2000 and 2001 (abbreviated mdn00, mdn01) and Korea Times 2001 (abbreviated ktn01), 26224 English texts in total (mdn00 has 6608, mdn01 has 5547, and ktn01 has 14069). The data set has a document collection, a result set, and a query set. The result set has two standards: Rigid (highly relevant and relevant to the query) and Relax (highly relevant, relevant, and partially relevant to the query). The query set includes 50 query topics in four language versions (Japanese, Korean, Chinese, and English) and four query topic types (TITLE, DESC, NARR, and CONC). The TITLE query type briefly describes the query topic with nouns and noun phrases and belongs to the short queries, while the DESC query type briefly describes the query topic in sentence form and belongs to the long queries. The TITLE and DESC query types are used for the retrieval experiments herein.
In the experiments of the present invention, since the NTCIR-5 CLIR corpus does not provide a Vietnamese query version, professional translators of an ASEAN-languages institution were specially engaged to translate the 50 Chinese-version query topics of NTCIR-5 CLIR into Vietnamese queries, which serve as the source-language queries of the experiments herein.
Comparison baseline methods:
(1) Vietnamese-English Cross-Language Retrieval (VECLR) baseline method: the result of the first Vietnamese-English cross-language retrieval, i.e., the retrieval result obtained by machine-translating the Vietnamese source-language query into English and retrieving the English documents, without using any query expansion technique in the retrieval process.
(2) Query Post-Translation Expansion Based on Pseudo Relevance Feedback (QPTE_PRF) Vietnamese-English cross-language retrieval method: the QPTE_PRF baseline algorithm follows the cross-language query expansion method of the document (Wu Dan, He Daqing, Wang Huilin. Cross-language query expansion based on pseudo relevance feedback [J]. Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-239.) and realizes the retrieval result of Vietnamese-English cross-language query post-translation expansion. Experimental method and parameters: the Vietnamese source-language query is machine-translated into an English query that retrieves the English documents; the top 20 English documents of the cross-language initial retrieval are used to construct the initial-retrieval English relevant document set; the English feature terms are extracted, their weights computed and sorted in descending order; and the top 20 feature terms are taken as English expansion words to realize Vietnamese-English cross-language query post-translation expansion.
R-precision (R-Prec) and P@5 are used as the cross-language retrieval evaluation indices of the present invention. R-precision is the precision computed after R documents have been retrieved, where R is the number of documents relevant to a given query in the document collection; it does not emphasize the ordering of the documents in the result set.
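The two evaluation measures can be computed as follows (a standard formulation; the ranked-list and relevant-set inputs are hypothetical examples):

```python
def r_precision(ranked_ids, relevant_ids):
    """Precision after retrieving R documents, where R is the number of
    relevant documents for the query; ranking within top R is ignored."""
    R = len(relevant_ids)
    return sum(1 for d in ranked_ids[:R] if d in relevant_ids) / R

def precision_at_k(ranked_ids, relevant_ids, k=5):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

ranked = ["d3", "d7", "d1", "d9", "d2", "d8"]
relevant = {"d3", "d1", "d2"}                    # R = 3
assert r_precision(ranked, relevant) == 2 / 3    # d3, d1 among the top 3
assert precision_at_k(ranked, relevant) == 3 / 5
```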
Experimental result is as follows:
The source programs of the present method and the baseline methods were written, and the Vietnamese-English cross-language information retrieval performance of the present method and the comparison baseline methods was analyzed and compared experimentally. Vietnamese-English cross-language information retrieval was carried out for the 50 Vietnamese TITLE and DESC queries; user relevance judgment on the top 50 English documents of the cross-language initial retrieval yielded the initial-retrieval user relevance feedback documents (for simplicity, in the experiments herein, the relevant documents with known results among the top 50 initial-retrieval documents are taken as the initial-retrieval relevant documents). The averages of R-Prec and P@5 of the Vietnamese-English cross-language retrieval results are shown in Tables 1 and 2, respectively. The common experimental parameters are set as: α = 0.3, minPR = 0.1, minNR = 0.01, mining up to 3_itemsets.
Table 1. Retrieval performance of the present method compared with the comparison baseline methods (TITLE queries)
Experimental parameters for this table: mc = 0.8, ms ∈ {0.2, 0.25, 0.3, 0.35, 0.4, 0.45} (mdn00), ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3} (mdn01 and ktn01)
The experimental results in Table 1 show that, compared with the comparison baseline methods VECLR and QPTE_PRF, the R-Prec and P@5 values of the Vietnamese-English cross-language retrieval results of the present method for the TITLE query type are greatly improved: the maximum improvement over the VECLR method reaches 91.28%, and the maximum improvement over the QPTE_PRF baseline method reaches 265.88%.
Table 2. Retrieval performance of the present method compared with the baseline methods (DESC queries)
Experimental parameters for this table: mc = 0.8, ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3}
The experimental results in Table 2 show that the R-Prec and P@5 values of the Vietnamese-English cross-language retrieval results of the present method for the DESC query type are also considerably higher than those of the baseline methods VECLR and QPTE_PRF, with maximum improvements of 137.38% and 238.75%, respectively.
The experimental results show that the method of the present invention is indeed effective and improves cross-language information retrieval performance.

Claims (1)

1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterized by comprising the following steps:
1.1 The source-language query is translated into a target-language query by a machine translation system;
1.2 The target-language query retrieves the original target-language document collection to obtain the target-language initial-retrieval documents;
1.3 Construct the target-language initial-retrieval relevant document set: the top n target-language initial-retrieval documents are submitted to user relevance judgment to obtain the initial-retrieval relevant documents, from which the target-language initial-retrieval relevant document set is constructed;
1.4 Mine, from the target-language initial-retrieval relevant document set, weighted frequent itemsets and negative itemsets containing the original query terms;
Specific steps:
1.4.1 Preprocess the target-language initial-retrieval relevant document set, and construct the document index library and the overall feature-word dictionary;
1.4.2 Mine frequent 1_itemsets L1:
Obtain candidate 1_itemsets C1 of feature words from the overall feature dictionary and compute the support awSup(C1) of each 1_itemset C1; if awSup(C1) ≥ the support threshold ms, the candidate 1_itemset C1 is a frequent 1_itemset L1, and L1 is added to the weighted frequent itemset set PIS; the calculation formula of awSup(C1) is as follows:
where n and W are, respectively, the total number of document records and the sum of all feature-word weights in the target-language initial-retrieval relevant document set; the remaining quantities in the formula are the frequency with which C1 occurs in that set and the itemset weight of C1 in that set; β ∈ (0,1) is an adjustment coefficient whose value cannot be 0 or 1;
1.4.3 Mine the frequent k_itemsets Lk containing query terms and the negative k_itemsets Nk, k ≥ 2
Specific steps:
(1) Mine candidate k_itemsets Ck: obtained by performing an Apriori join over the frequent (k-1)_itemsets Lk-1;
(2) When k = 2, prune the candidate 2_itemsets C2 containing no query term and retain the candidate 2_itemsets C2 containing a query term;
(3) Compute the support awSup(Ck) of each candidate k_itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relatedness awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relatedness threshold minPR, the candidate k_itemset Ck is a weighted frequent k_itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relatedness awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relatedness threshold minNR, then Ck is a weighted negative k_itemset Nk and is added to the weighted negative itemset set NIS; the calculation formula of awSup(Ck) is as follows:
where the two quantities in the formula are the frequency with which Ck occurs in the target-language initial-retrieval relevant document set and the itemset weight of Ck in that set, and k is the number of items in Ck;
The calculation formula of awPIR(Ck) has two cases, m = 2 and m > 2, namely:
where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the maximum support, and Iq is the sub-itemset with the maximum support among all 2_sub-itemsets through (m-1)_sub-itemsets of Ck;
The calculation formula of awNIR(Ck) has two cases, r = 2 and r > 2, namely:
where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the maximum support, and Ip is the sub-itemset with the maximum support among all 2_sub-itemsets through (r-1)_sub-itemsets of Ck;
(4) If the k_itemsets Lk are empty, itemset mining terminates and the procedure goes to step 1.5; otherwise go to step (1) and continue mining;
1.5 Mine strong weighted positive association rules from the weighted frequent itemset set PIS: for each frequent k_itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mine the association rules I → qt in Lk whose antecedent is the expansion-term itemset I and whose consequent is the query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is empty; qt is the query-term itemset and I is the expansion-term itemset; the specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk, obtaining the proper-subset itemset set of Lk;
(2) Arbitrarily take two sub-itemsets qt and I from the proper-subset set of Lk such that qt ∪ I = Lk and qt ∩ I = ∅;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt; if awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the strong weighted association rule I → qt is obtained and added to the strong weighted positive association rule set PAR; the calculation formulas of awARConf(I → qt) and awARL(I → qt) are as follows:
(4) Return to step (2) and proceed in sequence until every proper subset in the proper-subset itemset set of Lk has been taken out exactly once, then retrieve a new positive itemset Lk from the PIS set and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out; then go to step 1.6;
1.6 Mine strong weighted negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative association rules I → ¬qt and ¬I → qt in Nk, where qt is the query-term itemset and I is the negative expansion-term itemset, the union of qt and I is Nk, and the intersection of qt and I is empty; the specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk, obtaining the proper-subset set of Nk;
(2) Arbitrarily take two sub-itemsets qt and I from the proper-subset set of Nk such that qt ∪ I = Nk and qt ∩ I = ∅, where qt is the query itemset;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
Compute the confidence awARConf(I → ¬qt) of the negative association rule I → ¬qt; if awARConf(I → ¬qt) ≥ the minimum weighted confidence threshold mc, the strong weighted negative association rule I → ¬qt is obtained and added to the strong weighted negative association rule set NAR;
Compute the confidence awARConf(¬I → qt) of the negative association rule ¬I → qt; if awARConf(¬I → qt) ≥ mc, the strong weighted negative association rule ¬I → qt is obtained and added to NAR; the calculation formulas of awARConf(I → ¬qt) and awARConf(¬I → qt) are as follows:
(4) Return to step (2) and proceed in sequence until every proper subset in the proper-subset set of Nk has been taken out exactly once; then go to step (5);
(5) Retrieve a new negative itemset Nk from the NIS set and go to step (1) for a new round of weighted negative association rule mining; once every negative itemset in the NIS set has been taken out exactly once, strong weighted negative association rule mining ends and the procedure goes to step 1.7;
1.7 Extract from the strong weighted positive association rule set PAR the weighted positive association rule patterns I → qt whose consequent is a query term; take the antecedent feature words of these positive association rules as candidate expansion words and construct the candidate antecedent expansion dictionary;
1.8 Extract from the strong weighted negative association rule set NAR the weighted negative association rule patterns I → ¬qt and ¬I → qt whose consequent is a query term; take the negative association rule antecedent I as the antecedent negative expansion words and construct the antecedent negative expansion dictionary;
1.9 Compare each candidate antecedent expansion word in the candidate antecedent expansion dictionary with the negative expansion words in the antecedent negative expansion dictionary, and delete from the candidate antecedent expansion dictionary those candidate expansion words identical to a negative expansion word; the candidate antecedent expansion words remaining in the dictionary are the final antecedent expansion words;
2.0 The final antecedent expansion words are combined with the original target-language query terms into a new query for a second retrieval, realizing cross-language query post-translation antecedent expansion.
CN201710807540.4A 2017-09-08 2017-09-08 Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback Expired - Fee-Related CN107609095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710807540.4A CN107609095B (en) 2017-09-08 2017-09-08 Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710807540.4A CN107609095B (en) 2017-09-08 2017-09-08 Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback

Publications (2)

Publication Number Publication Date
CN107609095A CN107609095A (en) 2018-01-19
CN107609095B true CN107609095B (en) 2019-07-09

Family

ID=61062737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710807540.4A Expired - Fee-Related CN107609095B (en) 2017-09-08 2017-09-08 Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback

Country Status (1)

Country Link
CN (1) CN107609095B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299278B * 2018-11-26 2022-02-15 广西财经学院 Text retrieval method based on confidence-correlation coefficient framework mining of rule antecedents
CN109299292B * 2018-11-26 2022-02-15 广西财经学院 Text retrieval method based on mixed expansion of matrix-weighted association rule antecedents and consequents
CN109684464B * 2018-12-30 2021-06-04 广西财经学院 Cross-language query expansion method realizing rule consequent mining through weight comparison

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216874A * 2014-09-22 2014-12-17 广西教育学院 Correlation-coefficient-based Chinese inter-word weighted positive and negative pattern mining method and system
CN105095512A * 2015-09-09 2015-11-25 四川省科技交流中心 Bridge-language-based cross-language private data retrieval system and method
CN106557478A * 2015-09-25 2017-04-05 四川省科技交流中心 Bridge-language-based distributed cross-language retrieval system and retrieval method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Completely weighted positive and negative association rule mining and its application in educational data; Yu Ru et al.; Journal of Chinese Information Processing; 20141231; Vol. 28, No. 4; entire document
An effective matrix-weighted positive and negative association rule mining algorithm: MWARM-SRCCCI; Zhou Xiumei et al.; Journal of Computer Applications; 20141231; Vol. 34, No. 10; entire document

Also Published As

Publication number Publication date
CN107609095A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107609095B (en) Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback
CN106372241B (en) Vietnamese-English cross-language text retrieval method and system based on word-weighted association patterns
CN106484781B (en) Indonesian-Chinese cross-language retrieval method and system fusing association patterns and user feedback
CN109299278A (en) Text retrieval method based on confidence-correlation coefficient framework mining of rule antecedents
CN107526839B (en) Cross-language query post-translation consequent expansion method based on completely weighted positive and negative patterns
CN109684463B (en) Cross-language post-translation antecedent expansion method based on weight comparison and mining
CN109726263B (en) Cross-language post-translation hybrid expansion method based on feature word weighted association pattern mining
CN109739953B (en) Text retrieval method based on chi-square analysis-confidence framework and consequent expansion
CN109739952A (en) Cross-language retrieval method fusing association-degree and chi-square-value pattern mining with expansion
CN107562904B (en) Weighted positive and negative association pattern mining method between English words fusing item weight and frequency
CN109299292A (en) Text retrieval method based on mixed expansion of matrix-weighted association rule antecedents and consequents
CN109684464B (en) Cross-language query expansion method realizing rule consequent mining through weight comparison
CN108170778B (en) Chinese-English cross-language query post-translation expansion method based on completely weighted rule consequents
Wloka Identifying bilingual topics in wikipedia for efficient parallel corpus extraction and building domain-specific glossaries for the japanese-english language pair
CN109753559A (en) Cross-language text retrieval method based on RCSAC framework mining and consequent expansion
Rao et al. Term weighting schemes for emerging event detection
CN106383883B (en) Indonesian-Chinese cross-language retrieval method and system based on matrix-weighted association patterns
CN108133022B (en) Chinese-English cross-language query antecedent expansion method based on matrix-weighted association rules
Zhang et al. Topic level disambiguation for weak queries
Li et al. Keyword extraction based on lexical chains and word co-occurrence for Chinese news web pages
Cagliero et al. Cross-lingual timeline summarization
CN109543196A (en) Indonesian-English cross-language post-translation antecedent expansion method based on weighted pattern mining
Caon et al. Finding synonyms and other semantically-similar terms from coselection data
Holzmann et al. Named entity evolution recognition on the Blogosphere
Yan et al. Terminology extraction in the field of water environment based on rules and statistics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190709

Termination date: 20200908