CN107609095B - Cross-language query expansion method based on weighted positive and negative association rule antecedents and relevance feedback - Google Patents
- Publication number: CN107609095B
- Application number: CN201710807540.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
A cross-language query expansion method based on weighted positive and negative association rule antecedents and relevance feedback. The source-language query is first translated into a target-language query with a translation tool, the target documents are retrieved to obtain the initially retrieved documents, and the top-ranked initially retrieved documents are extracted and judged for relevance by the user to build the target-language initially retrieved relevant document set. Weighted positive and negative association rule patterns among the feature words containing query terms are then mined from this document set with a weighted positive and negative association pattern mining technique oriented to cross-language query expansion, building a positive and negative association rule library of feature words. The weighted positive and negative rule patterns whose consequent is a query term are extracted from the rule library; the antecedent feature words of the positive rules serve as positive expansion words, and the antecedents of the negative rules serve as negative expansion words. Removing the negative expansion words from the positive expansion words yields the final antecedent expansion words and realises post-translation antecedent expansion of the cross-language query. The invention can improve cross-language information retrieval performance and has good application value and promotion prospects.
Description
Technical field
The invention belongs to the field of Internet information retrieval, and is specifically a cross-language query expansion method based on weighted positive and negative association rule antecedents and relevance feedback, suitable for the field of cross-language information retrieval.
Background art
Cross-language information retrieval (CLIR) refers to the technique of retrieving information resources in other languages with a query formulated in one language; the language expressing the user query is called the source language (Source Language), and the language of the retrieved documents is called the target language (Target Language). Cross-language query expansion is one of the core techniques for improving cross-language retrieval performance, and aims to solve problems that have long troubled the cross-language information retrieval field, such as severe query topic drift and word mismatch. According to the retrieval stage at which the expansion occurs, cross-language query expansion is divided into three kinds: pre-translation expansion, post-translation expansion, and combined expansion (i.e. expansion both before and after translation). With the rise of cross-language information retrieval research, cross-language query expansion has received growing attention and discussion from scholars at home and abroad, and has become a research hotspot.
Cross-language information retrieval combines information retrieval with machine translation; it is more complex than monolingual retrieval, and the problems it faces are even more serious. These problems have long been the bottleneck restricting the development of cross-language retrieval technology and remain a widespread, urgent issue internationally, manifesting mainly as severe query topic drift, word mismatch, and translation ambiguity of query terms. Cross-language query expansion is one of the core techniques for solving these problems. Over the past decade, cross-language query expansion models and algorithms have attracted much attention and deep study, yielding rich theoretical results, but the problems above have not been fully solved.
Summary of the invention
The present invention applies weighted positive and negative association pattern mining to post-translation cross-language query expansion, and proposes a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. Applied to the field of cross-language information retrieval, it can alleviate the long-standing query topic drift and word mismatch problems in cross-language information retrieval and improve retrieval performance; it can also be applied to cross-language search engines to improve retrieval measures such as recall and precision.
The technical solution adopted by the present invention is as follows:
1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterised by comprising the following steps:
1.1 Translate the source-language query into a target-language query using a machine translation system;
1.2 Retrieve the target-language original document collection with the target-language query to obtain the initially retrieved target-language documents;
1.3 Build the target-language initially retrieved relevant document set: the top n initially retrieved target-language documents are judged for relevance by the user to obtain the initially retrieved relevant documents, from which the target-language initially retrieved relevant document set is constructed;
1.4 Mine weighted frequent itemsets and negative itemsets containing the original query terms from the target-language initially retrieved relevant document set. Specific steps:
1.4.1 Preprocess the target-language initially retrieved relevant document set, and build the document index library and the overall feature-word dictionary;
1.4.2 Mine frequent 1-itemsets L1:
Obtain feature-word candidate 1-itemsets C1 from the overall feature-word dictionary and compute the support awSup(C1) of each; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1 and is added to the weighted frequent itemset set PIS. awSup(C1) is calculated as shown in formula (1),
where n and W are, respectively, the total number of document records and the sum of all feature-word weights in the target-language initially retrieved relevant document set, the remaining terms of formula (1) are the frequency with which C1 occurs in that set and the itemset weight of C1 in that set, and β ∈ (0,1) is an adjustment factor whose value cannot be 0 or 1;
1.4.3 Mine the frequent k-itemsets Lk and negative k-itemsets Nk containing query terms, k ≥ 2.
Specific steps:
(1) Generate candidate k-itemsets Ck by an Apriori join over the frequent (k-1)-itemsets Lk-1;
(2) When k = 2, prune the candidate 2-itemsets C2 containing no query term, and retain the candidate 2-itemsets C2 that contain a query term;
(3) Compute the support awSup(Ck) of each candidate k-itemset Ck:
if awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relevancy awPIR(Ck); if awPIR(Ck) ≥ the frequent-itemset relevancy threshold minPR, the candidate Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
if awSup(Ck) < ms, compute the weighted negative itemset relevancy awNIR(Ck); if awNIR(Ck) ≥ the negative-itemset relevancy threshold minNR, then Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS. awSup(Ck) is calculated as shown in formula (2),
where the remaining terms of formula (2) are the frequency with which Ck occurs in the target-language initially retrieved relevant document set and the itemset weight of Ck in that set, and k is the number of items in Ck;
awPIR(Ck) is calculated in two cases, m = 2 and m > 2, as shown in formulas (3) and (4), where the candidate weighted positive itemset Ck = (t1, t2, …, tm), m ≥ 2, tmax (1 ≤ max ≤ m) is the single item of Ck with the largest support, and Iq is the sub-itemset with the largest support among all 2-sub-itemsets through (m-1)-sub-itemsets of Ck;
awNIR(Ck) is calculated in two cases, r = 2 and r > 2, as shown in formulas (5) and (6), where the candidate weighted negative itemset Ck = (t1, t2, …, tr), r ≥ 2, tmax (1 ≤ max ≤ r) is the single item of Ck with the largest support, and Ip is the sub-itemset with the largest support among all 2-sub-itemsets through (r-1)-sub-itemsets of Ck;
(4) If the frequent k-itemset Lk is empty, itemset mining ends; go to step 1.5. Otherwise, go to step (1) and continue mining;
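Step (1) of 1.4.3 builds candidate k-itemsets by an Apriori join over the frequent (k-1)-itemsets. Below is a generic sketch of that join with the standard subset-pruning step; itemsets are modelled as sorted tuples, and the item names are hypothetical.

```python
from itertools import combinations

def apriori_join(frequent_prev):
    """Join frequent (k-1)-itemsets (sorted tuples) into candidate
    k-itemsets, then prune any candidate with an infrequent (k-1)-subset.
    A generic Apriori join, not the patent's exact implementation."""
    prev = set(frequent_prev)
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a in prev:
        for b in prev:
            # Joinable pairs share their first k-2 items
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Apriori pruning: every (k-1)-subset must be frequent
                if all(sub in prev for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates

# Frequent 2-itemsets over a query term 'qt' and feature words t1, t2:
L2 = {("qt", "t1"), ("qt", "t2"), ("t1", "t2")}
C3 = apriori_join(L2)   # -> {("qt", "t1", "t2")}
```

In the patent's setting the k = 2 pruning of step (2) has already discarded candidates without a query term, so every surviving candidate contains one.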
1.5 Mine strong weighted positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-word weighted frequent itemset set PIS, mine the association rules I → qt whose antecedent is the expansion-word itemset I and whose consequent is the query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is empty. The specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk, obtaining the proper-subset itemset set of Lk;
(2) Take any two sub-itemsets qt and I from the proper-subset set of Lk such that qt ∪ I = Lk and qt ∩ I = ∅;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt. If awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the strong weighted association rule I → qt is obtained and added to the strong weighted positive rule set PAR. awARConf(I → qt) and awARL(I → qt) are calculated as shown in formulas (7) and (8):
(4) Return to step (2) and repeat until every proper subset in the proper-subset itemset set of Lk has been taken out exactly once; then retrieve a new positive itemset Lk from PIS and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out, at which point go to step 1.6;
1.6 Mine strong weighted negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative rules I → ﹁qt and ﹁I → qt relating the query-term itemset qt and the negative expansion-word itemset I, where the union of qt and I is Nk and the intersection of qt and I is empty. The specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk, obtaining the proper-subset set of Nk;
(2) Take any two sub-itemsets qt and I from the proper-subset set of Nk such that qt ∪ I = Nk and qt ∩ I = ∅, where qt is the query-term itemset;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
compute the confidence awARConf(I → ﹁qt) of the negative rule I → ﹁qt; if awARConf(I → ﹁qt) ≥ the minimum weighted confidence threshold mc, the strong weighted negative rule I → ﹁qt is obtained and added to the strong weighted negative rule set NAR;
compute the confidence awARConf(﹁I → qt) of the negative rule ﹁I → qt; if awARConf(﹁I → qt) ≥ mc, the strong weighted negative rule ﹁I → qt is obtained and added to NAR. awARConf(I → ﹁qt) and awARConf(﹁I → qt) are calculated as shown in formulas (9) and (10):
awARConf(I → ﹁qt) = 1 - awARConf(I → qt) (9)
(4) Return to step (2) and repeat until every proper subset in the proper-subset set of Nk has been taken out exactly once, then go to step (5);
(5) Retrieve a new negative itemset Nk from NIS and go to step (1) for a new round of weighted negative rule mining; once every negative itemset in NIS has been taken out exactly once, strong weighted negative rule mining ends; go to step 1.7;
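Step 1.6 can be sketched as below. Formula (10) is not reproduced in this text; the sketch uses awARConf(I → ﹁qt) = 1 - awARConf(I → qt) from formula (9) and, as an assumed stand-in for formula (10), the standard negative-rule confidence conf(﹁I → qt) = (sup(qt) - sup(I ∪ qt)) / (1 - sup(I)); the support values are invented.

```python
def mine_negative_rules(sup, I, qt, mc):
    """Mine I -> ~qt and ~I -> qt from one negative itemset I u qt.

    Tests for negative correlation with lift = conf(I->qt)/sup(qt) < 1,
    then applies the negative-rule confidences:
      conf(I -> ~qt) = 1 - conf(I -> qt)                    (formula 9)
      conf(~I -> qt) = (sup(qt) - sup(I u qt)) / (1 - sup(I))
    The second expression is an assumption standing in for formula (10).
    """
    rules = []
    conf_pos = sup[I | qt] / sup[I]
    lift = conf_pos / sup[qt]
    if lift < 1:                               # negatively correlated
        conf_neg_cons = 1 - conf_pos           # I -> ~qt
        if conf_neg_cons >= mc:
            rules.append(("I->~qt", conf_neg_cons))
        conf_neg_ante = (sup[qt] - sup[I | qt]) / (1 - sup[I])
        if conf_neg_ante >= mc:                # ~I -> qt
            rules.append(("~I->qt", conf_neg_ante))
    return rules

I, qt = frozenset({"t1"}), frozenset({"qt"})
sup = {I: 0.5, qt: 0.6, I | qt: 0.1}
rules = mine_negative_rules(sup, I, qt, mc=0.6)
# lift = (0.1/0.5)/0.6 = 1/3 < 1, so both negative rules qualify
```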
1.7 Extract from the strong weighted positive rule set PAR the weighted positive association rule patterns I → qt whose consequent is a query term, take their antecedent feature words as candidate expansion words, and build the candidate antecedent expansion dictionary;
1.8 Extract from the strong weighted negative rule set NAR the weighted negative association rule patterns I → ﹁qt and ﹁I → qt whose consequent involves a query term, take the negative-rule antecedent I as the negative antecedent expansion words, and build the negative antecedent expansion dictionary;
1.9 Compare each candidate antecedent expansion word in the candidate antecedent expansion dictionary with the negative expansion words in the negative antecedent expansion dictionary, and delete from the candidate dictionary every candidate expansion word identical to a negative expansion word; the candidate antecedent expansion words remaining in the dictionary are the final antecedent expansion words;
2.0 Combine the final antecedent expansion words with the original target-language query terms into a new query and retrieve again, realising post-translation antecedent expansion of the cross-language query.
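Steps 1.7 to 2.0 reduce to simple set operations once the rule sets PAR and NAR are available; a sketch with hypothetical words:

```python
def final_expansion(positive_antecedents, negative_antecedents, original_query):
    """Steps 1.7-2.0 as set operations: candidate expansion words are the
    antecedents of strong positive rules whose consequent is a query term;
    any word also appearing as a negative-rule antecedent is removed; the
    survivors are unioned with the original target-language query terms."""
    final_words = positive_antecedents - negative_antecedents
    return original_query | final_words

pos = {"economy", "finance", "stock"}   # hypothetical positive antecedents
neg = {"stock"}                         # hypothetical negative antecedents
new_query = final_expansion(pos, neg, {"market"})
# -> {"market", "economy", "finance"}
```

The subtraction is the point of the method: negatively associated words, which would drag the topic away from the query, never reach the expanded query.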
In the above, the symbol ﹁ in strong weighted negative rules such as I → ﹁qt and ﹁I → qt denotes negative correlation; ﹁I denotes the case where the feature-word itemset I does not occur in the target-language initially retrieved relevant document set, i.e. a negatively correlated case.
I → ﹁qt indicates that the expansion-word itemset I and the query-term itemset qt are negatively correlated: the occurrence of I in the target-language initially retrieved relevant document set makes qt tend not to occur.
﹁I → qt indicates that the expansion-word itemset I and the query-term itemset qt are negatively correlated: the non-occurrence of I in the target-language initially retrieved relevant document set makes qt tend to occur.
A strong weighted positive rule I → qt means that the occurrence of the expansion-word itemset I in the target-language initially retrieved relevant document set promotes the occurrence of the query-term itemset qt.
Compared with the prior art, the present invention has the following beneficial effects:
(1) The invention proposes a cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback. The method mines weighted positive and negative association rule patterns from the cross-language initially retrieved relevant document set using a positive and negative pattern mining technique built on a weighted support-relevancy-lift-confidence evaluation framework, and extracts the antecedents of those patterns as antecedent expansion words related to the original query terms, realising post-translation antecedent expansion of the cross-language query, so that cross-language information retrieval performance is improved.
(2) The English text data set of the standard cross-language retrieval test corpus NTCIR-5 CLIR, from the multilingual-processing international evaluation conference sponsored by Japan's National Institute of Informatics, was chosen as the experimental corpus, with Vietnamese and English as the language pair, and experiments on the inventive method were carried out. The experimental comparison baselines are: a Vietnamese-English cross-language retrieval (VECLR) baseline without query expansion, and the post-translation pseudo-relevance-feedback query expansion (Query Post-Translation Expansion Based on Pseudo Relevance Feedback, QPTE_PRF) Vietnamese-English retrieval method based on the literature (Wu Dan, He Daqing, Wang Huilin. Cross-language query expansion based on pseudo relevance [J]. Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-239.). The experimental results show that, compared with the baselines VECLR and QPTE_PRF, the R-Prec and P@5 values of the Vietnamese-English retrieval results of the inventive method for the TITLE query type are greatly improved, with a maximum gain of 91.28% over the VECLR method and up to 265.88% over the QPTE_PRF baseline; for the DESC query type, the R-Prec and P@5 values of the Vietnamese-English retrieval results are also considerably higher than those of the baselines VECLR and QPTE_PRF, with maximum gains of 137.38% and 238.75% respectively.
(3) The experimental results show that the inventive method is effective and can improve cross-language information retrieval performance. The main reason is analysed as follows: under the double influence of word mismatch and query translation quality, cross-language retrieval often suffers problems such as severe query topic drift in the initial results. The invention applies weighted positive and negative association pattern mining to Vietnamese-English cross-language query expansion and proposes a post-translation antecedent expansion method for cross-language queries based on weighted positive and negative association patterns and user relevance feedback; the antecedent expansion words related to the original query that it obtains realise post-translation antecedent expansion, which can effectively reduce the long-standing query topic drift and word mismatch problems in cross-language information retrieval and improve cross-language retrieval performance. The method has important application value and broad promotion prospects.
Description of the drawings
Fig. 1 is the framework diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
Fig. 2 is the overall flow diagram of the cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback according to the present invention.
Specific embodiments
To better illustrate the present invention, the related concepts involved in the technical solution of the present invention are first described below:
1. Post-translation antecedent expansion of cross-language queries
Post-translation antecedent expansion of a cross-language query means that, in cross-language query expansion, after association rule patterns have been mined from the target-language initially retrieved relevant documents, the antecedents of the rule patterns related to the original target-language query are extracted as expansion words, and the expansion words are combined with the original target-language query terms into a new query.
2. Weighted support
Suppose DS = {d1, d2, …, dn} is the cross-language target-language initially retrieved relevant document set (Document Set, DS), where di (1 ≤ i ≤ n) is the i-th document in DS, di = {t1, t2, …, tm, …, tp}, and tm (m = 1, 2, …, p) is a document feature term, feature item for short, usually consisting of a word or phrase. The feature-weight set corresponding to di is Wi = {wi1, wi2, …, wim, …, wip}, where wim is the weight of the m-th feature item tm in the i-th document di. TS = {t1, t2, …, tk} denotes the set of all feature items in DS; each subset of TS is called a feature itemset, itemset for short.
In view of the drawbacks of the prior art, the present invention fully considers both feature-word frequency and weight, and provides a new calculation method for the weighted support (All-weighted Support, awSup) awSup(I), shown in formula (11), where wI is the itemset-weight summation of the weighted itemset I in the cross-language target-language initially retrieved relevant document set DS; nI is the frequency with which the weighted itemset I occurs in DS; n is the total number of document records in DS; W is the summation of all feature-word weights in DS; k is the number of items in I (i.e. the itemset length); and β ∈ (0,1) is an adjustment factor, which cannot take the value 0 or 1, and whose main role is to reconcile the combined influence of item frequency and item weight on the weighted support.
Suppose the minimum weighted support threshold is ms. If awSup(I1 ∪ I2) ≥ ms, the weighted itemset (I1 ∪ I2) is a positive itemset (i.e. a frequent itemset); otherwise, (I1 ∪ I2) is a negative itemset.
The inventive method focuses only on the following three types of weighted negative itemsets: (﹁I), (I1 ∪ ﹁I2) and (﹁I1 ∪ I2). The weighted negative itemset supports awSup(﹁I), awSup(I1 ∪ ﹁I2) and awSup(﹁I1 ∪ I2) are calculated as shown in formulas (12) to (14):
awSup(﹁I) = 1 - awSup(I) (12)
awSup(I1 ∪ ﹁I2) = awSup(I1) - awSup(I1 ∪ I2) (13)
awSup(﹁I1 ∪ I2) = awSup(I2) - awSup(I1 ∪ I2) (14)
The inventive method focuses only on the following two classes of weighted negative rules: (I1 → ﹁I2) and (﹁I1 → I2). The weighted positive and negative rule confidences (All-weighted Association Rule Confidence, awARConf) awARConf(I1 → I2), awARConf(I1 → ﹁I2) and awARConf(﹁I1 → I2) are calculated as shown in formulas (15) to (17).
3. Weighted positive and negative itemset relevancy
The weighted itemset relevancy is a measure of the strength of association between any two single items, and between sub-itemsets, within a weighted itemset. The higher the itemset relevancy, the closer the relationship between the sub-itemsets of the itemset and the more attention it deserves. The present invention improves the existing relevancy and gives calculation methods for the weighted positive and negative itemset relevancy that consider both the degree of correlation between any two single items in an itemset and the association existing between two sub-itemsets in the itemset.
Weighted positive itemset relevancy (All-weighted Positive Itemset Relevancy, awPIR): for a weighted feature-word positive itemset Ck = (t1, t2, …, tm), where m ≥ 2 is the length of Ck, let tmax (1 ≤ max ≤ m) be the single item of Ck with the largest support and Iq the sub-itemset with the largest support among all 2-sub-itemsets through (m-1)-sub-itemsets of Ck; the weighted positive itemset relevancy awPIR(Ck) is calculated as shown in formulas (18) and (19).
Formulas (18) and (19) show that the relevancy of the weighted positive itemset Ck equals the summation of the conditional probabilities that the positive itemset occurs given, respectively, the occurrence of the maximum-support single item tmax and of the maximum-support sub-itemset Iq (Iq being one of the 2-sub-itemsets through the (m-1)-sub-itemsets).
Weighted negative itemset relevancy (All-weighted Negative Itemset Relevancy, awNIR): for a weighted feature-word negative itemset Ck = (t1, t2, …, tr), where r ≥ 2 is the length of Ck, let tmax (1 ≤ max ≤ r) be the single item of Ck with the largest support and Ip the sub-itemset with the largest support among all 2-sub-itemsets through (r-1)-sub-itemsets of Ck; the weighted negative itemset relevancy awNIR(Ck) is calculated as shown in formulas (20) and (21).
Formulas (20) and (21) show that the relevancy of the weighted negative itemset Ck equals the summation of the conditional probabilities that the negative itemset occurs given, respectively, the non-occurrence of the maximum-support single item tmax and of the maximum-support sub-itemset Ip (Ip being one of the 2-sub-itemsets through the (r-1)-sub-itemsets).
Example: let Ck = (t1 ∪ t2 ∪ t3 ∪ t4) with support 0.65. The supports of the single items t1, t2, t3 and t4 are 0.82, 0.45, 0.76 and 0.75 respectively, and the supports of the 2-sub-itemsets and 3-sub-itemsets (t1∪t2), (t1∪t3), (t1∪t4), (t2∪t3), (t2∪t4), (t1∪t2∪t3), (t1∪t2∪t4), (t2∪t3∪t4) are 0.64, 0.78, 0.75, 0.74, 0.67, 0.66, 0.56 and 0.43 respectively. Then the single item with the largest support (0.82) is t1, and the sub-itemset with the largest support (0.78) among the 2- and 3-sub-itemsets is (t1 ∪ t3). Using formula (19), the relevancy of the positive itemset (t1 ∪ t2 ∪ t3 ∪ t4) is computed as 0.81.
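The example's numbers can be reproduced with a two-line computation. Since formulas (18) and (19) are not reproduced in this text, the sketch below infers from the example that the m > 2 relevancy behaves as the mean of the two conditional probabilities sup(Ck)/sup(tmax) and sup(Ck)/sup(Iq): 0.65/0.82 and 0.65/0.78 average to about 0.81. This inferred form is an assumption checked only against the example's figures.

```python
def aw_pir(sup_ck, sup_tmax, sup_iq):
    """Inferred all-weighted positive itemset relevancy for m > 2:
    the mean of the conditional probabilities that Ck occurs given its
    best-supported single item and given its best-supported proper
    sub-itemset.  An assumption; formula (19) is not reproduced in
    the source text."""
    return (sup_ck / sup_tmax + sup_ck / sup_iq) / 2

# The worked example: sup(Ck) = 0.65; t1 has the largest single-item
# support (0.82); (t1 u t3) has the largest sub-itemset support (0.78).
relevancy = round(aw_pir(0.65, 0.82, 0.78), 2)   # -> 0.81
```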
4. Weighted association rule lift
The traditional association rule evaluation framework (support-confidence) has the limitation of ignoring the support of the itemset occurring in the consequent, so that high-confidence rules can sometimes mislead. Lift is an effective correlation measure for solving this problem. The lift Lift(X → Y) of an association rule X → Y is the ratio of the probability of containing Y under the condition of containing X to the overall probability of occurrence of Y, i.e. the ratio of the rule confidence Confidence(X → Y) to the support sup(Y) of the consequent Y. Based on the traditional lift concept, the lift of the weighted association rule I1 → I2 (All-weighted Association Rule Lift, awARL) awARL(I1 → I2) is calculated as shown in formula (22).
According to correlation theory, lift can assess the correlation between the antecedent and the consequent of an association rule, i.e. the degree to which the appearance of one side promotes (or reduces) the appearance of the other. That is, when awARL(I1 → I2) > 1, I1 → I2 is a positive association rule: of the itemsets I1 and I2, the appearance of one side promotes the possibility that the other appears. When awARL(I1 → I2) < 1, I1 → I2 is a negative association rule: the appearance of one side reduces the possibility that the other appears. When awARL(I1 → I2) = 1, the itemsets I1 and I2 are mutually independent and uncorrelated, and the association rule I1 → I2 is a false rule. It can easily be proven that awARL(I1 → I2) has the following property 1.
Property 1: when awARL(I1 → I2) > 1, ① awARL(I1 → ﹁I2) < 1, ② awARL(﹁I1 → I2) < 1, ③ awARL(﹁I1 → ﹁I2) > 1; when awARL(I1 → I2) < 1, ④ awARL(I1 → ﹁I2) > 1, ⑤ awARL(﹁I1 → I2) > 1, ⑥ awARL(﹁I1 → ﹁I2) < 1.
According to property 1, when awARL(I1 → I2) > 1, the weighted positive rule I1 → I2 can be mined; when awARL(I1 → I2) < 1, the weighted negative rules I1 → ﹁I2 and ﹁I1 → I2 can be mined.
Suppose the minimum weighted confidence threshold is mc. Combining property 1, the strong weighted positive and negative rules are defined as follows:
For a weighted positive itemset (I1 ∪ I2): if awARL(I1 → I2) > 1 and awARConf(I1 → I2) ≥ mc, the weighted association rule I1 → I2 is a strong association rule.
For a negative itemset (I1 ∪ I2): if awARL(I1 → I2) < 1, awARConf(I1 → ﹁I2) ≥ mc and awARConf(﹁I1 → I2) ≥ mc, then I1 → ﹁I2 and ﹁I1 → I2 are strong negative association rules.
The cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback of the present invention comprises the following steps:
1.1 Translate the source-language query into a target-language query using a machine translation system;
The machine translation system may be, for example, the Microsoft Bing machine translation interface (Microsoft Translator API), the Google machine translation interface, etc.
1.2 Retrieve the target-language original document collection with the target-language query to obtain the initially retrieved target-language documents; the retrieval model used is the classical vector space retrieval model.
1.3 Build the target-language initially retrieved relevant document set: the top n initially retrieved target-language documents are judged for relevance by the user to obtain the initially retrieved relevant documents, from which the target-language initially retrieved relevant document set is constructed;
1.4 Mine weighted frequent itemsets and negative itemsets containing the original query terms from the target-language initially retrieved relevant document set. Specific steps:
1.4.1 Preprocess the target-language initially retrieved relevant document set, and build the document index library and the overall feature-word dictionary;
The preprocessing steps are:
(1) If the target language is Chinese, perform Chinese word segmentation, remove stop words and extract Chinese feature words; the segmentation program is developed with the Chinese lexical analysis system ICTCLAS written by the Institute of Computing Technology, Chinese Academy of Sciences. If the target language is English, perform stemming with the Porter program (see http://tartarus.org/~martin/PorterStemmer for details) and remove English stop words;
(2) Compute the feature-term weights. A feature-term weight indicates how important the given term is to the document in which it occurs; the present invention uses the classic and widely used tf-idf method to compute the weight wij, as shown in formula (23) (the standard tf-idf form, wij = tfj,i × log(N/dfj)):
where wij denotes the weight of feature term tj in document di, tfj,i denotes the number of occurrences of tj in document di, dfj denotes the number of documents containing tj, and N denotes the total number of documents in the document collection.
(3) Construct the document index library and the overall feature-term dictionary.
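The preprocessing of step 1.4.1 can be sketched as below. The standard tf-idf form wij = tfj,i × log(N/dfj) stands in for the image-only formula (23), and all identifiers and toy documents are invented for illustration:

```python
import math
from collections import Counter

def build_index(docs):
    """Compute tf-idf weights w_ij = tf_ji * log(N / df_j) for tokenized documents
    and collect the overall feature-term dictionary."""
    N = len(docs)
    # df_j: number of documents containing feature term t_j
    df = Counter(t for toks in docs.values() for t in set(toks))
    index = {
        doc_id: {t: tf * math.log(N / df[t]) for t, tf in Counter(toks).items()}
        for doc_id, toks in docs.items()
    }
    dictionary = sorted(df)  # overall feature-term dictionary
    return index, dictionary

docs = {"d1": ["gold", "gold", "price"], "d2": ["gold", "medal"], "d3": ["price", "cut"]}
index, dictionary = build_index(docs)
```

Terms appearing in fewer documents (here "medal") receive higher idf, and thus higher weight, than widespread terms (here "gold").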
1.4.2 Mining the frequent 1-itemsets L1: obtain candidate feature-term 1-itemsets C1 from the overall feature dictionary and compute the support awSup(C1) of each 1-itemset C1; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS. awSup(C1) is computed as shown in formula (24):
where n and W are, respectively, the total number of document records and the sum of all feature-term weights in the target-language initial-retrieval relevant document set, the first operand is the frequency with which C1 occurs in the target-language initial-retrieval relevant document set, the second is the itemset weight of C1 in that document set, and β ∈ (0,1) is an adjustment factor whose value may not be 0 or 1.
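Formula (24) survives only as an image, but the stated ingredients (frequency/n, weight/W, and an adjustment factor β strictly between 0 and 1) suggest a β-weighted combination of frequency support and weight support. The sketch below assumes that convex-combination form; it is a guess consistent with the variables, not the patent's exact formula, and all names and numbers are illustrative:

```python
def aw_support(freq, weight, n, W, beta=0.5):
    """Hypothetical weighted support awSup: a convex combination of
    document-frequency support (freq/n) and weight support (weight/W),
    with adjustment factor beta in (0, 1). Assumed form of formula (24)."""
    assert 0 < beta < 1, "beta may not take 0 or 1"
    return beta * (freq / n) + (1 - beta) * (weight / W)

def frequent_1_itemsets(term_stats, n, W, ms, beta=0.5):
    """Keep the candidate 1-itemsets whose weighted support reaches threshold ms."""
    return {t: s for t, s in
            ((t, aw_support(f, w, n, W, beta)) for t, (f, w) in term_stats.items())
            if s >= ms}

# term -> (frequency in the relevant set, summed tf-idf weight); toy numbers.
stats = {"gold": (8, 6.0), "price": (5, 2.5), "rare": (1, 0.2)}
pis1 = frequent_1_itemsets(stats, n=10, W=10.0, ms=0.3)
```

With these toy numbers, "gold" and "price" pass the threshold ms = 0.3 and enter PIS, while "rare" is discarded.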
1.4.3 Mining the weighted frequent k-itemsets Lk containing query terms and the negative k-itemsets Nk, k ≥ 2.
The specific steps are:
(1) Mine the candidate k-itemsets Ck: these are obtained by performing the Apriori join on the frequent (k−1)-itemsets Lk−1;
The Apriori join method is detailed in: Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington D.C., USA, 1993: 207-216.
(2) When k = 2, prune the candidate 2-itemsets C2 that do not contain a query term, and retain the candidate 2-itemsets C2 that do.
(3) Compute the support awSup(Ck) of each candidate k-itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relevance awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relevance threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relevance awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relevance threshold minNR, then Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS. awSup(Ck) is computed as shown in formula (25):
where the first operand is the frequency with which Ck occurs in the target-language initial-retrieval relevant document set, the second is the itemset weight of Ck in that document set, and k is the number of items in Ck.
awPIR(Ck) is computed differently in two cases, m = 2 and m > 2, as shown in formulas (26) and (27) respectively,
where the candidate weighted positive itemset is Ck = (t1, t2, …, tm), m ≥ 2; tmax (1 ≤ max ≤ m) is the single item of Ck with the largest support among all its items; and Iq is the subset of Ck with the largest support among all its 2-item to (m−1)-item subsets.
awNIR(Ck) is computed differently in two cases, r = 2 and r > 2, as shown in formulas (28) and (29) respectively,
where the candidate weighted negative itemset is Ck = (t1, t2, …, tr), r ≥ 2; tmax (1 ≤ max ≤ r) is the single item of Ck with the largest support among all its items; and Ip is the subset of Ck with the largest support among all its 2-item to (r−1)-item subsets.
(4) If the k-itemset Lk is empty, itemset mining ends and the procedure goes to step 1.5; otherwise, it goes to step (1) and mining continues.
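Step (1) of 1.4.3 is the classical Apriori candidate generation, and step (2) is the query-term pruning specific to this method. A minimal sketch of both, with all identifiers and toy itemsets invented for illustration:

```python
from itertools import combinations

def apriori_join(prev_frequent):
    """Apriori join: merge frequent (k-1)-itemsets that share a (k-2)-prefix,
    then prune candidates having an infrequent (k-1)-subset."""
    prev = sorted(prev_frequent)
    k = len(prev[0]) + 1
    cands = set()
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:  # shared (k-2)-prefix
            c = tuple(sorted(set(a) | set(b)))
            # downward closure: every (k-1)-subset must itself be frequent
            if all(s in prev_frequent for s in combinations(c, k - 1)):
                cands.add(c)
    return cands

def keep_query_candidates(cands, query_terms):
    """Step (2): for k = 2, discard candidates containing no query term."""
    return {c for c in cands if set(c) & query_terms}

l1 = {("gold",), ("price",), ("rise",)}
c2 = apriori_join(l1)
c2 = keep_query_candidates(c2, {"gold"})
```

With "gold" as the only query term, the candidate ("price", "rise") is pruned and only the two gold-containing 2-itemsets survive.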
1.5 Mining weighted strong positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-term weighted frequent itemset set PIS, mine the association rules I → qt in Lk whose antecedent is an expansion-term itemset I and whose consequent is a query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is the empty set. The specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk, obtaining the proper-subset itemset set of Lk;
(2) Arbitrarily take two subitemsets qt and I from the proper-subset set of Lk such that qt ∩ I = ∅ and qt ∪ I = Lk;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt. If awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the weighted strong association rule I → qt is obtained and added to the weighted strong positive association rule set PAR. awARConf(I → qt) and awARL(I → qt) are computed as shown in formulas (30) and (31):
(4) Return to step (2) and proceed in sequence until each proper subset in the proper-subset itemset set of Lk has been taken out exactly once; then retrieve a new positive itemset Lk from the set PIS and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out, at which point go to step 1.6.
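Step 1.5 can be sketched as follows. Formulas (30) and (31) survive only as images, so the standard confidence (sup(I∪qt)/sup(I)) and lift (confidence/sup(qt)) are assumed in their place; all identifiers and the toy support table are invented for illustration:

```python
from itertools import combinations

def partitions(itemset):
    """All ordered splits of an itemset into antecedent I and consequent qt."""
    items = tuple(sorted(itemset))
    for r in range(1, len(items)):
        for ante in combinations(items, r):
            yield frozenset(ante), frozenset(items) - frozenset(ante)

def mine_positive_rules(lk, sup, mc, query_terms):
    """Keep rules I -> qt with lift > 1 and confidence >= mc whose consequent
    is a query-term itemset. Standard confidence/lift stand in for the
    image-only formulas (30) and (31)."""
    rules = []
    for I, qt in partitions(lk):
        if not qt <= query_terms:
            continue  # consequent must consist of query terms
        conf = sup[lk] / sup[I]   # awARConf(I -> qt), assumed form
        lift = conf / sup[qt]     # awARL(I -> qt), assumed form
        if lift > 1 and conf >= mc:
            rules.append((I, qt, round(conf, 3)))
    return rules

sup = {frozenset({"gold", "price"}): 0.30,
       frozenset({"gold"}): 0.40,
       frozenset({"price"}): 0.50}
par = mine_positive_rules(frozenset({"gold", "price"}), sup,
                          mc=0.6, query_terms=frozenset({"price"}))
```

Here the split with qt = {"price"} yields confidence 0.75 and lift 1.5, so the rule {"gold"} → {"price"} enters PAR; the reverse split is skipped because its consequent is not a query term.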
1.6 Mining weighted strong negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative association rules I → ¬qt and ¬I → qt in Nk, where qt is the query-term itemset, I is the negative expansion-term itemset, the union of qt and I is Nk, and the intersection of qt and I is the empty set. The specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk, obtaining the proper-subset set of Nk;
(2) Arbitrarily take two subitemsets qt and I from the proper-subset set of Nk such that qt ∩ I = ∅ and qt ∪ I = Nk;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
Compute the confidence awARConf(I → ¬qt) of the negative association rule I → ¬qt; if awARConf(I → ¬qt) ≥ the minimum weighted confidence threshold mc, the weighted strong negative association rule I → ¬qt is obtained and added to the weighted strong negative association rule set NAR;
Compute the confidence awARConf(¬I → qt) of the negative association rule ¬I → qt; if awARConf(¬I → qt) ≥ mc, the weighted strong negative association rule ¬I → qt is obtained and added to NAR. awARConf(I → ¬qt) and awARConf(¬I → qt) are computed as shown in formulas (32) and (33):
awARConf(I → ¬qt) = 1 − awARConf(I → qt)   (32)
(4) Return to step (2) and proceed in sequence until each proper subset in the proper-subset set of Nk has been taken out exactly once; then go to step (5);
(5) Retrieve a new negative itemset Nk from the set NIS and go to step (1) for a new round of weighted negative association rule mining; when every negative itemset in NIS has been taken out exactly once, weighted strong negative association rule mining ends and the procedure goes to step 1.7.
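Step 1.6 hinges on formula (32), awARConf(I → ¬qt) = 1 − awARConf(I → qt); its companion formula (33) survives only as an image, so the standard conditional probability P(qt | ¬I) is assumed in its place. A minimal sketch with invented identifiers and toy supports:

```python
def negative_rule_confidences(sup_I, sup_qt, sup_joint):
    """Confidences of I -> NOT qt and NOT I -> qt from plain supports.
    Formula (32) gives conf(I -> !qt) = 1 - conf(I -> qt); the standard
    P(qt | not I) is assumed for the image-only formula (33)."""
    conf_pos = sup_joint / sup_I                        # conf(I -> qt)
    conf_i_not_qt = 1 - conf_pos                        # formula (32)
    conf_not_i_qt = (sup_qt - sup_joint) / (1 - sup_I)  # assumed formula (33)
    return conf_i_not_qt, conf_not_i_qt

def lift(sup_I, sup_qt, sup_joint):
    """awARL(I -> qt): a value below 1 signals negative correlation,
    the precondition for mining negative rules in step (3)."""
    return sup_joint / (sup_I * sup_qt)

c1, c2 = negative_rule_confidences(sup_I=0.4, sup_qt=0.5, sup_joint=0.1)
```

With these toy supports the lift is 0.5 < 1, so both negative-rule confidences (0.75 and 0.667) would be checked against the threshold mc.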
1.7 Extract from the weighted strong positive association rule set PAR the weighted positive association rule patterns I → qt whose consequent is a query term, take the antecedent feature terms of these positive rules as candidate expansion terms, and construct the candidate antecedent expansion dictionary.
1.8 Extract from the weighted strong negative association rule set NAR the weighted negative association rule patterns I → ¬qt and ¬I → qt whose consequent involves a query term, take the negative-rule antecedents I as antecedent negative expansion terms, and construct the antecedent negative expansion dictionary.
1.9 Compare each candidate antecedent expansion term in the candidate antecedent expansion dictionary against the negative expansion terms in the antecedent negative expansion dictionary, and delete from the candidate dictionary every candidate expansion term identical to a negative expansion term; the candidate antecedent expansion terms remaining in the candidate dictionary are the final antecedent expansion terms.
2.0 The final antecedent expansion terms are combined with the original target-language query terms into a new query, which is used for retrieval again, realizing cross-language query post-translation antecedent expansion.
Experimental design and results:
To demonstrate the validity of the method of the present invention, Vietnamese-English cross-language information retrieval experiments based on the present method and on the comparison methods were carried out with Vietnamese and English as the language pair.
Experimental data set:
The NTCIR-5 CLIR English text data set is selected here as the test corpus. This corpus is the standard cross-language information retrieval test collection of the international multilingual-processing evaluation conference sponsored by the National Institute of Informatics of Japan. It is drawn from news text of the Mainichi Daily News for 2000 and 2001 (abbreviated mdn00 and mdn01) and the Korea Times for 2001 (abbreviated ktn01), totaling 26,224 English texts (mdn00 has 6,608, mdn01 has 5,547, and ktn01 has 14,069). The data set contains a document collection, a result set, and a query set. The result set has two relevance standards: Rigid (highly relevant or relevant to the query) and Relax (highly relevant, relevant, or partially relevant to the query). The query set comprises 50 query topics in four language versions (Japanese, Korean, Chinese, and English) and four topic types (TITLE, DESC, NARR, and CONC). The TITLE query type briefly describes the query topic with nouns and noun phrases and is a short query; the DESC query type briefly describes the query topic in sentence form and is a long query. The TITLE and DESC query types are used in the retrieval experiments here.
In the experiments of the present invention, since the NTCIR-5 CLIR corpus provides no Vietnamese query version, professional translators from an ASEAN-language institution were specially engaged to translate the 50 Chinese-version query topics of NTCIR-5 CLIR into Vietnamese by hand, and these Vietnamese queries serve as the source-language queries of the experiments here.
Comparison baseline methods:
(1) Vietnamese-English Cross-Language Retrieval (VECLR) baseline method: the first-round Vietnamese-English cross-language retrieval result, i.e., the result obtained by machine-translating the source-language Vietnamese query into English and retrieving English documents, without using any query expansion technique during retrieval.
(2) Vietnamese-English cross-language retrieval based on Query Post-Translation Expansion Based on Pseudo Relevance Feedback (QPTE_PRF): the QPTE_PRF baseline algorithm implements cross-language query post-translation expansion retrieval based on the cross-language query expansion method of the reference (Wu Dan, He Daqing, Wang Huilin. Cross-language query expansion based on pseudo relevance feedback [J]. Journal of the China Society for Scientific and Technical Information, 2010, 29(2): 232-239). Its experimental method and parameters are: the source-language Vietnamese query is machine-translated into an English query to retrieve English documents; the top 20 cross-language initially retrieved English documents are used to build the initial English relevant document set; English feature terms are extracted and their weights computed and sorted in descending order; and the top 20 feature terms are taken as English expansion terms to realize cross-language query post-translation expansion.
R-precision (R-Prec) and P@5 are used as the cross-language retrieval evaluation indexes of the present invention. R-precision is the precision computed after R documents have been retrieved, where R is the number of documents in the collection relevant to a given query; it does not emphasize the ranking of documents within the result set.
The experimental results are as follows:
Source programs for the method of the present invention and for the baseline methods were written, and the Vietnamese-English cross-language information retrieval performance of the present method and of the comparison baseline methods was analyzed and compared experimentally. Vietnamese-English cross-language information retrieval was performed for the 50 Vietnamese TITLE and DESC queries; user relevance judgment of the top 50 cross-language initially retrieved English documents yielded the initial user relevance feedback documents (for simplicity, in the experiments here the relevant documents with known results among the top 50 initially retrieved documents are treated as the initially retrieved relevant documents). The averages of R-Prec and P@5 of the Vietnamese-English cross-language retrieval results were obtained, as shown in Tables 1 and 2 respectively. The common experimental parameters are set as follows: α = 0.3, minPR = 0.1, minNR = 0.01, and mining proceeds up to 3-itemsets.
Table 1. Retrieval performance of the present method compared with the comparison baseline methods (TITLE queries)
Experimental parameters for this table: mc = 0.8, ms ∈ {0.2, 0.25, 0.3, 0.35, 0.4, 0.45} (mdn00), ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3} (mdn01 and ktn01)
The experimental results in Table 1 show that, compared with the comparison baselines VECLR and QPTE_PRF, the R-Prec and P@5 values of the Vietnamese-English cross-language retrieval results of the present method for the TITLE query type are greatly improved: the maximum improvement over the VECLR method reaches 91.28%, and the maximum improvement over the QPTE_PRF baseline reaches 265.88%.
Table 2. Retrieval performance of the present method compared with the baseline methods (DESC queries)
Experimental parameters for this table: mc = 0.8, ms ∈ {0.2, 0.23, 0.25, 0.28, 0.3}
The experimental results in Table 2 show that the R-Prec and P@5 values of the Vietnamese-English cross-language retrieval results of the present method for the DESC query type are also considerably improved over the baselines VECLR and QPTE_PRF, with maximum improvements of 137.38% and 238.75% respectively.
The experimental results show that the method of the present invention is indeed effective and improves cross-language information retrieval performance.
Claims (1)
1. A cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback, characterized in that it comprises the following steps:
1.1 The source-language query is translated into a target-language query using a machine translation system;
1.2 The target-language query is used to retrieve the target-language original document collection, obtaining the target-language initial-retrieval documents;
1.3 Constructing the target-language initial-retrieval relevant document set: the top n target-language initial-retrieval documents are subjected to user relevance judgment to obtain the initially retrieved relevant documents, from which the target-language initial-retrieval relevant document set is constructed;
1.4 Weighted frequent itemsets and negative itemsets containing the original query terms are mined from the target-language initial-retrieval relevant document set;
The specific steps are:
1.4.1 Preprocess the target-language initial-retrieval relevant document set and construct the document index library and the overall feature-term dictionary;
1.4.2 Mining the frequent 1-itemsets L1:
Obtain candidate feature-term 1-itemsets C1 from the overall feature dictionary and compute the support awSup(C1) of each 1-itemset C1; if awSup(C1) ≥ the support threshold ms, the candidate 1-itemset C1 is a frequent 1-itemset L1, and L1 is added to the weighted frequent itemset set PIS; the calculation formula of awSup(C1) is as follows:
where n and W are, respectively, the total number of document records and the sum of all feature-term weights in the target-language initial-retrieval relevant document set, the first operand is the frequency with which C1 occurs in the target-language initial-retrieval relevant document set, the second is the itemset weight of C1 in that document set, and β ∈ (0,1) is an adjustment factor whose value may not be 0 or 1;
1.4.3 Mining the frequent k-itemsets Lk containing query terms and the negative k-itemsets Nk, k ≥ 2.
The specific steps are:
(1) Mine the candidate k-itemsets Ck: these are obtained by performing the Apriori join on the frequent (k−1)-itemsets Lk−1;
(2) When k = 2, prune the candidate 2-itemsets C2 that do not contain a query term, and retain the candidate 2-itemsets C2 that do;
(3) Compute the support awSup(Ck) of each candidate k-itemset Ck:
If awSup(Ck) ≥ the support threshold ms, compute the weighted frequent itemset relevance awPIR(Ck); if awPIR(Ck) ≥ the frequent itemset relevance threshold minPR, the candidate k-itemset Ck is a weighted frequent k-itemset Lk and is added to the weighted frequent itemset set PIS;
If awSup(Ck) < ms, compute the weighted negative itemset relevance awNIR(Ck); if awNIR(Ck) ≥ the negative itemset relevance threshold minNR, then Ck is a weighted negative k-itemset Nk and is added to the weighted negative itemset set NIS; the calculation formula of awSup(Ck) is as follows:
where the first operand is the frequency with which Ck occurs in the target-language initial-retrieval relevant document set, the second is the itemset weight of Ck in that document set, and k is the number of items in Ck;
awPIR(Ck) is computed differently in two cases, m = 2 and m > 2, namely:
where the candidate weighted positive itemset is Ck = (t1, t2, …, tm), m ≥ 2; tmax (1 ≤ max ≤ m) is the single item of Ck with the largest support among all its items; and Iq is the subset of Ck with the largest support among all its 2-item to (m−1)-item subsets;
awNIR(Ck) is computed differently in two cases, r = 2 and r > 2, namely:
where the candidate weighted negative itemset is Ck = (t1, t2, …, tr), r ≥ 2; tmax (1 ≤ max ≤ r) is the single item of Ck with the largest support among all its items; and Ip is the subset of Ck with the largest support among all its 2-item to (r−1)-item subsets;
(4) If the k-itemset Lk is empty, itemset mining ends and the procedure goes to step 1.5; otherwise, it goes to step (1) and mining continues;
1.5 Mining weighted strong positive association rules from the weighted frequent itemset set PIS: for each frequent k-itemset Lk (k ≥ 2) in the feature-term weighted frequent itemset set PIS, mine the association rules I → qt in Lk whose antecedent is an expansion-term itemset I and whose consequent is a query-term itemset qt, where the union of qt and I is Lk and the intersection of qt and I is the empty set; the specific mining steps are as follows:
(1) Find all proper subsets of the positive itemset Lk, obtaining the proper-subset itemset set of Lk;
(2) Arbitrarily take two subitemsets qt and I from the proper-subset set of Lk such that qt ∩ I = ∅ and qt ∪ I = Lk;
(3) Compute the confidence awARConf(I → qt) and the lift awARL(I → qt) of the weighted association rule I → qt; if awARL(I → qt) > 1 and awARConf(I → qt) ≥ the minimum weighted confidence threshold mc, the weighted strong association rule I → qt is obtained and added to the weighted strong positive association rule set PAR; the calculation formulas of awARConf(I → qt) and awARL(I → qt) are as follows:
(4) Return to step (2) and proceed in sequence until each proper subset in the proper-subset itemset set of Lk has been taken out exactly once; then retrieve a new positive itemset Lk from the set PIS and go to step (1) for a new round of weighted association rule mining, until every positive itemset Lk in PIS has been taken out, at which point go to step 1.6;
1.6 Mining weighted strong negative association rules from the negative itemset set NIS: for each negative itemset Nk (k ≥ 2) in NIS, mine the weighted negative association rules I → ¬qt and ¬I → qt in Nk, where qt is the query-term itemset, I is the negative expansion-term itemset, the union of qt and I is Nk, and the intersection of qt and I is the empty set; the specific mining steps are as follows:
(1) Find all proper subsets of the negative itemset Nk, obtaining the proper-subset set of Nk;
(2) Arbitrarily take two subitemsets qt and I from the proper-subset set of Nk such that qt ∩ I = ∅ and qt ∪ I = Nk, where qt is the query itemset;
(3) Compute the lift awARL(I → qt); if awARL(I → qt) < 1:
Compute the confidence awARConf(I → ¬qt) of the negative association rule I → ¬qt; if awARConf(I → ¬qt) ≥ the minimum weighted confidence threshold mc, the weighted strong negative association rule I → ¬qt is obtained and added to the weighted strong negative association rule set NAR;
Compute the confidence awARConf(¬I → qt) of the negative association rule ¬I → qt; if awARConf(¬I → qt) ≥ mc, the weighted strong negative association rule ¬I → qt is obtained and added to NAR; the calculation formulas of awARConf(I → ¬qt) and awARConf(¬I → qt) are as follows:
(4) Return to step (2) and proceed in sequence until each proper subset in the proper-subset set of Nk has been taken out exactly once; then go to step (5);
(5) Retrieve a new negative itemset Nk from the set NIS and go to step (1) for a new round of weighted negative association rule mining; when every negative itemset in NIS has been taken out exactly once, weighted strong negative association rule mining ends and the procedure goes to step 1.7;
1.7 Extract from the weighted strong positive association rule set PAR the weighted positive association rule patterns I → qt whose consequent is a query term, take the antecedent feature terms of these positive rules as candidate expansion terms, and construct the candidate antecedent expansion dictionary;
1.8 Extract from the weighted strong negative association rule set NAR the weighted negative association rule patterns I → ¬qt and ¬I → qt whose consequent involves a query term, take the negative-rule antecedents I as antecedent negative expansion terms, and construct the antecedent negative expansion dictionary;
1.9 Compare each candidate antecedent expansion term in the candidate antecedent expansion dictionary against the negative expansion terms in the antecedent negative expansion dictionary, and delete from the candidate dictionary every candidate expansion term identical to a negative expansion term; the candidate antecedent expansion terms remaining in the candidate dictionary are the final antecedent expansion terms;
2.0 The final antecedent expansion terms are combined with the original target-language query terms into a new query, which is used for retrieval again, realizing cross-language query post-translation antecedent expansion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710807540.4A CN107609095B (en) | 2017-09-08 | 2017-09-08 | Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107609095A CN107609095A (en) | 2018-01-19 |
CN107609095B true CN107609095B (en) | 2019-07-09 |
Family
ID=61062737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710807540.4A Expired - Fee Related CN107609095B (en) | 2017-09-08 | 2017-09-08 | Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609095B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299278B (en) * | 2018-11-26 | 2022-02-15 | 广西财经学院 | Text retrieval method based on confidence coefficient-correlation coefficient framework mining rule antecedent |
CN109299292B (en) * | 2018-11-26 | 2022-02-15 | 广西财经学院 | Text retrieval method based on matrix weighted association rule front and back part mixed expansion |
CN109684464B (en) * | 2018-12-30 | 2021-06-04 | 广西财经学院 | Cross-language query expansion method for realizing rule back-part mining through weight comparison |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216874A (en) * | 2014-09-22 | 2014-12-17 | Method and system for mining weighted positive and negative patterns between Chinese words based on correlation coefficients
CN105095512A (en) * | 2015-09-09 | 2015-11-25 | 四川省科技交流中心 | Cross-language private data retrieval system and method based on bridge language |
CN106557478A (en) * | 2015-09-25 | 2017-04-05 | Distributed cross-language retrieval system and retrieval method based on a bridge language
Non-Patent Citations (2)
Title |
---|
Fully weighted positive and negative association rule mining and its application in educational data; Yu Ru et al.; Journal of Chinese Information Processing; 2014-12-31; Vol. 28, No. 4; full text |
MWARM-SRCCCI: an effective matrix-weighted positive and negative association rule mining algorithm; Zhou Xiumei et al.; Journal of Computer Applications; 2014-12-31; Vol. 34, No. 10; full text |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609095B (en) | Cross-language query expansion method based on weighted positive and negative rule antecedents and relevance feedback | |
CN106372241B (en) | Vietnamese-English cross-language text retrieval method and system based on term-weighted association patterns | |
CN106484781B (en) | Indonesian-Chinese cross-language retrieval method and system fusing association patterns and user feedback | |
CN109299278A (en) | Text retrieval method based on mining rule antecedents with a confidence-correlation coefficient framework | |
CN107526839B (en) | Cross-language query post-translation consequent expansion method based on fully weighted positive and negative patterns | |
CN109684463B (en) | Cross-language post-translation antecedent expansion method based on weight comparison and mining | |
CN109726263B (en) | Cross-language post-translation hybrid expansion method based on feature word weighted association pattern mining | |
CN109739953B (en) | Text retrieval method based on a chi-square analysis-confidence framework and consequent expansion | |
CN109739952A (en) | Cross-language retrieval method fusing relevance and chi-square value pattern mining and expansion | |
CN107562904B (en) | Weighted positive and negative association pattern mining method between English words fusing term weight and frequency | |
CN109299292A (en) | Text retrieval method based on mixed expansion of matrix-weighted association rule antecedents and consequents | |
CN109684464B (en) | Cross-language query expansion method for realizing rule consequent mining through weight comparison | |
CN108170778B (en) | Chinese-English cross-language query post-translation expansion method based on fully weighted rule consequents | |
Wloka | Identifying bilingual topics in wikipedia for efficient parallel corpus extraction and building domain-specific glossaries for the japanese-english language pair | |
CN109753559A (en) | Cross-language text retrieval method based on RCSAC-framework mining and consequent expansion | |
Rao et al. | Term weighting schemes for emerging event detection | |
CN106383883B (en) | Indonesian-Chinese cross-language retrieval method and system based on matrix-weighted association patterns | |
CN108133022B (en) | Chinese-English cross-language query antecedent expansion method based on matrix-weighted association rules | |
Zhang et al. | Topic level disambiguation for weak queries | |
Li et al. | Keyword extraction based on lexical chains and word co-occurrence for Chinese news web pages | |
Cagliero et al. | Cross-lingual timeline summarization | |
CN109543196A (en) | Indonesian-English cross-language post-translation antecedent expansion method based on weighted pattern mining | |
Caon et al. | Finding synonyms and other semantically-similar terms from coselection data | |
Holzmann et al. | Named entity evolution recognition on the Blogosphere | |
Yan et al. | Terminology extraction in the field of water environment based on rules and statistics
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20190709 Termination date: 20200908 |