CN102117284A - Method for retrieving cross-language knowledge - Google Patents

Info

Publication number
CN102117284A
Authority
CN
China
Prior art keywords
verb
search index
language
language search
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2009102439934A
Other languages
Chinese (zh)
Inventor
高建忠
赵琦
吴祖林
邱李豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PERA GLOBAL TECHNOLOGY (BEIJING) Co Ltd
Original Assignee
PERA GLOBAL TECHNOLOGY (BEIJING) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PERA GLOBAL TECHNOLOGY (BEIJING) Co Ltd
Priority to CN2009102439934A
Publication of CN102117284A

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method for cross-language knowledge retrieval, comprising the following steps: 10) semantically analyzing a source-language query to obtain a source-language search index, which is the 'verb + object' structure formed by the verb-object construction of the source-language query; 20) translating the source-language search index into a target-language search index; and 30) matching a target document index against the target-language search index, wherein the target-language search index has a 'verb + object' structure formed by the verb-object constructions of a target-language document library, obtained by semantically analyzing that library. The method allows cross-language knowledge to be retrieved efficiently and accurately.

Description

A method for cross-language knowledge retrieval
Technical field
The present invention relates to the field of computer search, and in particular to a method for cross-language knowledge retrieval.
Background art
With the development of information technology, people increasingly obtain knowledge by retrieving electronic documents. The knowledge a user needs, however, may reside in documents written in different languages, and users prefer to interact with electronic systems in their native language. This creates the demand for cross-language knowledge retrieval and extraction.
Cross-language retrieval means that a user issues a query in one natural language (the source language) to retrieve documents expressed in another natural language (the target language). It lets users formulate a query in a language they are familiar with and then use that query to retrieve documents written in any language other than the query language.
Common approaches to cross-language retrieval include the document translation method and the query translation method.
The document translation method converts the language of the documents (the target language) into the query language (the source language) before retrieval. Its advantages are that the retrieval results returned to the user are expressed in the source (query) language, so the user can readily select and use them, and that document-level translation has a broader context, which can be used to resolve translation ambiguity. However, document translation changes the language of all the information being retrieved, and the accuracy of most existing machine translation systems is still far from satisfactory and falls short of practical use. Moreover, translating every document in the database from the target language into the source language requires an enormous amount of work at high cost, and rebuilding the index over the large volume of translated data is also expensive. The document translation method is therefore only worthwhile when the amount of information to be retrieved is limited. At present it lags well behind the query translation method in both research and practical use.
The query translation method translates the query entered by the user into every language supported by the retrieval system, then submits the multilingual queries to the system's matching module to retrieve documents in the corresponding languages. It is currently the most commonly used way of implementing cross-language retrieval. Its advantage is that only the query is translated, so the translation volume is small and translation is fast. Its main drawbacks are: 1. the returned results are expressed in the target language, which makes it harder for the user to use the information obtained; 2. queries are usually very short and carry little context, so disambiguation is difficult; when each query term is replaced by all of its possible translations, translation ambiguity becomes severe. Controlling translation ambiguity is therefore a key issue in designing an effective query translation method.
Query translation can be implemented with dictionary-based, corpus-based, or hybrid dictionary-corpus methods. Dictionary-based query translation usually just translates the keywords of the user's query one by one and cannot disambiguate them from the query context, so the precision of the retrieval results is low. Corpus-based query translation can obtain translations of some phrases or short sentences in the query from the corpus, which removes part of the ambiguity; but, limited by the size and content of the corpus, it often yields only one or a few translations of the query keywords and cannot retrieve results for their synonyms, so recall is low.
Summary of the invention
The technical problem to be solved by the present invention is to improve the precision of cross-language knowledge retrieval.
To solve this problem, according to one aspect of the present invention, a method for cross-language knowledge retrieval is provided, comprising the following steps:
10) performing semantic analysis on the source-language query to obtain a source-language search index, wherein the source-language search index is the "verb + object" formed by the verb-object construction of the source-language query;
20) translating the source-language search index into a target-language search index;
30) matching a target document index against the target-language search index, wherein the target-language search index is the "verb + object" formed by the verb-object constructions in the target document library, obtained by performing semantic analysis on the target document library.
In the above method, after step 10), the method further comprises the following step:
11) performing synonym expansion on the source-language search index.
In the above method, after step 11), the method further comprises the following step:
12) verifying the source-language search index.
In the above method, step 20) uses a "verb + object" bilingual dictionary, wherein the "verb + object" bilingual dictionary contains source-language "verb + object" entries and the corresponding target-language "verb + object" entries.
In the above method, if in step 20) the "verb + object" bilingual dictionary does not contain the target-language search index, the method further comprises: translating the source-language search index into the target-language search index using a verb bilingual dictionary and a noun bilingual dictionary.
In the above method, step 20) uses a verb bilingual dictionary and a noun bilingual dictionary.
In the above method, after step 20), the method further comprises the following step:
21) performing synonym expansion on the target-language search index.
In the above method, after step 21), the method further comprises the step:
22) verifying the target-language search index.
The beneficial effect of the present invention is that it provides a cross-language knowledge retrieval method with higher precision; in addition, the invention also effectively improves the recall of cross-language knowledge retrieval.
Description of drawings
Fig. 1 is a flowchart of the cross-language knowledge retrieval method according to a specific embodiment of the present invention;
Fig. 2 is a flowchart of building the bilingual dictionary according to a specific embodiment of the present invention.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the method for cross-language knowledge retrieval according to specific embodiments of the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Fig. 1 shows a flowchart of the cross-language knowledge retrieval method according to a specific embodiment of the present invention. The method comprises the following steps:
Semantic analysis is performed on the source-language query and on the target document library to extract the verb-object constructions they contain, thereby obtaining the source-language search index and the target document index. The verb-object construction is usually the core of a sentence and conveys its main content; for example, in "How can the indoor temperature be raised in winter?" the verb-object construction is "raise + temperature". In addition, the "verb + object" combination within a verb-object construction follows certain regularities of collocation. The "verb + object" combination (verb-object construction) is therefore extracted as the index.
The Stanford Parser of Stanford University is chosen as the semantic analyzer. This tool currently supports the analysis of English, Chinese, German, and Arabic; for details see http://www-nlp.stanford.edu/software/lex-parser.shtml. One of ordinary skill in the art will appreciate that the semantic analysis can be performed with many existing analyzers from the field of natural language processing, each of which may support different languages. This step does not restrict the specific semantic analyzer or the language it targets. Two concrete examples of semantic analysis are described below:
Example 1: Suppose the source language is Chinese and the source-language query is "how to detect microwave radiation" (the words are shown here in English gloss). The result of the semantic analysis is:
(ROOT
  (IP
    (VP
      (ADVP (AD how))
      (VP (VV detect)
        (NP
          (ADJP (JJ microwave))
          (NP (NN radiation)))))))
The abbreviations have the following meanings:
ROOT: root node;
IP: inflectional phrase (simple clause);
VP: verb phrase;
ADVP: adverbial phrase;
AD: adverb;
VV: verb;
NP: noun phrase;
ADJP: adjective phrase;
JJ: adjective;
NN: common noun.
Based on the labels in the analysis result, the verb-object construction VP "detect microwave radiation" is extracted automatically, giving the combination of the verb VV "detect" and the object NP "microwave radiation" as the source-language search index. The verb is labeled V and the object is labeled O, so this source-language search index is "detect (V) + microwave radiation (O)".
Example 2: Suppose the target language is English and one sentence in the target document library is "Doppler effect transducer measures fluid flow". The result of the semantic analysis is:
(ROOT
  (S
    (NP (JJ Doppler) (NN effect) (NN transducer))
    (VP (VBZ measures)
      (NP (JJ fluid) (NN flow)))))
The abbreviations have the following meanings:
ROOT: root node;
S: sentence;
NP: noun phrase;
JJ: adjective;
NN: common noun;
VP: verb phrase;
VBZ: verb, third-person singular present tense.
Based on the labels in the analysis result, the verb-object construction VP "measures fluid flow" is extracted automatically, giving the combination of the verb VBZ "measures" and the object NP "fluid flow" as the target document index. The verb is labeled V and the object is labeled O, so this target document index is "measure (V) + fluid flow (O)".
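The extraction described in Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parse is taken as an already-produced bracketed string (no parser is called), the helper names `parse_tree`, `words`, and `verb_object_pairs` are hypothetical, and the rule "a VP with a verb child and an NP child" is a simplification of whatever extraction logic the patent intends. No lemmatization is performed, so the verb comes out as "measures" rather than "measure".

```python
import re

def parse_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def words(tree):
    """Return the leaf words under a node, left to right."""
    out = []
    for child in tree[1]:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

def verb_object_pairs(tree):
    """Collect (verb, object) pairs from every VP that has a verb child (tag V*) and an NP child."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    pairs = []
    if label == "VP":
        verbs = [c for c in subtrees if c[0].startswith("V") and c[0] != "VP"]
        objects = [c for c in subtrees if c[0] == "NP"]
        if verbs and objects:
            pairs.append((" ".join(words(verbs[0])), " ".join(words(objects[0]))))
    for c in subtrees:
        pairs.extend(verb_object_pairs(c))
    return pairs

parse = """(ROOT (S (NP (JJ Doppler) (NN effect) (NN transducer))
                   (VP (VBZ measures) (NP (JJ fluid) (NN flow)))))"""
print(verb_object_pairs(parse_tree(parse)))   # [('measures', 'fluid flow')]
```

The same function applied to the tree of Example 1 skips the outer VP (which has no direct verb child) and returns the pair from the inner VP.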
Preferably, the source-language search index is automatically expanded with synonyms. More specifically, a source-language thesaurus is used to expand the "verb (V)" and the "object (O)" of the source-language search index with synonyms, and the expanded "verb (V)" and "object (O)" words are combined to form the extended source-language search index, that is, the expanded "verb (V) + object (O)" combinations. The source-language thesaurus comprises a verb thesaurus and a noun thesaurus. The verb thesaurus can be built from the verb synonyms in an existing, well-known dictionary such as a dictionary of commonly used synonyms; the noun thesaurus can likewise be built from the noun synonyms in such a dictionary. An example of synonym expansion of a source-language search index is given below.
Example 3: Suppose the source language is Chinese and a source-language search index is "dilute (V) + photoresist (O)".
The synonyms of "dilute (V)" are looked up in the source-language verb thesaurus, and none is found. The synonyms of "photoresist (O)" are looked up in the source-language object thesaurus, and one synonym is found (a Chinese term that is likewise rendered "photoresist" in English). The extended source-language search index of "dilute (V) + photoresist (O)" is therefore the combination of "dilute (V)" with that synonym, which also reads "dilute (V) + photoresist (O)" in English.
In this step, synonym expansion of the keyword combination is used to obtain more correct retrieval results, which improves the recall of cross-language retrieval. One of ordinary skill in the art will appreciate that this synonym expansion step may also be omitted.
The above steps of dictionary-based synonym expansion and keyword combination may produce the following error: a synonym of a given "verb (V)" and a synonym of a given "object (O)" may be unlikely to occur together in actual language use. For example, for "increase (V) + heat (O)", a synonym of "increase (V)" (also rendered "increase" in English) may yield a combination with "heat (O)" that does not conform to language usage and is therefore unreasonable. Accordingly, in a preferred embodiment, the present invention further comprises a step of verifying the reasonableness of the extended source-language search index.
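A minimal sketch of the expansion step, assuming the thesauri are plain Python dictionaries that map a word to a list of synonyms. The function name `expand_index` and the sample entries are illustrative assumptions, not taken from the patent's dictionaries; English words are used only to keep the example readable.

```python
from itertools import product

def expand_index(verb, obj, verb_synonyms, noun_synonyms):
    """Combine the verb and the object, each with its synonyms, into all V+O pairs.

    The original (verb, obj) pair is always the first element of the result.
    """
    verbs = [verb] + verb_synonyms.get(verb, [])
    objects = [obj] + noun_synonyms.get(obj, [])
    return list(product(verbs, objects))

# Hypothetical thesaurus entries, for illustration only.
verb_thesaurus = {"dissolve": ["liquefy"]}
noun_thesaurus = {"aluminum layer": ["Al layer"]}

for pair in expand_index("dissolve", "aluminum layer", verb_thesaurus, noun_thesaurus):
    print(pair)
```

With one synonym on each side, the call yields four combinations: the original pair plus three expanded ones. A word missing from a thesaurus simply contributes no extra combinations.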
In this step, the extended source-language search index can be verified using the co-occurrence technique. The co-occurrence technique rests on the following assumption: when one query term is translated, the other query terms (or their translations) serve as the "context" for selecting its translation. Correct translations co-occur frequently in target-language documents, whereas incorrect translations co-occur rarely. Therefore, when choosing the correct translation of each query term, the translation that co-occurs most strongly with the translations of the other query terms in the target-language documents is selected. The procedure is as follows: for a set {S1, ..., Sn} of n query terms, first obtain from the dictionary the translation set Ti of each Si (1 ≤ i ≤ n), and then select from Ti the word with the highest co-occurrence rate with the translation sets Tj of the other query terms Sj (1 ≤ j ≤ n, j ≠ i) as the translation of Si. The verification method above considers only the co-occurrence degree of the "verb (V)" and the "object (O)" and ignores the other words in the sentence, which effectively improves the efficiency of the method.
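The selection procedure over {S1, ..., Sn} can be sketched as below. The sketch assumes that a co-occurrence count between two target-language words is available through a caller-supplied function; the names `select_translations` and `cooccur`, the toy counts, and the choice to sum the counts over all candidate translations of the other terms are assumptions made for illustration, since the text does not fix how the "co-occurrence rate" with a whole set Tj is aggregated.

```python
def select_translations(candidates, cooccur):
    """Pick, for each query term S_i, the candidate in T_i that co-occurs most
    with the candidate translations of all other query terms.

    candidates: dict mapping each query term to a non-empty list of translations (T_i).
    cooccur:    function (a, b) -> co-occurrence count in target-language documents.
    """
    chosen = {}
    for term, options in candidates.items():
        # All candidate translations of the other query terms act as context.
        context = [t for other, ts in candidates.items() if other != term for t in ts]
        chosen[term] = max(options, key=lambda cand: sum(cooccur(cand, t) for t in context))
    return chosen

# Toy co-occurrence counts (hypothetical numbers).
counts = {
    frozenset(("raise", "temperature")): 12,
    frozenset(("lift", "temperature")): 1,
    frozenset(("raise", "fever")): 2,
}
lookup = lambda a, b: counts.get(frozenset((a, b)), 0)

print(select_translations({"S1": ["raise", "lift"], "S2": ["temperature", "fever"]}, lookup))
```

In this toy data "raise" and "temperature" win because they co-occur far more often with the other term's candidates than "lift" or "fever" do.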
According to a specific embodiment of the present invention, the co-occurrence degree of an extended source-language search index is calculated as follows:
The extended source-language search index is retrieved in the source document library, and the documents in the library that contain both the "verb (V)" and the "object (O)" of the extended source-language search index are extracted.
Let the "verb" be denoted v and the "object" be denoted o, and let the co-occurrence degree of an extended source-language search index in the source document library be SIM(v, o). The formula is:
$$\mathrm{SIM}(v,o) = p(v,o)\,\log_2\frac{p(v,o)}{p(v)\,p(o)} - \log_2 \mathrm{Dis}(v,o) \qquad \text{(Formula 1)}$$
where c(v) and c(o) are the numbers of times v and o occur in the source document library; c(v, o) is the number of times v and o co-occur in the same sentence in the source document library; p(v, o) = c(v, o)/c(v) + c(v, o)/c(o); p(v) = c(v)/∑c(v); and Dis(v, o) is the average distance between v and o, measured as the number of words between the two.
One of ordinary skill in the art will appreciate that the co-occurrence degree of an extended source-language search index can also be calculated according to Formula 2:
$$\mathrm{SIM}(v,o) = \frac{1}{2}\left(\frac{c(v,o)}{c(v)} + \frac{c(v,o)}{c(o)}\right) \qquad \text{(Formula 2)}$$
Usually, an extended source-language search index whose SIM(v, o) value is less than 2 is considered to have passed verification; extended source-language search indices whose SIM(v, o) value is greater than 2 are deleted.
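Formula 2 is simple enough to compute directly from sentence-level counts. The sketch below assumes the source document library has already been split into tokenized sentences and that the verb and the object are single tokens; it also counts a word at most once per sentence, which is one possible reading of c(v) and c(o). The function name `cooccurrence_degree` is hypothetical, and the threshold comparison is deliberately left to the caller.

```python
def cooccurrence_degree(sentences, verb, obj):
    """Formula 2: SIM(v, o) = (c(v, o) / c(v) + c(v, o) / c(o)) / 2.

    sentences: iterable of token lists, one list per sentence.
    Returns 0.0 when either word never occurs (an assumption; the text
    does not say how to treat this case).
    """
    c_v = c_o = c_vo = 0
    for tokens in sentences:
        has_v = verb in tokens
        has_o = obj in tokens
        c_v += has_v                  # sentences containing the verb
        c_o += has_o                  # sentences containing the object
        c_vo += has_v and has_o       # sentences containing both
    if c_v == 0 or c_o == 0:
        return 0.0
    return (c_vo / c_v + c_vo / c_o) / 2

corpus = [
    ["raise", "the", "temperature"],
    ["raise", "the", "flag"],
    ["measure", "temperature"],
    ["raise", "temperature", "slowly"],
]
print(cooccurrence_degree(corpus, "raise", "temperature"))
```

Here c(v) = 3, c(o) = 3 and c(v, o) = 2, so the result is (2/3 + 2/3) / 2 = 2/3.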
The verified extended source-language search index is translated into the target-language search index. Preferably, a "verb + object" bilingual dictionary is matched against the "verb (V) + object (O)" of the verified extended source-language search index, wherein the "verb + object" bilingual dictionary contains source-language "verb + object" entries and the corresponding target-language "verb + object" entries. Table 1 shows part of a "verb + object" bilingual dictionary whose source language is Chinese and whose target language is English.
Table 1 Chinese-English bilingual dictionary
Chinese | English
Raising+temperature | increase+temperature; raise+temperature
Output+light signal | output+light signal; output+optic signal; output+optical signal
Fig. 2 shows a flowchart of building the "verb + object" bilingual dictionary according to a specific embodiment of the present invention. The dictionary is built from a parallel corpus, that is, a bilingual or multilingual corpus that contains not only source-language texts but also the corresponding target-language texts. The two or more texts are generally aligned at the sentence or paragraph level. A computer can run full-text search over the source text and its translation and display them side by side. Building the bilingual dictionary comprises the following steps. First, two corpora T1 and T2 are processed with the semantic analyzer, where T1 and T2 contain translated documents whose content corresponds sentence by sentence; the language of T1 is s and the language of T2 is t. The semantic analyzer converts T1 and T2 into semantic indexes represented by a number of parallel "verb (V) + object (O)" combinations. Parallel "verb (V) + object (O)" combinations are extracted from these indexes and bilingual "verb (V) + object (O)" word pairs are built; for example, "heat (V) + water (O)" is parallel to "heating (V) + water (O)", and the two together form one word pair. The word pairs are then edited, for example by deleting duplicate pairs. The edited word pairs are added to the "verb + object" bilingual dictionary.
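The dictionary-building flow can be sketched as follows, assuming the sentence alignment and the "verb (V) + object (O)" extraction have already been done, so that the input is a list of aligned (source pair, target pair) tuples. The function name `build_vo_dictionary` and the upper-case source keys (English stand-ins for the Chinese entries) are assumptions for illustration.

```python
def build_vo_dictionary(aligned_pairs):
    """Build a 'verb + object' bilingual dictionary from aligned V+O pairs.

    aligned_pairs: iterable of ((src_verb, src_obj), (tgt_verb, tgt_obj)).
    Duplicate word pairs are dropped; insertion order is preserved.
    """
    dictionary = {}
    for source_vo, target_vo in aligned_pairs:
        translations = dictionary.setdefault(source_vo, [])
        if target_vo not in translations:     # the "editing" step: delete duplicates
            translations.append(target_vo)
    return dictionary

# Upper-case keys stand in for source-language (Chinese) entries.
aligned = [
    (("RAISE", "TEMPERATURE"), ("increase", "temperature")),
    (("RAISE", "TEMPERATURE"), ("raise", "temperature")),
    (("RAISE", "TEMPERATURE"), ("increase", "temperature")),   # duplicate, ignored
    (("HEAT", "WATER"), ("heat", "water")),
]
print(build_vo_dictionary(aligned))
```

Keeping a list per source entry, rather than a single translation, matches the shape of Table 1, where one Chinese entry maps to several English combinations.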
This step preferentially uses the match found in the "verb + object" bilingual dictionary to translate the verified source-language search index; if no match is found, separate verb and noun bilingual dictionaries are used to match the verified source-language search index and obtain the target-language search index. One of ordinary skill in the art will appreciate that the separate verb and object bilingual dictionaries can of course also be used directly to match the verified source-language search index and obtain the target-language search index.
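The lookup order described above (the "verb + object" dictionary first, separate verb and noun dictionaries as a fallback) can be sketched as follows. All three dictionaries are assumed to be plain Python mappings; the function name `translate_index` and the upper-case source keys (English stand-ins for Chinese entries) are illustrative assumptions.

```python
from itertools import product

def translate_index(vo, vo_dict, verb_dict, noun_dict):
    """Translate a source-language (verb, object) pair into target-language pairs.

    1. Prefer a direct hit in the 'verb + object' bilingual dictionary.
    2. Otherwise combine every verb translation with every object translation.
    Returns an empty list when neither route yields a translation.
    """
    if vo in vo_dict:
        return list(vo_dict[vo])
    verb, obj = vo
    return list(product(verb_dict.get(verb, []), noun_dict.get(obj, [])))

vo_dict = {("RAISE", "TEMPERATURE"): [("increase", "temperature"), ("raise", "temperature")]}
verb_dict = {"OUTPUT": ["output"]}
noun_dict = {"LIGHT SIGNAL": ["light signal", "optical signal"]}

print(translate_index(("RAISE", "TEMPERATURE"), vo_dict, verb_dict, noun_dict))
print(translate_index(("OUTPUT", "LIGHT SIGNAL"), vo_dict, verb_dict, noun_dict))
```

The fallback route reintroduces word-by-word ambiguity, which is why the text pairs it with a later verification of the target-language search index.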
As the above description shows, the translation process of the present invention does not simply translate each word of the user's request; it translates certain combinations of information words in the request, while preserving the part-of-speech labels and semantic relations of the request.
According to a preferred embodiment of the present invention, the method further comprises a step of performing synonym expansion on the obtained target-language search index using a target-language thesaurus, wherein the target-language thesaurus comprises a verb thesaurus and a noun thesaurus. Specifically, the target-language verb thesaurus and noun thesaurus are used to expand the "verb (V)" and the "object (O)" of the target-language search index, respectively, and the expanded "verb (V)" and "object (O)" words are combined into the extended target-language search index, that is, the expanded target-language "verb (V) + object (O)" combinations. An example of expanding a target-language search index is given below.
Example 4: Suppose the target language is English and a target-language search index is "dissolve (V) + aluminum layer (O)".
The synonyms of "dissolve (V)" are looked up in the target-language verb thesaurus, giving the synonym "liquefy (V)"; the synonyms of "aluminum layer (O)" are looked up in the target-language object thesaurus, giving the synonym "Al layer (O)". The extended target-language index of the target-language search index "dissolve (V) + aluminum layer (O)" is therefore:
"liquefy (V) + aluminum layer (O)",
"dissolve (V) + Al layer (O)", and
"liquefy (V) + Al layer (O)".
Because, given the query context, the translations of two unrelated query terms may also happen to appear together in the target corpus, an inappropriate translation may be selected, which seriously degrades retrieval performance. Therefore, similarly to the verification of the extended source-language search index, the extended target-language search index is verified, yielding a target-language search index that is both comprehensive and accurate.
The verified target-language search index is matched against the target document index, and the documents that match the user's query are output. Specifically, retrieval is performed in the target document library using the target document index: within the subset of texts that carry target document indexes, the knowledge/documents relevant to the user's query are further retrieved, that is, the documents whose target document index is identical to the verified target-language search index are found and returned to the user as output.
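The matching step amounts to an exact lookup of "verb (V) + object (O)" pairs in an index built over the target document library. A minimal sketch, assuming the pairs for each document have already been extracted and that documents are identified by string IDs; `build_document_index` and `match` are hypothetical names.

```python
from collections import defaultdict

def build_document_index(doc_vo_pairs):
    """Invert {doc_id: [(verb, obj), ...]} into {(verb, obj): {doc_id, ...}}."""
    index = defaultdict(set)
    for doc_id, pairs in doc_vo_pairs.items():
        for vo in pairs:
            index[vo].add(doc_id)
    return index

def match(query_indices, document_index):
    """Return the sorted IDs of documents whose index equals any verified query pair."""
    hits = set()
    for vo in query_indices:
        hits |= document_index.get(vo, set())
    return sorted(hits)

library = {
    "d1": [("measure", "fluid flow")],
    "d2": [("dissolve", "Al layer")],
    "d3": [("measure", "temperature")],
}
doc_index = build_document_index(library)
print(match([("dissolve", "aluminum layer"), ("dissolve", "Al layer")], doc_index))
```

Because the document index is independent of any particular query, it can be built once and reused for later searches.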
One of ordinary skill in the art will appreciate that the method of the present invention uses the target document index, which, as described above, is obtained by performing on the target document library a semantic analysis similar to that performed on the source-language query. If further retrievals are carried out with the above method, the target document index already obtained in the above steps can be used directly, without performing the semantic analysis of the target document library again.
In summary, the present invention uses the "verb + object" combination (verb-object construction) in the query as the search index, which reduces the ambiguity that arises when single keywords are translated and improves the precision of cross-language retrieval. Preferably, in combination with synonym expansion of the keyword combination to obtain more correct retrieval results, the recall of cross-language retrieval can also be improved.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. The scope of the claimed technical solution is therefore not limited by any specific exemplary teaching given here.

Claims (10)

1. A method for cross-language knowledge retrieval, comprising the following steps:
10) performing semantic analysis on the source-language query to obtain a source-language search index, wherein the source-language search index is the "verb + object" formed by the verb-object construction of the source-language query;
20) translating the source-language search index into a target-language search index;
30) matching a target document index against the target-language search index, wherein the target-language search index is the "verb + object" formed by the verb-object constructions in the target document library, obtained by performing semantic analysis on the target document library.
2. The method according to claim 1, characterized in that, after step 10), the method further comprises the following step:
11) performing synonym expansion on the source-language search index.
3. The method according to claim 2, characterized in that, after step 11), the method further comprises the following step:
12) verifying the source-language search index.
4. The method according to claim 3, characterized in that step 12) further comprises calculating the co-occurrence degree of the verb and the object in the source-language search index according to the following formula:
$$\mathrm{SIM}(v,o) = p(v,o)\,\log_2\frac{p(v,o)}{p(v)\,p(o)} - \log_2 \mathrm{Dis}(v,o),$$
where the verb is denoted v and the object is denoted o; c(v) and c(o) are the numbers of times v and o occur in the source document library; c(v, o) is the number of times v and o co-occur in the same sentence in the source document library; p(v, o) = c(v, o)/c(v) + c(v, o)/c(o); p(v) = c(v)/∑c(v); and Dis(v, o) is the average distance between v and o.
5. The method according to claim 3, characterized in that step 12) further comprises calculating the co-occurrence degree of the verb and the object in the source-language search index according to the following formula:
$$\mathrm{SIM}(v,o) = \frac{1}{2}\left(\frac{c(v,o)}{c(v)} + \frac{c(v,o)}{c(o)}\right),$$
where the verb is denoted v and the object is denoted o; c(v) and c(o) are the numbers of times v and o occur in the source document library; and c(v, o) is the number of times v and o co-occur in the same sentence in the source document library.
6. The method according to any one of claims 1 to 5, characterized in that step 20) uses a "verb + object" bilingual dictionary, wherein the "verb + object" bilingual dictionary contains source-language "verb + object" entries and the corresponding target-language "verb + object" entries.
7. The method according to claim 6, characterized in that, if in step 20) the "verb + object" bilingual dictionary does not contain the target-language search index, the method further comprises: translating the source-language search index into the target-language search index using a verb bilingual dictionary and a noun bilingual dictionary.
8. The method according to any one of claims 1 to 5, characterized in that step 20) uses a verb bilingual dictionary and a noun bilingual dictionary.
9. The method according to any one of claims 1 to 5, characterized in that, after step 20), the method further comprises the following step:
21) performing synonym expansion on the target-language search index.
10. The method according to claim 9, characterized in that, after step 21), the method further comprises the step:
22) verifying the target-language search index.
CN2009102439934A 2009-12-30 2009-12-30 Method for retrieving cross-language knowledge Pending CN102117284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102439934A CN102117284A (en) 2009-12-30 2009-12-30 Method for retrieving cross-language knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102439934A CN102117284A (en) 2009-12-30 2009-12-30 Method for retrieving cross-language knowledge

Publications (1)

Publication Number Publication Date
CN102117284A true CN102117284A (en) 2011-07-06

Family

ID=44216058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102439934A Pending CN102117284A (en) 2009-12-30 2009-12-30 Method for retrieving cross-language knowledge

Country Status (1)

Country Link
CN (1) CN102117284A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294682A (en) * 2012-02-24 2013-09-11 摩根全球购物有限公司 Multi-language retrieving method, computer readable storage medium and network searching system
CN103678714A (en) * 2013-12-31 2014-03-26 北京百度网讯科技有限公司 Construction method and device for entity knowledge base
CN104573019A (en) * 2015-01-12 2015-04-29 百度在线网络技术(北京)有限公司 Information searching method and device
CN104573019B (en) * 2015-01-12 2019-04-02 百度在线网络技术(北京)有限公司 Information retrieval method and device
CN104850610A (en) * 2015-05-11 2015-08-19 均康(上海)信息科技有限公司 Network search engine system
CN106372187A (en) * 2016-08-31 2017-02-01 中译语通科技(北京)有限公司 Cross-language retrieval method oriented to big data

Similar Documents

Publication Publication Date Title
Nie Cross-language information retrieval
Zhou et al. Translation techniques in cross-language information retrieval
CN101042692B (en) Translation obtaining method and apparatus based on semantic forecast
US20060235689A1 (en) Question answering system, data search method, and computer program
Monz et al. Iterative translation disambiguation for cross-language information retrieval
Cheng et al. Creating multilingual translation lexicons with regional variations using web corpora
CN102117284A (en) Method for retrieving cross-language knowledge
Vilares et al. Managing misspelled queries in IR applications
Udupa et al. “They Are Out There, If You Know Where to Look”: Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval
Vilares et al. On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks
Kim et al. Combining lexical and statistical translation evidence for cross‐language information retrieval
Chandra et al. Assessing query translation quality using back translation in Hindi-English CLIR
Gupta et al. Advanced machine learning techniques in natural language processing for Indian languages
Wu et al. Learning to find English to Chinese transliterations on the web
Hiemstra et al. A domain specific lexicon acquisition tool for cross-language information retrieval
Zhang et al. Detection and translation of oov terms prior to query time
Lin et al. Query Expansion from Wikipedia and Topic Web Crawler on CLIR.
Moukdad et al. How do search engines handle Chinese queries
He et al. Cross‐Language Information Retrieval
Carpuat A semantic evaluation of machine translation lexical choice
Hsu et al. Query Expansion via Link Analysis of Wikipedia for CLIR.
Kishida Prediction of performance of cross-language information retrieval using automatic evaluation of translation
Sakamoto et al. Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences
Hu et al. Mining Translations of Web Queries from Web Click-through Data.
Lin et al. Exploring the effectiveness of Chinese-to-English machine translation for CLIR applications in earthquake engineering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: 100026 Beijing city Chaoyang District West Road No. 1 A Winterless center block 5A

Applicant after: PERA CORPORATION LTD.

Address before: 100026 Beijing city Chaoyang District West Road No. 1 A Winterless center block 5A

Applicant before: PERA Global Technology (Beijing) Co., Ltd.

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: PERA GLOBAL TECHNOLOGY (BEIJING) CO., LTD. TO: PERA GLOBAL TECHNOLOGY CO., LTD.

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110706