CN1687925A - Method for realizing bilingual web page searching - Google Patents

Method for realizing bilingual web page searching

Info

Publication number
CN1687925A
Authority
CN
China
Prior art keywords
bilingual
translation
search
retrieval type
query requests
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200510018672
Other languages
Chinese (zh)
Inventor
贺方升
陈智贤
余俊
程伟
朱前线
孙上海
李银刚
朱柳嵩
王沧洪
Original Assignee
贺方升
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2005-05-10
Filing date
2005-05-10
Publication date
2005-10-26
2005-05-10 Application filed by 贺方升
2005-05-10 Priority to CN 200510018672
2005-10-26 Publication of CN1687925A
Status: Pending


Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a method for searching bilingual web pages by means of a search engine and an automatically generated bilingual query expression, so as to provide users with an auxiliary translation service. The method comprises the following steps: (1) the user submits a query request; (2) a bilingual query expression is generated from the query request; (3) the bilingual query expression is sent to a search engine for searching; (4) the search engine returns the search results. The advantage of the invention is that the user only needs to enter a monolingual query request in the user operation and display interface: a bilingual query expression is then generated automatically, so that bilingual web pages that contain the query request, and may also contain its corresponding translation, can be retrieved. The invention can thus provide users with accurate translation information.

Description

A method for realizing bilingual web page searching
Technical field
The present invention relates to a method for searching multilingual web pages on the Internet. Specifically, it relates to a method that uses a search engine together with an automatically generated bilingual query expression to search bilingual web pages, thereby providing users with translation assistance.
Background art
With the development of the Internet and increasing intercultural communication, websites have begun to show an international trend. First, more and more English-learning websites and bilingual parallel websites have appeared, and many of their pages contain bilingual parallel learning resources. In addition, on many web pages authors append the corresponding English translation after proper nouns or newly coined terms. For example, the school introduction page on the Wuhan University of Technology website contains the bilingual description "武汉理工大学 (Wuhan University of Technology) is a national key university directly under the Ministry of Education". The Internet contains a great many such pages with bilingual parallel translations, and they are a valuable resource that people can consult when translating.
To find these existing translations on the Internet, people can use general-purpose search engines such as www.google.com and Baidu (www.baidu.com), selecting and combining keywords to find the web pages that contain bilingual parallel translations. For example, a user can enter the bilingual query "武汉理工大学 university" in the search box of www.baidu.com and retrieve many web pages that contain both 武汉理工大学 (Wuhan University of Technology) and "university". Among these results, bilingual descriptions such as "武汉理工大学 (Wuhan University of Technology) is a national key university directly under the Ministry of Education..." appear, and they clearly help the user translate a proper noun such as 武汉理工大学. The problem, however, is that existing general-purpose search engines are not designed specifically for translation search, and they place high demands on both the user's search skills and English proficiency.
Summary of the invention
The objective of the invention is to address the above inconvenience by proposing a method for realizing bilingual web page searching. Based on the user's input, the method automatically generates a corresponding bilingual query expression and submits it to a search engine, so that the user finds web pages containing both the input and its corresponding translation, which serve as a reference for the user's translation.
To achieve the above objective, the technical solution provided by the invention comprises the following steps:
(1) submitting a query request;
(2) generating a bilingual query expression from the content of the query request;
(3) sending the bilingual query expression to a search engine for searching;
(4) the search engine returning the search results.
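The four steps can be pictured as a short pipeline. The Python sketch below is illustrative only: `bilingual_search`, the expression generator and the search-engine callable are hypothetical stand-ins, not the patent's implementation and not any real search API.

```python
from typing import Callable, List


def bilingual_search(
    query: str,
    generate_expression: Callable[[str], str],
    search_engine: Callable[[str], List[str]],
) -> List[str]:
    # Step (1): the query request arrives from the user interface as `query`.
    # Step (2): build a bilingual query expression from it.
    expression = generate_expression(query)
    # Steps (3) and (4): send the expression to the engine and return its results.
    return search_engine(expression)


if __name__ == "__main__":
    # Hypothetical stand-ins for the generator and the engine.
    fake_generator = lambda q: f"({q}) AND (wuhan) AND (university)"
    fake_engine = lambda expr: [f"result for: {expr}"]
    print(bilingual_search("武汉理工大学", fake_generator, fake_engine))
```

Keeping the generator and the engine as parameters mirrors the patent's point that any general-purpose or meta search engine may be used in step (3).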
Step (2) is carried out by an automatic bilingual query expression generation system, as follows:
A) after the automatic bilingual query expression generation system receives the query request, it segments the query request into keyword items, automatically selects the search keywords from these items, and deletes redundant keyword items;
B) each search keyword is translated separately;
C) the translations of the search keywords and the content of the query request are combined into a bilingual query expression according to logical relations.
Furthermore, after step C), the automatically generated bilingual query expression is presented to the user, who can confirm or modify it, and step (3) is then executed.
With the above technical solution, the invention has the following advantages and beneficial effects: the user only needs to submit a monolingual query request in the user operation and display interface, and a bilingual query expression is generated automatically, so that bilingual web pages that contain the query request and may also contain its corresponding translation can be retrieved. This gives the user a useful reference when translating, and is especially valuable for proper nouns whose translation the user cannot work out alone and can only "search for" rather than "translate".
Description of drawings
Fig. 1 is a process flow diagram of the present invention.
Fig. 2 is a system schematic of the present invention.
Embodiment
The invention is described in further detail below with reference to Fig. 1 and Fig. 2. For convenience of explanation, a specific embodiment using Chinese and English is given; in fact the invention is equally applicable to bilingual search between other languages, such as Chinese and French, or Japanese and English.
A method for realizing bilingual web page searching comprises the following steps: (1) submitting a query request; (2) generating a bilingual query expression from the content of the query request; (3) sending the bilingual query expression to a search engine for searching; (4) the search engine returning the search results.
In step (1), the query request is submitted through a user operation and display interface. This interface can be a browser or client software; it provides an input field, receives the user's query request and sends it over the network. Writing such a user operation and display interface is prior art in computer software.
Example: suppose the user wants to use this method to find the English translation of 武汉理工大学 (Wuhan University of Technology). The user then submits the query request 武汉理工大学 in the user operation and display interface.
In step (2), the automatic bilingual query expression generation system generates a bilingual query expression from the content of the query request. In a concrete implementation, the system can be written as two modules: a keyword segmentation module and a translation engine.
In step (3), the search engine is a general-purpose search engine such as www.google.com or www.baidu.com; a meta search engine can also be used.
In the example, the automatic bilingual query expression generation system sends the bilingual query expression to the search engine, for instance sending "武汉理工大学 AND wuhan AND university" to www.google.com. After receiving the query request, www.google.com searches its database using the bilingual query expression as the search terms.
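Step (3) amounts to handing the expression to the engine as an ordinary query string. A minimal sketch, assuming the engine accepts the query in a `q` URL parameter on a `/search` path; that parameter name and path are assumptions (they differ between engines, and real engines may restrict automated access), and `build_search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def build_search_url(expression: str, base: str = "https://www.google.com/search") -> str:
    # urlencode percent-encodes the UTF-8 bytes of the Chinese text
    # and turns spaces into '+', so the whole expression travels as one value.
    return base + "?" + urlencode({"q": expression})


if __name__ == "__main__":
    print(build_search_url("武汉理工大学 AND wuhan AND university"))
```

The expression is passed through unchanged; how AND and OR are interpreted is left entirely to the engine.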
In step (4), the search engine returns the search results, which are displayed in the user operation and display interface.
In the example, www.google.com searches its database with "武汉理工大学 AND wuhan AND university" as the search terms and returns 585 results, i.e. 585 web pages contain 武汉理工大学 (Wuhan University of Technology), "wuhan" and "university" simultaneously. Among the first 10 results, 5 contain the bilingual description "武汉理工大学 (Wuhan University of Technology) is a national key university directly under the Ministry of Education"; clearly these results help the user translate 武汉理工大学.
To provide an efficient bilingual query expression, the invention generates it in the following steps:
A) after the automatic bilingual query expression generation system receives the query request, it segments the query request into keyword items, automatically selects the search keywords from these items, and deletes redundant keyword items;
B) each search keyword is translated separately;
C) the translations of the search keywords and the content of the query request are combined into a bilingual query expression according to logical relations.
The workflow and the function of each module are described in detail below.
In step A), after the automatic bilingual query expression generation system receives the query request, the keyword segmentation module segments it into multiple keywords. If there are many keywords, the search keywords can be selected automatically by deleting keywords according to their part of speech and word frequency.
Unlike Western languages represented by English, East Asian languages represented by Chinese, Korean and Japanese have no obvious delimiters between words in a sentence. Taking Chinese as an example, a Chinese query is a continuous sequence of characters that must be segmented into multiple keywords before it can be used for retrieval. For Chinese query requests, the keyword segmentation module performs Chinese word segmentation and part-of-speech tagging: it identifies the words in a long string, segments the string into largely independent basic semantic units (Chinese words), and tags each with its part of speech.
Chinese word segmentation and part-of-speech tagging have long been a focus of Chinese information processing research, and mature technologies and products exist both in China and abroad. Mature domestic products include the "Chinese word segmentation software package" of Magnanimity Science and Technology Information Technology Co., Ltd., the "word segmentation and tagging program" of the Language Technology Center of Xiamen University, and the "Chinese segmentation and tagging program" of the Institute of Computational Linguistics of Peking University. Available techniques and algorithms include the maximum matching method, the reverse maximum matching method, word-by-word traversal, the segmentation-marker method, optimum matching, finite multi-level enumeration, the rescan method, the adjacency constraint method, the adjacency knowledge constraint method, expert-system methods, minimum-segmentation word-frequency selection, neural-network methods, and so on.
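To make the first two algorithms named above concrete, here is a minimal Python sketch of forward and reverse maximum matching against a small, hypothetical lexicon. It is not any of the products listed; real segmenters use large dictionaries and further disambiguation.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Scan left to right, always taking the longest lexicon word at the cursor."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Scan right to left, always taking the longest lexicon word ending at the cursor."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.insert(0, piece)
                j -= size
                break
    return words
```

With the toy lexicon {"武汉", "理工", "大学"}, both directions split 武汉理工大学 into ["武汉", "理工", "大学"], the result used in the example. They can disagree on ambiguous strings: with {"研究", "研究生", "生命", "起源"} and `max_len=3`, forward matching splits 研究生命起源 into ["研究生", "命", "起源"] while reverse matching gives ["研究", "生命", "起源"], which is why several competing methods exist.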
In Western languages represented by English, words are naturally delimited by spaces, so word segmentation and part-of-speech tagging pose little difficulty. English still requires lexical analysis, however: reducing inflected forms to their base forms, identifying phrases, and handling special symbols. The translation system can only translate inflected words well once they have been reduced to their base forms, and the correct translation can only be obtained once it is determined which adjacent words form a phrase and the phrase is translated as a unit. For English query requests, the keyword segmentation module therefore performs word segmentation, part-of-speech tagging and lexical analysis; besides the space delimiter, segmentation must also handle digits, hyphens, punctuation marks and letter case. English segmentation, part-of-speech tagging and lexical analysis have made remarkable progress and the technology is mature; for example, English-Chinese machine translation software popular in China, such as Kingsoft FastAIT and Dongfangkuaiche, has built-in segmentation, tagging and lexical analysis tools.
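The English-side processing described above (case folding, digits and hyphens, reduction to base forms, phrase grouping) can be sketched as follows. This is a deliberately crude illustration under stated assumptions: the irregular-form table, the suffix rules and the phrase list are hypothetical toy data, not the behaviour of Kingsoft FastAIT, Dongfangkuaiche or any real analyzer.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources, for illustration only.
IRREGULAR: Dict[str, str] = {"went": "go", "children": "child", "better": "good"}
PHRASES: Set[Tuple[str, ...]] = {("search", "engine"), ("web", "page")}


def tokenize(text: str) -> List[str]:
    # Keep letters, digits and hyphenated words together, drop punctuation, fold case.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)]


def lemmatize(word: str) -> str:
    # Very rough base-form reduction: irregular table first, then two suffix rules.
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def group_phrases(words: List[str], max_len: int = 3) -> List[str]:
    # Greedy longest match against the phrase list, so a phrase stays one translation unit.
    out, i = [], 0
    while i < len(words):
        for size in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if size == 1 or chunk in PHRASES:
                out.append(" ".join(chunk))
                i += size
                break
    return out


def analyze(text: str) -> List[str]:
    return group_phrases([lemmatize(w) for w in tokenize(text)])
```

For instance, `analyze("Search engines, Web-pages and 2 libraries")` yields `["search engine", "web-page", "and", "2", "library"]`: the plural is reduced before phrase lookup, so "search engine" is recognized as one unit.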
In the example, after the automatic bilingual query expression generation system receives the user's query request 武汉理工大学 (Wuhan University of Technology), the keyword segmentation module segments it into multiple keywords. One possible result is: 武汉 "Wuhan" (noun), 理工 "science and engineering" (noun), 大学 "university" (noun).
As another example, after the system receives a query request meaning "everyone who holds firmly to the truth", the keyword segmentation module segments it into multiple keyword items. One possible result is: "every" (adverb), "adhere to" (verb), "truth" (noun), 的 (auxiliary particle), "people" (noun). Because there are many keyword items, some can be deleted according to their part of speech, for example by removing adverbs, auxiliary particles and verbs, leaving the search keywords "truth" (noun) and "people" (noun).
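The part-of-speech pruning in the second example can be sketched as a filter over (word, tag) pairs. The threshold `max_keywords` and the set of dropped tags are hypothetical parameters chosen to reproduce the two examples above; the word-frequency pruning also mentioned in step A) is not shown.

```python
from typing import FrozenSet, List, Tuple

# Tags treated as non-essential when a query yields too many keyword items (assumed set).
DEFAULT_DROP: FrozenSet[str] = frozenset({"adverb", "auxiliary", "verb"})


def select_search_keywords(tagged: List[Tuple[str, str]],
                           max_keywords: int = 3,
                           drop_tags: FrozenSet[str] = DEFAULT_DROP) -> List[str]:
    # Short queries are kept whole; only longer ones are pruned by part of speech.
    if len(tagged) <= max_keywords:
        return [word for word, _ in tagged]
    return [word for word, tag in tagged if tag not in drop_tags]
```

With these defaults the three nouns of 武汉理工大学 are all kept, while the five items of the second example are reduced to `["truth", "people"]`.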
In step B), the translation engine translates each keyword produced by segmentation separately into one or more keywords in the other language. Translation engines of this kind are mature prior art, for example electronic dictionary software such as Kingsoft PowerWord, Dongfang Dadian and Dr.eye (译典通).
In the example, the translation engine translates the keyword items 武汉 (Wuhan), 理工 (science and engineering) and 大学 (university) separately: 武汉 is translated as "wuhan"; 理工 has no corresponding translation; 大学 has two corresponding translations, "college" and "university".
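Step B) can be sketched as a per-keyword dictionary lookup. `ZH_EN` is a hypothetical two-entry dictionary standing in for a real translation engine; a keyword with no entry, such as 理工 in the example, simply receives an empty candidate list.

```python
from typing import Dict, List

# Hypothetical toy bilingual dictionary, standing in for a real translation engine.
ZH_EN: Dict[str, List[str]] = {
    "武汉": ["wuhan"],
    "大学": ["college", "university"],
}


def translate_keywords(keywords: List[str],
                       dictionary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Each keyword is looked up on its own; a missing entry yields an empty list.
    # The lists are copied so later edits do not alter the dictionary itself.
    return {kw: list(dictionary.get(kw, [])) for kw in keywords}
```

`translate_keywords(["武汉", "理工", "大学"], ZH_EN)` returns `{"武汉": ["wuhan"], "理工": [], "大学": ["college", "university"]}`, matching the example.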
In step C), the automatic bilingual query expression generation system combines the translations of the search keywords with the original query request into a bilingual query expression.
A query request may contain several search keywords (denoted keyword a, keyword b, keyword c, ..., keyword x), and each search keyword may have several translations (denoted translation 1, translation 2, translation 3, ..., translation n); the translations of keyword a are denoted translation a1, translation a2, ..., translation an, and likewise for the other keywords. The generated bilingual query expression can then be combined in many ways: the keywords can be combined with one another by logical OR or logical AND; keywords and translations can be combined by logical OR or logical AND; and the query request, keywords and translations can be combined by logical OR or logical AND. For example:
Query request + (translation a1 OR translation a2 OR translation a3 ...) + (translation b1 OR translation b2 OR translation b3 ...) + ... + (translation x1 OR translation x2 OR translation x3 ...);
Query request + (translation a1 AND translation a2 AND translation a3 ...) + (translation b1 AND translation b2 AND translation b3 ...) + ... + (translation x1 AND translation x2 AND translation x3 ...);
(keyword a AND keyword b AND keyword c ...) + (translation a1 AND translation a2 AND translation a3 ...) + (translation b1 AND translation b2 AND translation b3 ...) + ... + (translation x1 AND translation x2 AND translation x3 ...);
(keyword a AND keyword b AND keyword c ...) + (translation a1 OR translation a2 OR translation a3 ...) + (translation b1 OR translation b2 OR translation b3 ...) + ... + (translation x1 OR translation x2 OR translation x3 ...);
Query request + (translation a1 OR translation a2 OR translation a3 ...) OR (translation b1 OR translation b2 OR translation b3 ...) OR ... OR (translation x1 OR translation x2 OR translation x3 ...);
(keyword a AND keyword b AND keyword c ...) OR (translation a1 AND translation a2 AND translation a3 ...) OR (translation b1 AND translation b2 AND translation b3 ...) OR ... OR (translation x1 AND translation x2 AND translation x3 ...).
In the above combinations, "+" and "AND" both denote logical AND and "OR" denotes logical OR; the brackets "()" are added only for readability and have no practical significance.
In a concrete implementation, the automatic bilingual query expression generation system can use any of the above combinations to generate the bilingual query expression automatically.
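Five of the six combinations above (all except the fifth, which mixes "+" and OR) follow one template: a head term (the whole query request, or its keywords joined by AND), then one bracketed group of translations per keyword, with a chosen operator inside each group and another between groups. A minimal sketch of that template; `build_expression` and its parameters are hypothetical names, and skipping keywords that have no translation is an assumption made to reproduce the worked example.

```python
from typing import Dict, List


def build_expression(query: str,
                     translations: Dict[str, List[str]],
                     inner: str = "OR",
                     outer: str = "AND",
                     head_from_keywords: bool = False) -> str:
    """Combine the query (or its keywords) with each keyword's translation group."""
    if head_from_keywords:
        head = "(" + " AND ".join(translations) + ")"  # dict keys are the keywords
    else:
        head = "(" + query + ")"
    groups = [head]
    for candidates in translations.values():
        if candidates:  # a keyword without any translation contributes no group
            groups.append("(" + f" {inner} ".join(candidates) + ")")
    return f" {outer} ".join(groups)
```

With `translations = {"武汉": ["wuhan"], "理工": [], "大学": ["university", "college"]}`, the defaults (first combination) give `(武汉理工大学) AND (wuhan) AND (university OR college)`; `inner="AND", outer="OR", head_from_keywords=True` gives the last combination, `(武汉 AND 理工 AND 大学) OR (wuhan) OR (university AND college)`.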
In the example, after the query request 武汉理工大学 (Wuhan University of Technology) is processed by the automatic bilingual query expression generation system, the generated bilingual query expression can be, for example:
(武汉理工大学) AND (wuhan) AND (university OR college)
To make the system more user-friendly, the automatically generated bilingual query expression is returned to the user operation and display interface for the user to confirm, and the user can modify it as needed.
In the example, on seeing the above bilingual query expression, the user may consider the translation "college" inappropriate and delete it, so that the expression becomes: (武汉理工大学) AND (wuhan) AND (university). Here too the brackets "()" are added only for readability and have no practical significance.
In a concrete implementation, the method can also be designed so that, after step (4), once the user has seen the search results, the user modifies the bilingual query expression again and steps (3) and (4) are then executed again. This can be repeated until the user obtains the most satisfactory results.
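The confirm-or-modify step and the repeated re-search can be sketched as a loop. `search` and `ask_user` are hypothetical callables standing in for the search engine and the user operation and display interface; modelling each user edit as deleting one rejected translation, as in the "college" example, is an assumption, and `max_rounds` is a hypothetical safety limit.

```python
from typing import Callable, Dict, List, Optional, Tuple

Translations = Dict[str, List[str]]
Edit = Tuple[str, str]  # (keyword, rejected translation): hypothetical edit format


def to_expression(query: str, translations: Translations) -> str:
    # Same layout as the worked example: (query) AND (t1 OR t2) AND ...
    groups = ["(" + query + ")"]
    groups += ["(" + " OR ".join(c) + ")" for c in translations.values() if c]
    return " AND ".join(groups)


def drop_translation(translations: Translations, keyword: str, rejected: str) -> Translations:
    # Work on a copy so the caller's dictionary is not modified.
    updated = {k: list(v) for k, v in translations.items()}
    if rejected in updated.get(keyword, []):
        updated[keyword].remove(rejected)
    return updated


def refine(query: str,
           translations: Translations,
           search: Callable[[str], List[str]],
           ask_user: Callable[[List[str]], Optional[Edit]],
           max_rounds: int = 5) -> Tuple[Translations, List[str]]:
    # Steps (3) and (4) run once, then again after every user edit,
    # until the user accepts the results (ask_user returns None).
    results = search(to_expression(query, translations))
    for _ in range(max_rounds):
        edit = ask_user(results)
        if edit is None:
            break
        translations = drop_translation(translations, *edit)
        results = search(to_expression(query, translations))
    return translations, results
```

If the user first rejects "college" for 大学 and then accepts, the final search runs with `(武汉理工大学) AND (wuhan) AND (university)`.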

Claims (3)

1. A method for realizing bilingual web page searching, characterized in that it comprises the following steps:
(1) submitting a query request;
(2) generating a bilingual query expression from the content of the query request;
(3) sending the bilingual query expression to a search engine for searching;
(4) the search engine returning the search results.
2. The method for realizing bilingual web page searching according to claim 1, characterized in that step (2) is carried out by an automatic bilingual query expression generation system, as follows:
A) after the automatic bilingual query expression generation system receives the query request, it segments the query request into keyword items, automatically selects the search keywords from these items, and deletes redundant keyword items;
B) each search keyword is translated separately;
C) the translations of the search keywords and the content of the query request are combined into a bilingual query expression according to logical relations.
3. The method for realizing bilingual web page searching according to claim 2, characterized in that after step C) is executed, the automatically generated bilingual query expression is presented to the user, who confirms or modifies it, and step (3) is then executed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200510018672 CN1687925A (en) 2005-05-10 2005-05-10 Method for realizing bilingual web page searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200510018672 CN1687925A (en) 2005-05-10 2005-05-10 Method for realizing bilingual web page searching

Publications (1)

Publication Number Publication Date
CN1687925A true CN1687925A (en) 2005-10-26

Family

ID=35305966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200510018672 Pending CN1687925A (en) 2005-05-10 2005-05-10 Method for realizing bilingual web page searching

Country Status (1)

Country Link
CN (1) CN1687925A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042692B (en) * 2006-03-24 2010-09-22 富士通株式会社 translation obtaining method and apparatus based on semantic forecast
CN102253930A (en) * 2010-05-18 2011-11-23 腾讯科技(深圳)有限公司 Method and device for translating text
CN103020044A (en) * 2012-12-03 2013-04-03 江苏乐买到网络科技有限公司 Machine-aided webpage translation method and system thereof
CN103324680A (en) * 2012-06-01 2013-09-25 微软公司 Language learning opportunities and general search engine
CN103885943A (en) * 2012-12-19 2014-06-25 北大方正集团有限公司 Method for achieving drop-down list box control in webpage
CN104850610A (en) * 2015-05-11 2015-08-19 均康(上海)信息科技有限公司 Network search engine system
CN105022728A (en) * 2015-07-13 2015-11-04 广西达译商务服务有限责任公司 Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method
CN106294643A (en) * 2016-08-03 2017-01-04 王晓光 Different language realizes real-time searching method and system in big data
CN106326350A (en) * 2016-08-06 2017-01-11 马岩 Method and system for realizing real-time search of different languages in big data
CN106503231A (en) * 2016-10-31 2017-03-15 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
WO2018023483A1 (en) * 2016-08-03 2018-02-08 王晓光 Method and system for implementing real-time search of different languages in big data
WO2018027344A1 (en) * 2016-08-06 2018-02-15 马岩 Method and system for implementing real-time search of different languages in big data
CN108900574A (en) * 2018-06-04 2018-11-27 上海市疾病预防控制中心 One-stop search method for pushing based on users ' individualized requirement

Cited By (15)

Publication number Priority date Publication date Assignee Title
CN101042692B (en) * 2006-03-24 2010-09-22 富士通株式会社 translation obtaining method and apparatus based on semantic forecast
CN102253930B (en) * 2010-05-18 2016-03-23 腾讯科技(深圳)有限公司 A kind of method of text translation and device
CN102253930A (en) * 2010-05-18 2011-11-23 腾讯科技(深圳)有限公司 Method and device for translating text
CN103324680A (en) * 2012-06-01 2013-09-25 微软公司 Language learning opportunities and general search engine
CN103020044A (en) * 2012-12-03 2013-04-03 江苏乐买到网络科技有限公司 Machine-aided webpage translation method and system thereof
CN103885943A (en) * 2012-12-19 2014-06-25 北大方正集团有限公司 Method for achieving drop-down list box control in webpage
CN104850610A (en) * 2015-05-11 2015-08-19 均康(上海)信息科技有限公司 Network search engine system
CN105022728A (en) * 2015-07-13 2015-11-04 广西达译商务服务有限责任公司 Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method
CN106294643A (en) * 2016-08-03 2017-01-04 王晓光 Different language realizes real-time searching method and system in big data
WO2018023483A1 (en) * 2016-08-03 2018-02-08 王晓光 Method and system for implementing real-time search of different languages in big data
CN106326350A (en) * 2016-08-06 2017-01-11 马岩 Method and system for realizing real-time search of different languages in big data
WO2018027344A1 (en) * 2016-08-06 2018-02-15 马岩 Method and system for implementing real-time search of different languages in big data
CN106503231A (en) * 2016-10-31 2017-03-15 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN106503231B (en) * 2016-10-31 2020-02-04 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN108900574A (en) * 2018-06-04 2018-11-27 上海市疾病预防控制中心 One-stop search method for pushing based on users ' individualized requirement

Similar Documents

Publication Publication Date Title
CN1687925A (en) Method for realizing bilingual web page searching
JP5203934B2 (en) Propose and refine user input based on original user input
US9069857B2 (en) Per-document index for semantic searching
CN1685341B (en) Blinking annotation callouts highlighting cross language search results
Hertling et al. WikiMatch-using Wikipedia for ontology matching.
US20090106203A1 (en) Method and apparatus for a web search engine generating summary-style search results
US8280721B2 (en) Efficiently representing word sense probabilities
CN1839386A (en) Internet searching using semantic disambiguation and expansion
US20170109449A1 (en) Discovery engine
JP2012248210A (en) System and method for retrieving content of complicated language such as japanese
CN1815477A (en) Method and system for providing semantic subjects based on mark language
CN1282934A (en) Mehtod and system of similar letter selection and document retrieval
CN1904886A (en) Method and apparatus for establishing link structure between multiple documents
Kishida et al. Overview of CLIR task at the fifth NTCIR workshop
CN1750002A (en) Method for providing research result
Liu et al. Information retrieval and Web search
Kishida et al. Overview of CLIR task at the fourth NTCIR workshop
Bian et al. Integrating query translation and document translation in a cross-language information retrieval system
JP2003150623A (en) Language crossing type patent document retrieval method
CN1114165C (en) Segmentation of Chinese text into words
Nasharuddin et al. Cross-lingual information retrieval
Vidya et al. Web Page Ranking Using Multilingual Information Search Algorithm-A Novel Approach
KR100885527B1 (en) Apparatus for making index-data based by context and for searching based by context and method thereof
Hsu et al. Query Expansion via Link Analysis of Wikipedia for CLIR.
Bhaskar et al. Cross lingual query dependent snippet generation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: He Fangsheng

Document name: Notification before expiration of term

C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: He Fangsheng

Document name: Notification that Application Deemed to be Withdrawn

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication