CN101196898A - Method for applying phrase index technology into internet search engine - Google Patents


Info

Publication number
CN101196898A
Authority
CN
China
Prior art keywords
phrase
search engine
index
web page
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101430242A
Other languages
Chinese (zh)
Inventor
邓剑波
戴云川
詹天荣
张潘
高潮
周波
张森
胡显如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinbaili Shoe (shenzhen) Coltd
Original Assignee
Xinbaili Shoe (shenzhen) Coltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-08-21
Filing date
2007-08-21
Publication date
2008-06-11
Application filed by Xinbaili Shoe (shenzhen) Coltd
Priority to CNA2007101430242A
Publication of CN101196898A
Legal status: Pending


Abstract

The invention applies phrase-index technology to an Internet search engine. Sentences in web page documents are segmented into words; keywords are taken as head words, and other words before and after each keyword are added to form a set of index phrases, from which an index of web content is built with the phrase as the unit. Content words are extracted from the query submitted by the user through a word-segmentation program and combined in all reasonable and possible ways to obtain a set of search phrases. The phrases in the search phrase set are then exactly matched, in turn, against the phrases in the index to obtain the search results. Because a phrase expresses meaning better than a single word, the search results reflect the likely intent of the query more precisely.

Description

Method for applying phrase-index technology in an Internet search engine
Technical field
The present invention relates generally to an innovation in an underlying key technology of Internet search engines, namely the "text index" approach, together with several innovations in the front-end processing that this approach requires. The invention applies the concept of a phrase index to the construction of the index of an Internet search engine. Because a phrase index can significantly improve the semantic relevance between the query and the retrieved content, the invention also provides an intelligent search method for Internet search engines.
Background art
An Internet search engine (hereinafter "search engine") is a tool for searching web pages and websites. Current search engines work as follows: a collection program automatically gathers web page addresses and their text from the Internet; the collected page text is passed to the indexing and retrieval system, which scans every word in the text and builds an inverted file with the word as the unit; the search program then ranks the texts containing the user's query terms by the frequency and probability with which those terms occur, and finally outputs the ranked list of web pages and websites.
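For comparison, the conventional approach just described can be sketched in a few lines of Python: a word-level inverted file whose pages are ranked by summed term frequency. This is a minimal illustration under that assumption, not any particular engine's algorithm, and the page IDs and word lists are invented. It shows how a page that merely scatters the query words across unrelated sentences can outrank a page that uses them together as one phrase.

```python
from collections import Counter, defaultdict


def build_word_index(pages):
    """Word-level inverted file: word -> {page_id: term frequency}."""
    index = defaultdict(dict)
    for page_id, words in pages.items():
        for word, tf in Counter(words).items():
            index[word][page_id] = tf
    return index


def rank(index, query_terms):
    """Rank pages by the summed frequency of the query terms they contain."""
    scores = Counter()
    for term in query_terms:
        for page_id, tf in index.get(term, {}).items():
            scores[page_id] += tf
    # Highest total frequency first; ties broken by page id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical pages: page 1 scatters the terms over unrelated sentences,
# page 2 uses them together as one phrase.
pages = {
    1: ["chinese", "valentine", "day", "west", "chinese", "culture", "present"],
    2: ["chinese", "valentine", "day", "present"],
}
QUERY = ["chinese", "valentine", "day", "present"]
print(rank(build_word_index(pages), QUERY))  # [1, 2]
```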
This search method has the following serious defects:
First, there are too many useless results (irrelevant to the query terms, or only weakly relevant). This is because the probability and frequency with which a single term occurs in a text do not fully represent the relevance between the term and the text.
Second, the query mode of such a search engine is essentially keyword-based and works poorly when a complete sentence is entered as the query. Ranking methods based on keyword frequency or click counts cannot adequately reflect how relevant a query sentence is to the page text; see Fig. 1.
Third, existing search engines match query keywords fuzzily. This yields more results, but floods them with useless ones and can even push better results further down the list. In addition, these search engines do not treat questions specially, so question queries work relatively poorly; see Fig. 2.
Fourth, existing search engines do not limit the range within which multiple keywords may co-occur: phrase generation is not confined to the current sentence, so keywords from different sentences may be combined. For example, when searching for "Chinese Valentine's Day present", the result returned by a currently popular search engine is likely to be something like:
"... it differs from the Western Valentine's Day ... it also carries rich Chinese cultural connotations ... just look at the present strategy guide we have prepared for you ...", which does not reflect the overall meaning of the query at all; see Fig. 3.
Summary of the invention
The object of the invention is to apply phrase-index technology to Internet search engines so as to avoid the above defects, make search engines more user-friendly, and obtain more reasonable results (that is, results that match the user's search intent are ranked higher).
The method of the present invention for applying phrase-index technology in an Internet search engine comprises the following steps:
Step 1: automatically accumulate web page information:
First, a web page collection program automatically obtains a large amount of original web page text from the Internet by hyperlink analysis. When the text is obtained, a word-segmentation program splits it into individual words, a word-frequency statistics program counts word frequencies, and words whose frequency exceeds a threshold are marked as keywords. Then a phrase generation program takes each keyword as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. This yields the phrase set for indexing, which is sorted by phrase length and, together with the source information of each phrase, inserted into or updated in the inverted-list index file.
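Step 1 can be sketched as follows, assuming the page text has already been segmented into a word list. The frequency threshold, the `max_side` window limit, and the function names are illustrative assumptions: the text does not fix how many words are attached on each side, and this sketch only builds contiguous phrases and does not record source information.

```python
from collections import Counter


def mark_keywords(words, threshold):
    # Keywords: words whose frequency in the page exceeds the threshold.
    freq = Counter(words)
    return {w for w, n in freq.items() if n > threshold}


def phrases_around(words, center, max_side=2):
    # Contiguous phrases containing words[center], extended by up to
    # max_side words to the left and/or right (max_side is an assumption).
    out = set()
    for left in range(max_side + 1):
        for right in range(max_side + 1):
            if left == right == 0:
                continue  # the bare keyword alone is not treated as a phrase
            lo, hi = center - left, center + right + 1
            if lo >= 0 and hi <= len(words):
                out.add(" ".join(words[lo:hi]))
    return out


def index_phrases(words, threshold, max_side=2):
    keywords = mark_keywords(words, threshold)
    phrases = set()
    for i, w in enumerate(words):
        if w in keywords:
            phrases |= phrases_around(words, i, max_side)
    # Longest phrases (by word count) first, then alphabetical for stability.
    return sorted(phrases, key=lambda p: (-len(p.split()), p))


words = ["olympic", "mascot", "olympic", "ceremony"]
print(index_phrases(words, threshold=1, max_side=1))
# ['mascot olympic ceremony', 'mascot olympic', 'olympic ceremony', 'olympic mascot']
```

Each phrase contains the keyword plus at least one neighbouring word, and the list comes back longest first, matching the length ordering described above.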
Step 2: process the user's query:
First, the user's query is received through the search engine's user interface. A word-segmentation program splits the query into individual words and identifies the part of speech of each word. A phrase generation program then takes each content word obtained from segmentation as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. This yields the phrase set for retrieval, which is stored in memory.
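A minimal sketch of Step 2 for an English query, which the embodiment notes can be segmented simply on spaces. The toy part-of-speech lexicon, the tag letters, and `max_len` are hypothetical stand-ins for a real segmenter and tagger. Every window of two or more words that contains at least one content word is kept, which is one way to read "taking each content word as the center"; meaningless combinations are not filtered here.

```python
# Hypothetical toy lexicon standing in for a real segmenter/POS tagger.
TOY_POS = {"the": "u", "director": "n", "of": "p", "memoirs": "n",
           "a": "u", "geisha": "n", "is": "v", "who": "r"}
CONTENT_TAGS = {"n", "v", "a"}  # assumed tags for content (notional) words


def tag_query(query):
    # English needs no segmenter: splitting on whitespace is enough.
    tokens = query.lower().rstrip("?").split()
    return [(t, TOY_POS.get(t, "x")) for t in tokens]  # "x" = unknown word


def query_phrases(query, max_len=3):
    tagged = tag_query(query)
    words = [w for w, _ in tagged]
    content = {i for i, (_, t) in enumerate(tagged) if t in CONTENT_TAGS}
    phrases = set()
    for size in range(2, max_len + 1):
        for lo in range(len(words) - size + 1):
            # Keep a window only if it contains at least one content word.
            if content & set(range(lo, lo + size)):
                phrases.add(" ".join(words[lo:lo + size]))
    # Longest first, then alphabetical; this list is what stays in memory.
    return sorted(phrases, key=lambda p: (-len(p.split()), p))


print(query_phrases("the director of memoirs of a geisha")[:3])
# ['director of memoirs', 'memoirs of a', 'of a geisha']
```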
Step 3: match, retrieve, and obtain search results:
First, a matching program exactly matches each phrase in the retrieval phrase set, in turn, against the phrases in the inverted-list index file. For each matching entry, the corresponding web page IDs are extracted and stored in turn in a result page set. Then entries in the result set that correspond to the same page ID are merged, and the result page set is traversed in the same order in which pages were deposited into it, retrieving the corresponding page links and other related information to obtain the search results.
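Step 3 reduces to exact dictionary lookups followed by an order-preserving de-duplication. In this sketch the inverted list is a plain `dict` from phrase to page IDs, the link table and URLs are hypothetical, and the query phrases are assumed to arrive already ordered.

```python
def search(query_phrases, index):
    """Exact-match each query phrase in turn and merge hits by page id,
    keeping each page at the position where it was first deposited."""
    results, seen = [], set()
    for phrase in query_phrases:
        for page_id in index.get(phrase, []):  # exact lookup, no fuzzy match
            if page_id not in seen:            # merge duplicates of a page id
                seen.add(page_id)
                results.append(page_id)
    return results


def render(page_ids, links):
    # Traverse the result set in deposit order and fetch each page's link.
    return [links[p] for p in page_ids]


index = {
    "chinese valentine day present": [7],
    "valentine day present": [7, 3],
    "chinese valentine": [3, 9],
}
query = ["chinese valentine day present", "valentine day present", "chinese valentine"]
print(search(query, index))  # [7, 3, 9]
```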
When the search engine analyzes web page text, it splits the text into a number of keywords, arranges these words into word combinations, and records them in an index file on disk or in memory in the form of an inverted list such as "phrase - page ID1; page ID2; ...".
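The "phrase - page ID1; page ID2; ..." record format can be modelled with a dictionary of lists. This sketch shows insert-or-update semantics and a text dump in that layout; the function names and the choice to keep each page ID at most once per phrase are assumptions.

```python
from collections import defaultdict


def new_index():
    # phrase -> list of page ids, in the order the pages were added
    return defaultdict(list)


def add_page(index, page_id, phrases):
    """Insert or update the 'phrase -> page ids' entries for one page."""
    for phrase in phrases:
        ids = index[phrase]
        if page_id not in ids:  # re-indexing the same page is idempotent
            ids.append(page_id)


def dump(index):
    # One line per phrase in the "phrase -> id1; id2; ..." layout.
    return [f"{p} -> " + "; ".join(map(str, ids)) for p, ids in sorted(index.items())]


idx = new_index()
add_page(idx, 12, ["beijing olympic games", "olympic games mascot"])
add_page(idx, 40, ["beijing olympic games"])
add_page(idx, 12, ["beijing olympic games"])  # update: no duplicate id
print(dump(idx))  # ['beijing olympic games -> 12; 40', 'olympic games mascot -> 12']
```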
When the search engine analyzes web page text, all phrase generation is confined to the current sentence, so keywords contained in different sentences are never combined.
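Confining phrase generation to one sentence amounts to splitting the text at sentence-ending punctuation before building any phrase. In this sketch the punctuation set, whitespace tokenization, and `max_side` are assumptions; because windows never cross a split point, words from different sentences (here "west" and "see") never end up in the same phrase.

```python
import re

# Assumed sentence-ending punctuation (Chinese and ASCII forms).
SENTENCE_END = re.compile(r"[。！？!?.;；]+")


def sentence_bounded_phrases(text, keywords, max_side=2):
    phrases = set()
    for sentence in SENTENCE_END.split(text):
        words = sentence.split()  # phrases are built inside this sentence only
        for i, w in enumerate(words):
            if w not in keywords:
                continue
            # Every contiguous window that contains the keyword at index i.
            for lo in range(max(0, i - max_side), i + 1):
                for hi in range(i + 1, min(len(words), i + max_side + 1) + 1):
                    if hi - lo >= 2:
                        phrases.add(" ".join(words[lo:hi]))
    return phrases


text = "chinese valentine day differs from the west. see the present guide"
print(sorted(sentence_bounded_phrases(text, {"valentine", "present"}, max_side=1)))
```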
When the search engine processes the user's query, a question-pattern matching program converts a question in the query into a declarative sentence.
The phrase sets used for indexing and retrieval are sorted by phrase length, longer phrases first and shorter ones after.
In the process of obtaining the phrase sets for indexing and retrieval, meaningless combinations are removed; here a meaningless combination means a combination of function words.
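One way to drop meaningless combinations is to look at the part-of-speech tags of each candidate phrase. This sketch treats a phrase as meaningless when every word is a function word; the tag set (`u`, `p`, `c`) is an assumption modelled on common Chinese tagsets, and the optional `strict_edges` rule is an extra heuristic that is not in the source.

```python
FUNCTION_TAGS = {"u", "p", "c"}  # assumed: particle, preposition, conjunction


def is_meaningless(tagged_phrase, strict_edges=False):
    tags = [t for _, t in tagged_phrase]
    if all(t in FUNCTION_TAGS for t in tags):
        return True  # pure function-word combination, e.g. "of a"
    if strict_edges:
        # Optional extra rule (an assumption, not from the patent):
        # also reject phrases that start or end with a function word.
        return tags[0] in FUNCTION_TAGS or tags[-1] in FUNCTION_TAGS
    return False


def filter_phrases(tagged_phrases, strict_edges=False):
    return [p for p in tagged_phrases if not is_meaningless(p, strict_edges)]


phrases = [
    [("of", "p"), ("a", "u")],
    [("director", "n"), ("of", "p")],
    [("director", "n"), ("of", "p"), ("memoirs", "n")],
]
print(len(filter_phrases(phrases)))  # 2
```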
The information stored in the index file of the present invention differs from that of a traditional search engine. The index file of a traditional search engine takes a form like "keyword - page ID1, word frequency; page ID2, ...", whereas the present invention extends the "keyword" element to a word combination and records entries in the form "phrase - page ID1, page ID2, ..." in an index file on disk or in memory.
When the index is actually built, word combination is confined to a single sentence: each keyword is taken as the head word, other words are attached before and after it to form phrases, and the index is sorted by phrase length, longer phrases first and shorter ones after. Because all phrase generation is confined to the current sentence, keywords from different sentences are never combined. By limiting the range within which multiple keywords may co-occur, the search engine avoids stringing together two originally unrelated sentences, or even two unrelated paragraphs, which would produce a great deal of useless information.
When a query is submitted, the invention uses the word-segmentation program to extract the content words in the query (based on part-of-speech tagging) and forms all reasonable and possible combinations of these words, discarding unreasonable ones, to obtain a series of phrases for retrieval. The search engine then exactly matches the longest of these phrases first: the long phrases generated from the query are matched against the corresponding phrases in the index built from the processed web page text, so the pages retrieved first are those that contain the most query keywords.
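The "all reasonable and possible combinations" of query content words can be enumerated with `itertools.combinations`, which preserves the input order, so reversed phrases never appear. The `is_reasonable` predicate is a hypothetical hook for the pruning step, and the number of combinations grows as 2^n, so a real system would cap it. The lookup then tries the longest combinations first.

```python
from itertools import combinations


def query_combinations(content_words, min_len=2, is_reasonable=lambda combo: True):
    combos = []
    for r in range(len(content_words), min_len - 1, -1):  # longest first
        for combo in combinations(content_words, r):  # keeps original order
            if is_reasonable(combo):
                combos.append(" ".join(combo))
    return combos


def longest_match(combos, index):
    # Exact match, trying the longest combinations first.
    for phrase in combos:
        if phrase in index:
            return phrase, index[phrase]
    return None, []


c = query_combinations(["chinese", "valentine", "present"])
print(c)  # ['chinese valentine present', 'chinese valentine', 'chinese present', 'valentine present']
print(longest_match(c, {"valentine present": [2], "chinese valentine": [5]}))
```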
When processing a query submitted by the user, the invention first uses a question-pattern matching program to convert a question into a declarative sentence through a few simple pattern matches, and then passes it to the subsequent steps. A question-style query can thus be interpreted and processed as a declarative sentence close to its original meaning, which makes it easier to handle without sacrificing effectiveness.
Because the invention is built on a phrase index, it does not match the query content exactly as a whole; instead, it processes the query sentence into multiple phrases and retrieves with those. Although the retrieved pages will not necessarily be identical to the query sentence or contain all the query keywords, generating multiple phrases widens the semantic scope of the search and yields more candidate pages, while exact matching of the generated phrases against the index narrows the semantic scope again, producing more accurate semantic matches. Compared with the fuzzy keyword matching of traditional search engines, phrases formed from keyword combinations are clearly better than single keywords at expressing meaning (in natural language, the units that express meaning, from most to least complete, are: sentence > phrase > word/character, and directly processing whole sentences by machine is currently unsatisfactory), so the search results reflect the likely intent of the query more precisely.
Description of drawings
Fig. 1 shows the result page obtained by querying "the western university of China" with an existing search engine;
Fig. 2 shows the result page obtained by querying "Who is the director of Memoirs of a Geisha?" with an existing search engine;
Fig. 3 shows the result page obtained by querying "Chinese Valentine's Day present" with an existing search engine;
Fig. 4 is a flow block diagram of the automatic accumulation of web page information according to the invention;
Fig. 5 shows the result page obtained by the invention for the query "Who is the director of Memoirs of a Geisha?";
Fig. 6 shows the result page obtained by the invention for the query "Chinese Valentine's Day present".
Embodiment
The invention is mainly implemented through the following steps:
Step 1: automatically accumulate web page information (see Fig. 4):
First, a web page collection program, such as a crawler or spider program, automatically obtains a large amount of original web page text from the Internet by hyperlink analysis. When the text is obtained, a word-segmentation program splits it into individual words, a word-frequency statistics program counts word frequencies, and words whose frequency exceeds a threshold are marked as keywords. Then a phrase generation program takes each keyword as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. During combination, meaningless combinations, such as combinations of function words, are removed according to the parts of speech tagged by the word-segmentation program. This yields the phrase set for indexing, which is sorted by phrase length (longer first, shorter after) and, together with the source information of each phrase, inserted into or updated in the inverted-list index file on disk or in memory as a table of the form "phrase - page ID1, page ID2, ...".
Step 2: process the user's query:
First, the user's query is received through the search engine's user interface. A word-segmentation program splits the query into individual words (for English, segmentation can generally be done simply by splitting on spaces), and the part of speech of each word is identified during segmentation, also by the word-segmentation program. A phrase generation program then takes each content word as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. During combination, meaningless combinations, such as combinations of function words, are removed according to the tagged parts of speech. This yields the phrase set for retrieval, which is sorted by phrase length (longer first, shorter after) and stored in memory.
Step 3: match, retrieve, and obtain search results:
First, a matching program exactly matches each phrase in the retrieval phrase set, in turn, against the phrases in the inverted-list index file on disk or in memory. For each matching entry, the corresponding web page IDs are extracted and stored in turn in a result page set. Then entries in the result set corresponding to the same page ID are merged: the merged entry takes as its phrase length the length of the longest phrase among them, i.e., it is placed at the position of the entry obtained by the first match. The result page set is traversed in the same order in which pages were deposited (longer phrases being matched first), and the corresponding page links and other related information are retrieved and fed back to the search engine's user interface, which presents the search results to the user.
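The merge rule in Step 3 (a merged entry takes the length of its longest matched phrase and keeps the position of its first match) can be sketched as below. It assumes the query phrases are already sorted longest first, so the first time a page is seen is also its longest match; the names and data are illustrative.

```python
def merge_by_longest(query_phrases, index):
    """query_phrases must already be sorted longest first."""
    best = {}   # page id -> word count of its longest matched phrase
    order = []  # page ids in first-deposit order
    for phrase in query_phrases:
        length = len(phrase.split())
        for page_id in index.get(phrase, []):
            if page_id not in best:  # first hit is the longest, keep its slot
                best[page_id] = length
                order.append(page_id)
    return [(pid, best[pid]) for pid in order]


idx = {
    "chinese valentine creative present": [8],
    "chinese valentine": [4, 8],
    "present": [1, 4],
}
q = ["chinese valentine creative present", "chinese valentine", "present"]
print(merge_by_longest(q, idx))  # [(8, 4), (4, 2), (1, 1)]
```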
The inverted index table is structured as follows:
Each phrase corresponds to a list of web page IDs. The storage layout can be chosen flexibly; for example, each phrase can be followed by a pointer to its page ID list, so that each entry is stored as a two-tuple:
Phrase string | Web page ID list
or
[Figure A20071014302400081]
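The phrase-plus-pointer layout can be illustrated by packing all page ID lists into one flat array and storing, next to each phrase, an (offset, count) pair that plays the role of the pointer. This is one possible layout under that assumption, using binary search over the sorted phrase strings; it is not the layout shown in the original figure.

```python
from bisect import bisect_left


def pack(index):
    """Flatten {phrase: [ids]} into a sorted phrase table plus one postings
    array. Each table row (phrase, offset, count) acts as a pointer."""
    phrases, table, postings = sorted(index), [], []
    for p in phrases:
        ids = index[p]
        table.append((p, len(postings), len(ids)))
        postings.extend(ids)
    return phrases, table, postings


def lookup(packed, phrase):
    phrases, table, postings = packed
    i = bisect_left(phrases, phrase)  # binary search on the phrase strings
    if i < len(phrases) and phrases[i] == phrase:
        _, offset, count = table[i]
        return postings[offset:offset + count]  # follow the "pointer"
    return []


packed = pack({
    "olympic mascot": [3, 9],
    "beijing olympic games": [3],
    "olympic mascot ceremony": [9],
})
print(lookup(packed, "olympic mascot"))  # [3, 9]
```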
The invention applies some simple preprocessing to questions in the user's query, converting them into declarative word order before searching; for a limited set of interrogatives this is easy to do.
For example:
Why is the sun round?
reason the sun is round
cause of the sun being round
......
How to improve a child's self-care ability?
methods to improve a child's self-care ability
techniques to improve a child's self-care ability
points to note when improving a child's self-care ability
......
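The question-to-declarative conversion for a small set of interrogatives can be sketched with regular expressions. The patterns below are hypothetical English counterparts of the examples above and of the Fig. 2 query; a real system would work on the segmented Chinese text and cover more interrogatives. Unrecognized input is passed through unchanged.

```python
import re

# Hypothetical patterns: (compiled regex, function building declarative forms).
PATTERNS = [
    (re.compile(r"^why (is|are) (.+) (\w+)\?$", re.I),
     lambda m: [f"reason {m[2]} {m[1]} {m[3]}", f"cause of {m[2]} being {m[3]}"]),
    (re.compile(r"^how to (.+?)\?$", re.I),
     lambda m: [f"methods to {m[1]}", f"techniques to {m[1]}"]),
    (re.compile(r"^who is (.+?)\?$", re.I),
     lambda m: [f"{m[1]} is"]),
]


def to_declarative(question):
    q = question.strip()
    for pattern, rewrite in PATTERNS:
        m = pattern.match(q)
        if m:
            return rewrite(m)
    return [q]  # not a recognized question: leave it as it is


print(to_declarative("Why is the sun round?"))
# ['reason the sun is round', 'cause of the sun being round']
```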
Fig. 5 shows the result page obtained by the search engine of the invention for the query "Who is the director of Memoirs of a Geisha?". Compared with Fig. 2, the results match the search intent more closely.
The phrase patterns currently allowed are:
1. noun+noun
2. adjective+noun
3. noun+verb
4. adverb+verb
5. verb+noun
6. any combination of the above patterns
These rules can be extended or pruned, and can be refined further in a concrete implementation.
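One reading of rule 6 is that a longer phrase is legal when every adjacent pair of content-word tags is one of the five allowed pairs. This sketch encodes that reading; the tag letters, and skipping the particle tag `u`, are assumptions. Under these five pairs alone a time word such as "2008/t" would be rejected, so the example below would need the rules to be extended, as the text allows.

```python
# Allowed adjacent tag pairs, from rules 1-5 (n noun, a adjective, v verb, d adverb).
LEGAL_PAIRS = {("n", "n"), ("a", "n"), ("n", "v"), ("d", "v"), ("v", "n")}


def is_legal(tags, skip=frozenset({"u"})):
    # Ignore particles (assumed tag "u"), then require every adjacent pair
    # to be an allowed pattern; chaining pairs covers rule 6.
    core = [t for t in tags if t not in skip]
    if len(core) < 2:
        return False
    return all((x, y) in LEGAL_PAIRS for x, y in zip(core, core[1:]))


print(is_legal(["n", "n", "v", "n"]), is_legal(["v", "a"]))  # True False
```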
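Example of phrase generation:
Suppose a web page being analyzed contains the following sentence:
"2008 Beijing Olympic Games mascot formally unveil ceremony" (in natural English: the ceremony formally unveiling the mascot of the 2008 Beijing Olympic Games)
Result of word segmentation and part-of-speech tagging (English gloss of each Chinese word):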
2008/t Beijing/n Olympic Games/n of/l mascot/n formally/a unveil/v ceremony/n
Suppose the keyword obtained by analyzing the page is "Olympic Games" (it can be obtained by word-frequency analysis). The phrases recombined by the method of the invention are then:
2008 Olympic Games (keyword combined with a word before it)
2008 Beijing Olympic Games (keyword combined with words before it)
Beijing Olympic Games (keyword combined with a word before it)
Olympic Games mascot (keyword combined with a word after it)
Olympic Games mascot formally unveil (keyword combined with words after it)
Olympic Games mascot formally unveil ceremony (keyword combined with words after it)
Olympic Games mascot unveil (keyword combined with words after it)
Olympic Games mascot ceremony (keyword combined with words after it)
Phrases with the word order reversed, such as "the mascot's Olympic Games", never occur.
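The example phrases can be reproduced by combining the keyword with order-preserving subsequences of the words on one side of it, which also shows why reversed phrases never appear. The `max_extra` limit and the one-side-only choice are assumptions read off the example; the sketch yields a superset of the listed phrases (18 candidates here), which a part-of-speech filter would then prune.

```python
from itertools import combinations


def ordered_phrases(words, k, max_extra=4):
    """Keep the keyword words[k] and add 1..max_extra words from ONE side,
    in their original order (subsequences may skip words)."""
    before, after = words[:k], words[k + 1:]
    out = []
    for r in range(1, max_extra + 1):
        for combo in combinations(before, r):
            out.append(" ".join(combo + (words[k],)))
        for combo in combinations(after, r):
            out.append(" ".join((words[k],) + combo))
    return out


words = ["2008", "Beijing", "Olympic Games", "mascot", "formally", "unveil", "ceremony"]
candidates = ordered_phrases(words, k=2)
print(len(candidates))  # 18
```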
The page is indexed by such phrases, and the invention likewise segments and analyzes the user's query to generate similar phrases, which are then used for matching. Because one sentence can generate many phrases, the invention may perform retrieval several times (possibly in parallel), but every retrieval is an exact match.
In addition, keywords are confined to one sentence when phrases are generated in order to avoid stringing together two originally unrelated sentences, or even two unrelated paragraphs. This is illustrated below.
For example, when searching for "Chinese Valentine's Day present", the result returned by a currently popular search engine is likely to be something like:
"... it differs from the Western Valentine's Day ... it also carries rich Chinese cultural connotations ... just look at the present strategy guide we have prepared for you ...", which does not reflect the overall meaning of the query at all.
By contrast, when the invention is used for the same search, the result is generally like: "... creative presents for Chinese Valentine's Day ... Chinese tradition ... choosing a present ...". Relevance to the original meaning of the query increases greatly, and the more relevant a result is, the higher it is ranked; see Fig. 6.

Claims (7)

1. A method for applying phrase-index technology in an Internet search engine, characterized by comprising the following steps:
Step 1: automatically accumulating web page information:
First, a web page collection program automatically obtains a large amount of original web page text from the Internet by hyperlink analysis. When the text is obtained, a word-segmentation program splits it into individual words, a word-frequency statistics program counts word frequencies, and words whose frequency exceeds a threshold are marked as keywords. Then a phrase generation program takes each keyword as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. This yields the phrase set for indexing, which is sorted by phrase length and, together with the source information of each phrase, inserted into or updated in the inverted-list index file;
Step 2: processing the user's query:
First, the user's query is received through the search engine's user interface. A word-segmentation program splits the query into individual words and identifies the part of speech of each word. A phrase generation program then takes each content word obtained from segmentation as the center and attaches other words before and after it, combining them into a series of phrases that differ in word count and collocation. This yields the phrase set for retrieval, which is stored in memory.
Step 3: matching, retrieving, and obtaining search results:
First, a matching program exactly matches each phrase in the retrieval phrase set, in turn, against the phrases in the inverted-list index file. For each matching entry, the corresponding web page IDs are extracted and stored in turn in a result page set. Then entries in the result set that correspond to the same page ID are merged, and the result page set is traversed in the same order in which pages were deposited into it, retrieving the corresponding page links and other related information to obtain the search results.
2. The method for applying phrase-index technology in an Internet search engine according to claim 1, characterized in that: when the search engine analyzes web page text, it splits the text into a number of keywords, arranges these words into word combinations, and records them in an index file on disk or in memory in the form of an inverted list such as "phrase -> page ID1; page ID2; ...".
3. The method for applying phrase-index technology in an Internet search engine according to claim 1, characterized in that: when the search engine analyzes web page text, all phrase generation is confined to the current sentence, so keywords contained in different sentences are never combined.
4. The method for applying phrase-index technology in an Internet search engine according to claim 1, characterized in that: when the search engine processes the user's query, a question-pattern matching program converts a question in the query into a declarative sentence.
5. The method for applying phrase-index technology in an Internet search engine according to claim 1, characterized in that: the phrase sets used for indexing and retrieval are sorted by phrase length, longer phrases first and shorter ones after.
6. The method for applying phrase-index technology in an Internet search engine according to claim 1, characterized in that: meaningless combinations are removed in the process of obtaining the phrase sets for indexing and retrieval.
7. The method for applying phrase-index technology in an Internet search engine according to claim 6, characterized in that: the meaningless combinations are combinations of function words.
CNA2007101430242A 2007-08-21 2007-08-21 Method for applying phrase index technology into internet search engine Pending CN101196898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007101430242A CN101196898A (en) 2007-08-21 2007-08-21 Method for applying phrase index technology into internet search engine

Publications (1)

Publication Number Publication Date
CN101196898A 2008-06-11

Family

ID=39547317

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101430242A Pending CN101196898A (en) 2007-08-21 2007-08-21 Method for applying phrase index technology into internet search engine

Country Status (1)

Country Link
CN (1) CN101196898A (en)

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023989B (en) * 2009-09-23 2012-10-10 阿里巴巴集团控股有限公司 Information retrieval method and system thereof
CN102081627B (en) * 2009-11-27 2014-09-17 北京金山办公软件有限公司 Method and system for determining contribution degree of word in text
CN101923556B (en) * 2010-02-09 2013-01-02 上海莱希信息科技有限公司 Method and device for searching webpages according to sentence serial numbers
CN102236637A (en) * 2010-04-22 2011-11-09 北京金山软件有限公司 Method and system for determining collocation degree of collocations with central word
CN102262634A (en) * 2010-05-24 2011-11-30 北京大学深圳研究生院 Automatic questioning and answering method and system
CN102262634B (en) * 2010-05-24 2013-05-29 北京大学深圳研究生院 Automatic questioning and answering method and system
CN102654866A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Method and device for establishing example sentence index and method and device for indexing example sentences
CN102760127B (en) * 2011-04-26 2017-11-03 北京百度网讯科技有限公司 Method, device and the equipment of resource type are determined based on expanded text information
CN102314418A (en) * 2011-10-09 2012-01-11 北京航空航天大学 Method for comparing Chinese similarity based on context relation
CN102339320A (en) * 2011-11-04 2012-02-01 成都市华为赛门铁克科技有限公司 Malicious web recognition method and device
CN103123624B (en) * 2011-11-18 2015-12-02 阿里巴巴集团控股有限公司 Determine method and device, searching method and the device of centre word
CN103123624A (en) * 2011-11-18 2013-05-29 阿里巴巴集团控股有限公司 Method of confirming head word, device of confirming head word, searching method and device
US9824078B2 (en) 2012-10-17 2017-11-21 Samsung Electronics Co., Ltd. Device and method for image search using one or more selected words
CN104756046A (en) * 2012-10-17 2015-07-01 三星电子株式会社 User terminal device and control method thereof
US9910839B1 (en) 2012-10-17 2018-03-06 Samsung Electronics Co., Ltd. Device and method for image search using one or more selected words
US10503819B2 (en) 2012-10-17 2019-12-10 Samsung Electronics Co., Ltd. Device and method for image search using one or more selected words
US9990346B1 (en) 2012-10-17 2018-06-05 Samsung Electronics Co., Ltd. Device and method for image search using one or more selected words
CN104364814B (en) * 2012-10-30 2017-07-07 Sk 普兰尼特有限公司 System and method for providing content recommendation service
US10878044B2 (en) 2012-10-30 2020-12-29 Sk Planet Co., Ltd. System and method for providing content recommendation service
CN104364814A (en) * 2012-10-30 2015-02-18 Sk普兰尼特有限公司 System and method for providing content recommendation service
CN103049495A (en) * 2012-12-07 2013-04-17 百度在线网络技术(北京)有限公司 Method, device and equipment for providing search suggestions corresponding to a query sequence
CN105528441A (en) * 2015-12-22 2016-04-27 北京奇虎科技有限公司 Head word extraction method and device based on automatic labeling
CN106844638A (en) * 2017-01-19 2017-06-13 王碧波 Information retrieval method, device and electronic equipment
CN106844638B (en) * 2017-01-19 2020-11-03 杭州汇数智通科技有限公司 Information retrieval method and device and electronic equipment
CN107145555B (en) * 2017-04-28 2019-08-02 北京安数云信息技术有限公司 Fuzzy sentence search method based on word segmentation
CN107145555A (en) * 2017-04-28 2017-09-08 北京安数云信息技术有限公司 Fuzzy sentence search method based on word segmentation
CN107221328A (en) * 2017-05-25 2017-09-29 百度在线网络技术(北京)有限公司 Method and device for locating a modification source, computer equipment and computer-readable recording medium
CN107784123B (en) * 2017-11-06 2021-01-01 北京中科智营科技发展有限公司 Topic-based search optimization method
CN107784123A (en) * 2017-11-06 2018-03-09 北京中科智营科技发展有限公司 Topic-based search optimization method
CN108427759A (en) * 2018-03-19 2018-08-21 四川意高汇智科技有限公司 Real time data computational methods for mass data processing
CN108536676A (en) * 2018-03-28 2018-09-14 广州华多网络科技有限公司 Data processing method, device, electronic equipment and storage medium
CN108536676B (en) * 2018-03-28 2020-10-13 广州华多网络科技有限公司 Data processing method and device, electronic equipment and storage medium
CN110555159A (en) * 2018-03-30 2019-12-10 北大方正集团有限公司 Webpage retrieval method, device, equipment and storage medium
CN108984582A (en) * 2018-05-04 2018-12-11 中国信息安全研究院有限公司 Query request processing method
CN108984582B (en) * 2018-05-04 2023-07-28 中国信息安全研究院有限公司 Query request processing method
CN108776705B (en) * 2018-06-12 2020-11-17 厦门市美亚柏科信息股份有限公司 Text full-text accurate query method, device, equipment and readable medium
CN108776705A (en) * 2018-06-12 2018-11-09 厦门市美亚柏科信息股份有限公司 Text full-text accurate query method, device, equipment and readable medium
CN111190948A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Retrieval coding method based on keyword sorting
CN111190993A (en) * 2019-12-26 2020-05-22 航天信息股份有限公司企业服务分公司 Hierarchical sorting method based on ordered set of keywords
CN111611489A (en) * 2020-05-22 2020-09-01 北京字节跳动网络技术有限公司 Search processing method and device, electronic equipment and storage medium
CN111523302A (en) * 2020-07-06 2020-08-11 成都晓多科技有限公司 Syntax analysis method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN101196898A (en) Method for applying phrase index technology into internet search engine
Agichtein et al. Querying text databases for efficient information extraction
Cafarella et al. Webtables: exploring the power of tables on the web
Li et al. Text document clustering based on frequent word meaning sequences
CN101201838A (en) Method for improving searching engine based on keyword index using phrase index technique
US6792414B2 (en) Generalized keyword matching for keyword based searching over relational databases
CN102789464B (en) Natural language processing methods, devices and systems based on semantics identity
CN101169780A (en) Semantic ontology retrieval system and method
CN101441636A (en) Hospital information search engine and system based on knowledge base
CN105183884A (en) Search engine system and method based on big data technique
Landthaler et al. Extending Full Text Search for Legal Document Collections Using Word Embeddings
US9971828B2 (en) Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries
CN102929902A (en) Character splitting method and device based on Chinese retrieval
CN103885985A (en) Real-time microblog search method and device
JP5718405B2 (en) Utterance selection apparatus, method and program, dialogue apparatus and method
Nakashole et al. Real-time population of knowledge bases: opportunities and challenges
CN105404677A (en) Tree structure based retrieval method
Cheng et al. MISDA: web services discovery approach based on mining interface semantics
CN102982063A (en) Control method based on tuple elaboration of relation keywords extension
CN105426490A (en) Tree structure based indexing method
Chen et al. A query substitution-search result refinement approach for long query web searches
Paparizos et al. Answering web queries using structured data sources
Al-Hamami et al. Development of an opinion blog mining system
Arslan et al. A comparison of relational databases and information retrieval libraries on turkish text retrieval
Huang et al. Learning to find comparable entities on the web

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 2008-06-11