CN103294693A - Searching method, server and system - Google Patents

Publication number
CN103294693A
Authority
CN
China
Prior art keywords
search target
related document
target document
keywords
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100456068A
Other languages
Chinese (zh)
Inventor
胡汉强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a search method, a server and a system. The search method comprises the following steps: receiving a search target document; extracting at least one subject keyword from the search target document; searching for related documents according to the at least one subject keyword to obtain at least one related document of the search target document; and returning the at least one related document. The search method, server and system enable a related-document search that takes a document as the search condition.

Description

Searching method, server and system
Technical field
The present invention relates to data processing technology, in particular to a search method, server and system, and belongs to the field of network technology.
Background
Search engine technology has been developing for nearly 20 years. The global search engine market is divided among giants such as Google, Microsoft and Yahoo, while Baidu holds 70% of the search engine market in China. At present, these search engines fetch web pages with crawler technology, segment the pages into words, and build an inverted index from keywords to web pages in the search engine database. When a user submits search keywords, the search engine retrieves a list of related documents from the inverted index, ranks the result pages with a page-ranking algorithm such as PageRank or a personalized ranking algorithm, and returns the ranked result documents to the user.
As can be seen, current search engine technology is based entirely on keywords entered by the user; there is as yet no search technique that takes a document or web page as the search condition. How to retrieve the documents related to an input document is therefore of great significance for the development of search engine technology.
Summary of the invention
In view of the defects in the prior art, embodiments of the invention provide a search method, server and system for performing a related-document search that takes a document as the search condition.
According to one aspect of the embodiments of the invention, a search method is provided, comprising:
Receiving a search target document;
Extracting at least one subject keyword from the search target document;
Performing a related-document search according to the at least one subject keyword to obtain at least one related document of the search target document;
Returning the at least one related document.
According to another aspect of the embodiments of the invention, a search server is also provided, comprising:
A search request receiving module, configured to receive a search target document;
A subject keyword extraction module, configured to extract at least one subject keyword from the search target document;
A related-document search module, configured to perform a related-document search according to the at least one subject keyword to obtain at least one related document of the search target document;
A search result returning module, configured to return the at least one related document.
According to yet another aspect of the embodiments of the invention, a search system is also provided, comprising the search server of the embodiments of the invention and a search client that communicates with the search server, wherein the search client is configured to send the search target document to the search server and to receive the related documents returned by the search server.
With the search method, server and system provided by the embodiments of the invention, a search target document is received and subject keywords are extracted from it, so that the documents related to a given document or web page can be searched for using that document or web page as the search condition. On the one hand, this diversifies the search conditions that can be input; on the other hand, the documents related to a given document can be found directly, so that the retrieved information is more relevant and more comprehensive, which improves the effectiveness of the search.
Description of drawings
In order to illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of a search system for implementing the search method of the embodiment of the invention;
Fig. 2 shows the search method of the embodiment of the invention;
Fig. 3 is a structural diagram of the search server of the embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is an architecture diagram of a search system for implementing the search method of the embodiment of the invention. As shown in Fig. 1, the search system comprises a search client and a search server. The search client comprises, for example, a search request sending module and a search result receiving module. The search request sending module sends a search request to the search server; the request takes a document or web page as the search condition and asks for documents related or similar to the input document or web page. The search result receiving module receives the search results returned by the search server, namely the documents or web pages related or similar to the document or web page used as the search condition. The search server receives the search request sent by the search client, extracts one or more subject keywords from the input document or web page, initiates a web search with the subject keywords, obtains a list of result pages, and returns the result pages to the search client. The search method of the embodiment is described below from the perspective of the search server.
Fig. 2 shows the search method of the embodiment of the invention. As shown in Fig. 2, the search method comprises the following steps:
Step S201: receive a search target document;
Step S202: extract at least one subject keyword from the search target document;
Step S203: perform a related-document search according to the at least one subject keyword to obtain at least one related document of the search target document;
Step S204: return the at least one related document.
Specifically, the search server receives a search request, submitted by the search client, that takes a document or web page (the search target document) as the search condition. The search target document may be a web document, a document stored on the user's local computer, a document stored in a network-side database, or an article from a blog, microblog or forum. The search server extracts one or more subject keywords from the search target document, initiates a web search with the extracted subject keywords, and obtains a list of result documents or web pages that match the subject keywords. The search server then returns the obtained list of result documents or web pages to the search client.
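Steps S201 to S204 can be sketched as a small pipeline. This is only a minimal illustration, not the patent's implementation: the function `search_related`, the toy corpus and the two stand-in callables are hypothetical names introduced here, and a real server would plug in the keyword extraction and web search described in the following paragraphs.

```python
from typing import Callable, List

def search_related(
    target_text: str,
    extract_keywords: Callable[[str], List[str]],
    keyword_search: Callable[[List[str]], List[str]],
) -> List[str]:
    """Receive a document (S201), extract keywords (S202), search (S203), return results (S204)."""
    keywords = extract_keywords(target_text)   # S202
    if not keywords:
        return []                              # nothing to search with
    return keyword_search(keywords)            # S203; the list is what S204 returns

# Toy stand-ins (assumptions for illustration only).
CORPUS = {
    "doc-a": "football match report",
    "doc-b": "stock market news",
}

def toy_extract(text: str) -> List[str]:
    # Keep only words found in a tiny hypothetical topic vocabulary.
    return [w for w in text.lower().split() if w in {"football", "basketball"}]

def toy_search(keywords: List[str]) -> List[str]:
    # Return ids of corpus documents containing at least one keyword.
    return sorted(d for d, body in CORPUS.items()
                  if any(k in body.split() for k in keywords))

results = search_related("Football fans cheered", toy_extract, toy_search)
```

Passing the extraction and search steps in as callables keeps the sketch independent of any particular classifier or search backend.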
With the search method of the above embodiment, a search target document is received and subject keywords are extracted from it, so that the documents related to a given document or web page can be searched for using that document or web page as the search condition. On the one hand, this diversifies the search conditions that can be input; on the other hand, the documents related to a given document can be found directly, so that the retrieved information is more relevant and more comprehensive, which improves the effectiveness of the search.
Further, in the search method of the above embodiment, before the at least one related document is returned, the method also comprises:
Calculating the relevance of the at least one related document to the search target document;
Sorting the at least one related document in descending order of the relevance of each related document to the search target document;
Correspondingly, returning the at least one related document specifically comprises: returning the sorted related documents.
With the search method of the above embodiment, the search server calculates the relevance between the search target document and each result document and arranges the result documents in descending order of relevance, so that the more relevant documents are shown to the user first, which improves the user's search efficiency and experience.
Further, in the search method of the above embodiment, extracting at least one subject keyword from the search target document specifically comprises:
Identifying the domain category of the search target document;
Extracting the topic words of the search target document from a preset subject dictionary corresponding to the domain category;
Extracting the generic named entities of the search target document, wherein the generic named entities comprise times, places and/or organizations;
Taking the topic words and the generic named entities as the at least one subject keyword of the search target document.
Specifically, the domain category of the search target document is identified with a common classification algorithm such as Bayes or SVM classification, and the topic words of the search target document are extracted by looking up the subject dictionary of the corresponding domain category. For example, if the search target document is classified as a sports document, the sports subject dictionary (containing topic words such as "football", "basketball", "tennis", "badminton", "table tennis", "running", "long jump", "high jump", "diving" and "hurdles") is used to extract the topic words of the input document.
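The dictionary lookup can be sketched as follows. This is a hedged illustration only: the domain category is passed in as a parameter rather than predicted by a Bayes or SVM classifier, and `SUBJECT_DICTIONARIES` and `extract_topic_words` are hypothetical names. The sports vocabulary is abbreviated to single-word entries from the example above.

```python
from typing import Dict, List, Set

# Hypothetical preset subject dictionaries, keyed by domain category.
SUBJECT_DICTIONARIES: Dict[str, Set[str]] = {
    "sports": {"football", "basketball", "tennis", "badminton",
               "running", "diving", "hurdles"},
}

def extract_topic_words(text: str, domain: str) -> List[str]:
    """Return the dictionary words of the given domain that occur in the text,
    in order of first appearance and without duplicates."""
    vocabulary = SUBJECT_DICTIONARIES.get(domain, set())
    found: List[str] = []
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'")   # drop surrounding punctuation
        if word in vocabulary and word not in found:
            found.append(word)
    return found

topic_words = extract_topic_words("Tennis and football; more tennis.", "sports")
```

Whitespace tokenization is an assumption that suits English text; Chinese input would first need word segmentation.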
In addition to the domain-related topic words extracted in the above manner, the generic named entities of the search target document also need to be extracted as subject keywords. Generic named entities comprise, for example, person names, place names and organization names. Generic named entities can be extracted with a classification model such as maximum entropy or conditional random fields: a training corpus is annotated manually, the parameters of the classification model are trained on the annotated corpus, and the candidate words of the search target document are then classified and semantically labeled.
Further, in the search method of the above embodiment, taking the topic words and the generic named entities as the at least one subject keyword of the search target document specifically comprises:
Calculating the word frequency with which each topic word and each generic named entity occurs in the search target document;
Selecting, according to the word frequency, the topic words and/or generic named entities that meet a preset condition as the at least one subject keyword of the search target document.
Specifically, the word frequency of each topic word and each generic named entity in the search target document can be calculated from the number of occurrences and the positions of the occurrences. For example, different weights are set for different positions in the document, with the largest weight for occurrences in the document title; the sum, over all positions, of the occurrence count multiplied by the corresponding weight is then taken as the word frequency of the topic word or generic named entity in the document. The candidates are then screened by word frequency, where the preset condition is, for example, that the word frequency must be greater than or equal to a threshold, or that the topic words and generic named entities with relatively high word frequency among all candidates are selected as subject keywords.
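The position-weighted word frequency and the threshold condition can be illustrated with a short sketch. The weight values, the section names and the functions `weighted_frequency` and `select_keywords` are assumptions made for this example; the text only requires that the title carry the largest weight.

```python
from typing import Dict, List

# Assumed position weights; only "title has the largest weight" comes from the text.
POSITION_WEIGHTS: Dict[str, float] = {"title": 3.0, "body": 1.0}

def weighted_frequency(term: str, sections: Dict[str, str]) -> float:
    """Sum over positions of (occurrence count in that position) x (position weight)."""
    term = term.lower()
    return sum(
        POSITION_WEIGHTS[position] * text.lower().split().count(term)
        for position, text in sections.items()
    )

def select_keywords(candidates: List[str], sections: Dict[str, str],
                    threshold: float) -> List[str]:
    """Keep candidates whose weighted frequency reaches the threshold."""
    return [t for t in candidates if weighted_frequency(t, sections) >= threshold]

sections = {
    "title": "football final",
    "body": "the football final was a football classic",
}
# football: 1*3.0 + 2*1.0 = 5.0; final: 1*3.0 + 1*1.0 = 4.0; classic: 0 + 1*1.0 = 1.0
keywords = select_keywords(["football", "final", "classic"], sections, threshold=4.0)
```

The alternative condition in the text, keeping the highest-frequency candidates, would replace the threshold test with a sort and a cut-off.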
Further, in the search method of the above embodiment, calculating the relevance of the at least one related document to the search target document specifically comprises:
Extracting the subject keywords of the related document, and calculating a relevance score between the related document and the search target document based on subject keywords; and/or
Extracting the events of preset event types from the search target document and the events of preset event types from the related document, and calculating a relevance score between the search target document and the related document based on identical preset events;
Obtaining the relevance of the related document to the search target document from the relevance score based on subject keywords and/or the relevance score based on identical preset events.
Specifically, the subject keywords of a related document are extracted, for example, in the same way as those of the search target document: the topic words of the related document are extracted according to its domain category, and its generic named entities are extracted. The relevance score between the related document and the search target document based on subject keywords is calculated, for example, as follows:
(1) Similarity between domain-specific topic words: for example, compare the similarity between character strings, for instance using WordNet to take synonyms, hypernyms and hyponyms into account;
(2) Similarity between generic named entities: first classify the named entities, for example into person names, place names, organization names and times; then compare two named entities of the same type, one from the related document and one from the search target document. More specifically, for times, compute for example the time difference; for places, persons or organizations, compare for example the similarity between character strings. When comparing strings, WordNet (including synonyms, hypernyms and hyponyms) can be used, as can a user-defined vocabulary (such as "Huawei software" and "software company") and an abbreviation vocabulary (such as "HW" and "ZTE"). Adding up the pairwise topic-word similarities and named-entity similarities gives the subject-keyword-based relevance score between the related document and the search target document.
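The two kinds of comparison above can be sketched as follows. Several assumptions are made loudly here: `difflib.SequenceMatcher` stands in for the WordNet and custom-vocabulary lookups, the 30-day horizon used to turn a time difference into a score is invented for illustration, and pairing each target term with its best match in the related document is only one possible reading of "pairwise" addition.

```python
from datetime import date
from difflib import SequenceMatcher
from typing import List

def string_similarity(a: str, b: str) -> float:
    """Surface string similarity in [0, 1]; a stand-in for WordNet-based comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def time_similarity(a: date, b: date, horizon_days: int = 30) -> float:
    """Smaller time differences score higher; 0 beyond the assumed horizon."""
    gap = abs((a - b).days)
    return max(0.0, 1.0 - gap / horizon_days)

def keyword_relevance(target_terms: List[str], related_terms: List[str]) -> float:
    """Sum, over target terms, of the best string similarity found in the related document."""
    if not related_terms:
        return 0.0
    return sum(max(string_similarity(t, r) for r in related_terms)
               for t in target_terms)

same_day = time_similarity(date(2011, 7, 14), date(2011, 7, 14))
twenty_days = time_similarity(date(2011, 7, 14), date(2011, 8, 3))
score = keyword_relevance(["football"], ["football", "tennis"])
```

A production system would replace `string_similarity` with a lexical resource lookup so that synonyms with different spellings also score highly.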
Extracting events from the search target document and from similar documents comprises, for example, the following steps. Candidate event sentences are found with event trigger words, and the trigger word determines that a candidate sentence is an event of a certain event type, for example "sports/football" or "life/shopping"; the event trigger words can be expanded with synonyms, hypernyms and hyponyms. An event element template is determined from the event type. The relevant words of the candidate sentence are then labeled with event element tags according to the event element template; this part is equivalent to semantic role labeling (SRL), in which a candidate sentence is labeled with semantic frame role tags according to a semantic frame (equivalent to the event element template). Preferably, a simplified event extraction method extracts only six event elements: time (Time), place (Place), agent (Agent), topic (Topic), action (Action) and object (Object).
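The first stage of this procedure, finding candidate event sentences from trigger words, can be sketched as below. The trigger lists, event type names and the `Event` record are hypothetical; filling the six element slots would require a semantic role labeler, so the sketch leaves them empty.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical trigger words per event type (already expanded with synonyms).
TRIGGERS: Dict[str, Set[str]] = {
    "appearance": {"debut", "appears", "appeared"},
    "result": {"beat", "defeated"},
}

@dataclass
class Event:
    event_type: str
    sentence: str
    # The six simplified event elements; left unfilled in this sketch.
    time: Optional[str] = None
    place: Optional[str] = None
    agent: Optional[str] = None
    topic: Optional[str] = None
    action: Optional[str] = None
    obj: Optional[str] = None

def find_candidate_events(text: str) -> List[Event]:
    """Split text into sentences and tag each sentence that contains a trigger word."""
    events: List[Event] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for event_type, triggers in TRIGGERS.items():
            if words & triggers:
                events.append(Event(event_type, sentence))
    return events

events = find_candidate_events(
    "The striker made his debut. Fans were happy. The hosts defeated the visitors."
)
```

Each returned `Event` would then be passed to an element-labeling step that fills the time, place, agent, topic, action and object fields.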
After event extraction is complete, the event-based relevance score between the related document and the search target document is calculated, that is, the similarity between corresponding event elements or semantic roles. Specifically, for example:
(a) Similarity of time: compute the gap between the times; the smaller the time difference, the better;
(b) Similarity of topic: compute the similarity between character strings, using WordNet, including synonyms, hypernyms and hyponyms;
(c) Similarity of action: compute the similarity between character strings, using WordNet, including synonyms, hypernyms and hyponyms;
(d) Similarity of agent: compute the similarity between character strings, using a user-defined vocabulary, an abbreviation vocabulary and the like;
(e) Similarity of place: compute the similarity between character strings, using a user-defined vocabulary, an abbreviation vocabulary and the like;
(f) Similarity of object: compute the similarity between character strings, using a user-defined vocabulary, an abbreviation vocabulary and the like.
Adding up the pairwise event similarities calculated above gives the event-based similarity score between the related document and the search target document. Finally, when the similarity score based on subject keywords and the similarity score based on events are both considered, the two can be combined by weighted summation to obtain the similarity of the two documents based on cross-domain entity association. Alternatively, to keep the similarity calculation simple, the relevance of the related document to the search target document can be determined from the subject-keyword-based score alone or from the event-based score alone.
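The weighted summation of the two scores can be written as a one-line function. The name `combined_relevance` and the default weights are assumptions; the text does not specify weight values, and a weight of zero on either side reproduces the simplified single-score variant.

```python
def combined_relevance(keyword_score: float, event_score: float,
                       keyword_weight: float = 0.5,
                       event_weight: float = 0.5) -> float:
    """Weighted sum of the subject-keyword-based and event-based similarity scores."""
    return keyword_weight * keyword_score + event_weight * event_score

balanced = combined_relevance(4.0, 2.0)                    # equal weights
keywords_only = combined_relevance(4.0, 2.0, 1.0, 0.0)     # event score ignored
```

Setting both weights to 1 gives a plain sum of the two scores.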
The search method of the above embodiment is illustrated below with a concrete search example.
Example one
Step 1: input the search target document, a microblog post with the content "[Western media follow Conca's first match at once and praise him for igniting the whole stadium as soon as he came on] Western media follow Conca's performance; Conca's entrance ignites Tianhe Sports. Sina Sports news: on July 14 Beijing time, the 17th round of the 2011 Chinese Super League season got fully under way. In a high-profile Super League match, Guangzhou Evergrande beat visiting Nanchang Hengyuan 5-0 at home ...";
Step 2: classify the microblog post: the document belongs to the "sports/football" category, and the predefined domain dictionary of the "sports/football" type yields the domain-related words "Chinese Super League", "beat", "came on" and "home ground";
Step 3: combining step 1 and step 2, the subject keywords of the microblog post are "Conca", "Guangzhou Evergrande", "Tianhe Sports", "Chinese Super League", "beat", "came on" and "home ground";
Step 4: search with "Conca", "Guangzhou Evergrande", "Tianhe Sports", "Chinese Super League", "beat", "came on" and "home ground" and obtain the result documents, which include a first related document titled "Friendly: Real Madrid beat Guangzhou Evergrande 7-1, Cristiano Ronaldo scores, Kaká comes on" and a second related document titled "The most expensive foreign player in Chinese football history: Conca appears at Tianhe tonight";
Step 5: calculate the subject-keyword-based similarity between the search target document and the two related documents:
First related document, matched against the subject keywords of the search target document: "Conca" (match), "Guangzhou Evergrande" (match), "Tianhe Sports" (match), "Chinese Super League" (no match), "beat" (match), "came on" (match), "home ground" (no match). Assuming a "match" scores 1 and a "no match" scores 0, the subject-keyword-based similarity score between the first related document and the search target document is 1+1+1+0+1+1+0=5;
Second related document, matched against the subject keywords of the search target document: "Conca" (match), "Guangzhou Evergrande" (match), "Tianhe Sports" (match), "Chinese Super League" (match), "beat" (no match), "came on" (no match), "home ground" (match). Assuming a "match" scores 1 and a "no match" scores 0, the subject-keyword-based similarity score between the second related document and the search target document is 1+1+1+1+0+0+1=5.
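The binary match scoring in step 5 can be reproduced directly. The match flags below are copied from the two lists above, in the order of the seven subject keywords; the function name `match_score` is introduced here for illustration.

```python
from typing import List

def match_score(matches: List[bool]) -> int:
    """Score 1 for every matched subject keyword and 0 for every unmatched one."""
    return sum(1 if matched else 0 for matched in matches)

# Flags in the order of the seven subject keywords of the search target document.
first_related = [True, True, True, False, True, True, False]
second_related = [True, True, True, True, False, False, True]

first_score = match_score(first_related)
second_score = match_score(second_related)
```

Both documents score 5, so the keyword-based score alone cannot separate them; the event-based score in step 6 does.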
Step 6: calculate the event-based similarity between the search target document and the two related documents:
Two event trigger words are set: the trigger word of event 1 is "came on", and the trigger word of event 2 is "beat".
Using the event 1 trigger word "came on", event 1 is extracted from the search target document, with the event elements "July 14", "Tianhe Sports", "Conca", "came on" and "Chinese Super League"; using the event 2 trigger word "beat", event 2 is extracted from the search target document, with the event elements "July 14", "Chinese Super League", "Guangzhou Evergrande", "home ground", "beat 5-0" and "Nanchang Hengyuan";
Using the event 1 trigger word "came on", event 1 is extracted from the first related document, with the event elements "August 3", "Tianhe Stadium", "Kaká", "friendly" and "came on as a substitute"; using the event 2 trigger word "beat", event 2 is extracted from the first related document, with the event elements "August 3", "Tianhe", "Real Madrid", "friendly", "beat 7-1" and "Guangzhou Evergrande";
Using the event 1 trigger word "appears" (a synonym expansion of "came on"), event 1 is extracted from the second related document, with the event elements "July 14", "Super League match", "Conca", "appears" and "Tianhe"; using the event 2 trigger word "VS" (a hypernym expansion of "beat"), event 2 is extracted from the second related document, with the event elements "July 14", "Super League match", "Guangzhou Evergrande", "VS Nanchang Hengyuan" and "Tianhe Stadium";
Table 1 lists the event-based similarity calculation between the search target document and the first related document:
Table 1
[Table 1 appears only as an image in the source: Figure BDA0000138566010000081]
Event-based similarity between the search target document and the first related document = similarity of event 1 + similarity of event 2 = (0.2+0.9+0+0.5+0.5+0)+(0.2+0.9+0+0.5+0.1+0) = 2.1+1.7 = 3.8.
Table 2 lists the event-based similarity calculation between the search target document and the second related document:
Table 2
[Table 2 appears only as an image in the source: Figure BDA0000138566010000092]
Event-based similarity between the search target document and the second related document = similarity of event 1 + similarity of event 2 = (1+0.8+1+0.9+0.8+0.8)+(1+1+1+0.9+0.5+1) = 5.3+5.4 = 10.7.
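The two event-based sums can be checked with a few lines of code. The per-element values are copied from the sums stated in the text; which event element each value belongs to is given only in the table images, so the values are kept as unlabeled lists. The function name `event_similarity` is introduced for illustration.

```python
from typing import Dict, List

def event_similarity(events: Dict[str, List[float]]) -> float:
    """Add up all element similarities of all events, rounded to one decimal place."""
    return round(sum(sum(values) for values in events.values()), 1)

# Element similarities as listed in the sums for Table 1 and Table 2.
first_related = {
    "event 1": [0.2, 0.9, 0, 0.5, 0.5, 0],
    "event 2": [0.2, 0.9, 0, 0.5, 0.1, 0],
}
second_related = {
    "event 1": [1, 0.8, 1, 0.9, 0.8, 0.8],
    "event 2": [1, 1, 1, 0.9, 0.5, 1],
}

first_event_score = event_similarity(first_related)
second_event_score = event_similarity(second_related)
```

Rounding to one decimal place only guards against floating-point noise; it matches the precision used in the text.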
Step 7: calculate the overall similarity between the search target document and the two related documents:
Similarity between the search target document and the first related document = subject-keyword-based similarity + event-based similarity = 5+3.8 = 8.8;
Similarity between the search target document and the second related document = subject-keyword-based similarity + event-based similarity = 5+10.7 = 15.7.
Step 8: sort the two related documents by their similarity scores with the search target document. Because the similarity between the search target document and the second related document is greater than that between the search target document and the first related document, the second related document is placed ahead of the first related document, and the sorted result is returned to the user.
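Steps 7 and 8 amount to adding the two scores per document and sorting in descending order. A minimal sketch with the figures from the example follows; the dictionary layout and variable names are assumptions.

```python
# (subject-keyword-based score, event-based score) per related document, from steps 5 and 6.
scores = {
    "first related document": (5, 3.8),
    "second related document": (5, 10.7),
}

# Step 7: overall similarity is the plain sum of the two scores.
totals = {name: round(keyword + event, 1) for name, (keyword, event) in scores.items()}

# Step 8: order the related documents from most to least similar.
ranked = sorted(totals, key=totals.get, reverse=True)
```

With these numbers `ranked` places the second related document first, in line with the ordering described in step 8.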
With the search method of the above embodiment, the relevance score between the search target document and a related document is calculated from the similarity between subject keywords and from the similarity between events of the same event type, so that the information of the two is more strongly correlated and the associated information is more comprehensive.
Fig. 3 is a structural diagram of the search server of the embodiment of the invention. As shown in Fig. 3, the search server comprises:
A search request receiving module 31, configured to receive a search target document;
A subject keyword extraction module 32, configured to extract at least one subject keyword from the search target document;
A related-document search module 33, configured to perform a related-document search according to the at least one subject keyword to obtain at least one related document of the search target document;
A search result returning module 34, configured to return the at least one related document.
The specific procedure by which the search server of the above embodiment performs a search is the same as the search method of the foregoing embodiment and is not repeated here.
With the search server of the above embodiment, a search target document is received and subject keywords are extracted from it, so that the documents related to a given document or web page can be searched for using that document or web page as the search condition. On the one hand, this diversifies the search conditions that can be input; on the other hand, the documents related to a given document can be found directly, so that the retrieved information is more relevant and more comprehensive, which improves the effectiveness of the search.
Further, in the search server of above-described embodiment, also comprise:
The relevant documentation order module is for the degree of correlation of calculating described at least one relevant documentation and described ferret out document; According to the degree of correlation order from high to low with each relevant documentation and described ferret out document described at least one relevant documentation is sorted;
Correspondingly, described Search Results returns the relevant documentation after module also is used for returning ordering.
Search server according to above-described embodiment, by calculating the degree of correlation of ferret out document and search result document, and according to the degree of correlation search result document is arranged according to degree of correlation order from high to low, thereby the document that the degree of correlation is higher preferentially shows the user, has improved user search efficient and user's perception.
Further, in the search server of above-described embodiment, described subject key words extraction module comprises:
First processing unit is for the field classification of the described ferret out document of identification;
Second processing unit is used for from the descriptor of the subject dictionary extraction described ferret out document corresponding with described field classification that presets;
The 3rd processing unit, for the general named entity that extracts described ferret out document, wherein said general named entity comprises time, place and/or mechanism;
The manages the unit everywhere, is used for described descriptor and described general named entity at least one subject key words as described ferret out document.
Further, in the search server of above-described embodiment, described manages the unit everywhere comprises:
The word frequency computation subunit is used for the word frequency that each described descriptor of calculating and each described general named entity occur at described ferret out document;
Subject key words is extracted subelement, filters out for the size according to described word frequency to meet pre-conditioned descriptor and/or general named entity, as at least one subject key words of described ferret out document.
Further, in the search server of the above embodiment, the related document ranking module comprises:
A fifth processing unit, configured to extract subject keywords of the related document and calculate a subject-keyword-based relevance score between the related document and the search target document; and/or
A sixth processing unit, configured to extract events of a preset event type from the search target document and events of the preset event type from the related document, and calculate a relevance score between the search target document and the related document based on identical preset events;
A seventh processing unit, configured to obtain the relevance between the related document and the search target document according to the subject-keyword-based relevance score and/or the relevance score based on identical preset events.
With the search server of the above embodiment, the relevance score between the search target document and a related document is calculated from the similarity between their subject keywords and from the similarity between their events of the same event type, so that the information of the two is more strongly correlated and the associated information is more comprehensive.
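One possible way to combine the two scores is sketched below. The Jaccard similarity, the representation of an event as an (event type, participant) tuple, and the weighted sum with `keyword_weight` are all assumptions for illustration; the embodiment only states that the relevance is obtained from the keyword-based score and/or the event-based score.

```python
from typing import Hashable, Iterable


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def combined_relevance(
    target_keywords: Iterable[str],
    doc_keywords: Iterable[str],
    target_events: Iterable[tuple],
    doc_events: Iterable[tuple],
    keyword_weight: float = 0.5,
) -> float:
    """Weighted sum of the keyword-based and the event-based score."""
    keyword_score = jaccard(target_keywords, doc_keywords)
    event_score = jaccard(target_events, doc_events)  # identical events only
    return keyword_weight * keyword_score + (1.0 - keyword_weight) * event_score


# Hypothetical example: half of the keywords overlap, the single event matches.
score = combined_relevance(
    {"merger", "bank"},
    {"merger", "bank", "stock", "bond"},
    {("acquisition", "BankA")},
    {("acquisition", "BankA")},
)
```

Setting `keyword_weight` to 1.0 or 0.0 reduces the sketch to the keyword-only or event-only variant.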
An embodiment of the present invention further provides a search system, whose architecture is shown, for example, in Figure 1. The search system comprises the search server of any of the above embodiments and a search client that communicates with the search server, wherein the search client is configured to send a search target document to the search server and to receive the related documents returned by the search server.
With the search system of the above embodiment, subject keywords are extracted from the search target document, which provides a method that uses a document or web page as the search condition to search for documents related to that document or web page. On the one hand, the search conditions that can be input become more diverse; on the other hand, the related documents of a given document can be found directly, so that the retrieved information is more strongly correlated and more comprehensive, which improves the effectiveness of the search.
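The overall flow, in which a document is received, its subject keywords are extracted, an index is queried and related documents are returned, might look like the following sketch. The `SearchServer` class, the in-memory inverted index, and the naive length-based keyword extractor are hypothetical and only stand in for the modules described in the embodiments; client-server communication is omitted.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set


def naive_keywords(text: str) -> Set[str]:
    """Placeholder extractor (an assumption): words of six or more letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 6}


class SearchServer:
    def __init__(self, corpus: Dict[str, str],
                 extract: Callable[[str], Set[str]] = naive_keywords):
        self._extract = extract
        # Inverted index: keyword -> ids of the documents that contain it.
        self._index: Dict[str, Set[str]] = defaultdict(set)
        for doc_id, text in corpus.items():
            for keyword in extract(text):
                self._index[keyword].add(doc_id)

    def search(self, target_document: str) -> List[str]:
        """Return ids of documents sharing a keyword with the target document."""
        hits: Set[str] = set()
        for keyword in self._extract(target_document):
            hits |= self._index.get(keyword, set())
        return sorted(hits)


server = SearchServer({
    "d1": "football league results",
    "d2": "central bank inflation report",
    "d3": "football coach interview",
})
related = server.search("football transfers")
```

Here the whole document is the query: the caller passes text rather than hand-picked keywords, which is the point of the scheme.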
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A searching method, characterized by comprising:
Receiving a search target document;
Extracting at least one subject keyword of the search target document;
Performing a related document search according to the at least one subject keyword to obtain at least one related document of the search target document;
Returning the at least one related document.
2. The searching method according to claim 1, characterized in that, before returning the at least one related document, the method further comprises:
Calculating the relevance between the at least one related document and the search target document;
Sorting the at least one related document in descending order of the relevance between each related document and the search target document;
Correspondingly, returning the at least one related document specifically comprises: returning the sorted related documents.
3. The searching method according to claim 1 or 2, characterized in that extracting at least one subject keyword of the search target document specifically comprises:
Identifying the domain category of the search target document;
Extracting subject terms of the search target document from a preset subject dictionary corresponding to the domain category;
Extracting general named entities of the search target document, wherein the general named entities comprise a time, a place and/or an organization;
Using the subject terms and the general named entities as the at least one subject keyword of the search target document.
4. The searching method according to claim 3, characterized in that using the subject terms and the general named entities as the at least one subject keyword of the search target document specifically comprises:
Calculating the frequency with which each subject term and each general named entity occurs in the search target document;
Selecting, according to the word frequencies, the subject terms and/or general named entities that meet a preset condition as the at least one subject keyword of the search target document.
5. The searching method according to claim 2, characterized in that calculating the relevance between the at least one related document and the search target document specifically comprises:
Extracting subject keywords of the related document, and calculating a subject-keyword-based relevance score between the related document and the search target document; and/or
Extracting events of a preset event type from the search target document and events of the preset event type from the related document, and calculating a relevance score between the search target document and the related document based on identical preset events;
Obtaining the relevance between the related document and the search target document according to the subject-keyword-based relevance score and/or the relevance score based on identical preset events.
6. A search server, characterized by comprising:
A search request receiving module, configured to receive a search target document;
A subject keyword extraction module, configured to extract at least one subject keyword of the search target document;
A related document search module, configured to perform a related document search according to the at least one subject keyword to obtain at least one related document of the search target document;
A search result returning module, configured to return the at least one related document.
7. The search server according to claim 6, characterized by further comprising:
A related document ranking module, configured to calculate the relevance between the at least one related document and the search target document, and to sort the at least one related document in descending order of the relevance between each related document and the search target document;
Correspondingly, the search result returning module is further configured to return the sorted related documents.
8. The search server according to claim 7 or 8, characterized in that the subject keyword extraction module comprises:
A first processing unit, configured to identify the domain category of the search target document;
A second processing unit, configured to extract subject terms of the search target document from a preset subject dictionary corresponding to the domain category;
A third processing unit, configured to extract general named entities of the search target document, wherein the general named entities comprise a time, a place and/or an organization;
A fourth processing unit, configured to use the subject terms and the general named entities as the at least one subject keyword of the search target document.
9. The search server according to claim 8, characterized in that the fourth processing unit comprises:
A word frequency calculation subunit, configured to calculate the frequency with which each subject term and each general named entity occurs in the search target document;
A subject keyword extraction subunit, configured to select, according to the word frequencies, the subject terms and/or general named entities that meet a preset condition as the at least one subject keyword of the search target document.
10. The search server according to claim 7, characterized in that the related document ranking module comprises:
A fifth processing unit, configured to extract subject keywords of the related document and calculate a subject-keyword-based relevance score between the related document and the search target document; and/or
A sixth processing unit, configured to extract events of a preset event type from the search target document and events of the preset event type from the related document, and calculate a relevance score between the search target document and the related document based on identical preset events;
A seventh processing unit, configured to obtain the relevance between the related document and the search target document according to the subject-keyword-based relevance score and/or the relevance score based on identical preset events.
11. A search system, characterized by comprising the search server according to any one of claims 6 to 10, and a search client that communicates with the search server, wherein the search client is configured to send a search target document to the search server and to receive the related documents returned by the search server.
CN2012100456068A 2012-02-27 2012-02-27 Searching method, server and system Pending CN103294693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100456068A CN103294693A (en) 2012-02-27 2012-02-27 Searching method, server and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100456068A CN103294693A (en) 2012-02-27 2012-02-27 Searching method, server and system

Publications (1)

Publication Number Publication Date
CN103294693A true CN103294693A (en) 2013-09-11

Family

ID=49095585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100456068A Pending CN103294693A (en) 2012-02-27 2012-02-27 Searching method, server and system

Country Status (1)

Country Link
CN (1) CN103294693A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050251534A1 (en) * 2000-12-04 2005-11-10 Chris Nunez Parameterized keyword and methods for searching, indexing and storage
CN101055580A (en) * 2006-04-13 2007-10-17 Lg电子株式会社 System, method and user interface for retrieving documents

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646034B (en) * 2013-11-14 2017-03-08 东华理工大学 One kind is based on content believable Web search automotive engine system and searching method
CN103646034A (en) * 2013-11-14 2014-03-19 东华理工大学 Web search engine system and search method based content credibility
CN106471500A (en) * 2014-07-04 2017-03-01 三星电子株式会社 The method that relevant information is provided and the electronic installation being suitable for the method
CN106095737A (en) * 2016-06-07 2016-11-09 杭州凡闻科技有限公司 Documents Similarity computational methods and similar document the whole network retrieval tracking
CN106294875B (en) * 2016-08-25 2019-05-17 中国国防科技信息中心 A kind of name entity fuzzy retrieval method and system
CN106294875A (en) * 2016-08-25 2017-01-04 中国国防科技信息中心 A kind of name entity fuzzy retrieval method and system
CN106844436A (en) * 2016-12-15 2017-06-13 北京小度信息科技有限公司 The sort method and device of Query Result
CN106844436B (en) * 2016-12-15 2020-07-31 北京星选科技有限公司 Query result sorting method and device
CN107291949B (en) * 2017-07-17 2020-11-13 绿湾网络科技有限公司 Information searching method and device
CN107291949A (en) * 2017-07-17 2017-10-24 小草数语(北京)科技有限公司 Information search method and device
CN107657005B (en) * 2017-09-22 2020-03-20 浪潮云信息技术有限公司 Retrieval method and device for theme webpage
CN107657005A (en) * 2017-09-22 2018-02-02 山东浪潮云服务信息科技有限公司 The search method and device of a kind of subject web page
CN107908681A (en) * 2017-10-30 2018-04-13 苏州大学 A kind of similar website lookup method, system, equipment and medium
CN110019682A (en) * 2017-12-28 2019-07-16 北京京东尚科信息技术有限公司 For handling system, the method and apparatus of information
CN110472117B (en) * 2018-05-09 2023-01-24 成都野望数码科技有限公司 Target document determination method and device
CN110472117A (en) * 2018-05-09 2019-11-19 成都野望数码科技有限公司 A kind of determination method and device of destination document
CN110955633A (en) * 2018-09-26 2020-04-03 北京国双科技有限公司 Retrieval method and device
CN112287148A (en) * 2019-03-29 2021-01-29 艾思益信息应用技术股份公司 Information providing system
CN110287289A (en) * 2019-06-25 2019-09-27 北京金海群英网络信息技术有限公司 A kind of document keyword extraction and the method based on document matches commodity
CN110955763A (en) * 2019-11-15 2020-04-03 深圳供电局有限公司 Data searching method and system based on audit risk database
CN111680493A (en) * 2020-08-12 2020-09-18 江西风向标教育科技有限公司 English text analysis method and device, readable storage medium and computer equipment
CN114997116A (en) * 2021-03-01 2022-09-02 北京字跳网络技术有限公司 Document editing method, device, equipment and storage medium
WO2022184034A1 (en) * 2021-03-01 2022-09-09 北京字跳网络技术有限公司 Document processing method and apparatus, device, and medium
CN114995691A (en) * 2021-03-01 2022-09-02 北京字跳网络技术有限公司 Document processing method, device, equipment and medium
CN114995691B (en) * 2021-03-01 2024-03-08 北京字跳网络技术有限公司 Document processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN103294693A (en) Searching method, server and system
Hamidian et al. Rumor identification and belief investigation on twitter
Li et al. Keyword extraction based on tf/idf for Chinese news document
CN103164454B (en) Keyword group technology and system
CN103020293B (en) A kind of construction method and system of the ontology library of mobile application
US9201880B2 (en) Processing a content item with regard to an event and a location
US8554540B2 (en) Topic map based indexing and searching apparatus
US20060235843A1 (en) Method and system for semantic search and retrieval of electronic documents
CN103294681B (en) Method and device for generating search result
CN103313248B (en) Method and device for identifying junk information
CN113553429B (en) Normalized label system construction and text automatic labeling method
Duarte Torres et al. An analysis of queries intended to search information for children
CN103186556A (en) Method for obtaining and searching structural semantic knowledge and corresponding device
CN102929873A (en) Method and device for extracting searching value terms based on context search
CN103678576A (en) Full-text retrieval system based on dynamic semantic analysis
CN103631794A (en) Method, device and equipment for sorting search results
CN106294744A (en) Interest recognition methods and system
CN103136192B (en) Translate requirements recognition methods and system
CN102163234A (en) Equipment and method for error correction of query sequence based on degree of error correction association
CN100458797C (en) Process for ordering network advertisement
CN103577405A (en) Interest analysis based micro-blogger community classification method
CN102541910A (en) Keywords extraction method
CN102999521B (en) A kind of method and device identifying search need
CN104268230A (en) Method for detecting objective points of Chinese micro-blogs based on heterogeneous graph random walk
CN101334789A (en) Device for identifying document plagiarism by search engine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130911