CN104679783A - Network searching method and device - Google Patents

Info

Publication number
CN104679783A
Authority
CN
China
Prior art keywords
entity
results
web
web page
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310633696.7A
Other languages
Chinese (zh)
Other versions
CN104679783B (en)
Inventor
张友书
余浩
张阔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd and Beijing Sogou Information Service Co Ltd
Priority to CN201310633696.7A
Publication of CN104679783A
Application granted
Publication of CN104679783B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a network search method and device and relates to the technical field of web search. The method comprises: searching for web pages that match an input query string to obtain web page results; retrieving, in a knowledge base that stores entity objects from across the whole network, entity results corresponding to the query string; analyzing and matching the entity results against the web page content corresponding to each web page result, and screening out the target entities corresponding to each web page result; and displaying each target entity in correspondence with its matched web page result. With this method and device, the target entities in each web page result that are relevant to the query string are screened out and shown to the user simply and intuitively. The user can judge the relevance between the page content of a web page result and the query string without clicking through to check it, and can further judge the reliability of that page content, so the efficiency of information queries is improved.

Description

Network search method and device
Technical field
The present application relates to the technical field of web search, and in particular to a network search method and device.
Background
At present, data search has become one of the most important applications on the Internet. A typical search engine uses a crawler (spider) program on its server to automatically crawl websites of all sizes on the Internet, establishes a correspondence between each query string and all relevant web pages according to a web page relevance principle, and stores that correspondence in the web page database of its server. When a user enters a query string, the engine finds all crawled web pages that match the features of the query string and presents the search results as hyperlinks; clicking a link opens the corresponding web page, where the user can find the required information.
Result entries in existing search results usually display only the web page title and a text summary, and words in the summary that match the segmented terms of the query string are marked in red font so that users can locate them quickly while browsing.
However, because of the length limit of the summary or the structural characteristics of the text, users cannot see in the search results all of the content in a web page that is relevant to the query string, and must click through to the page to examine it. Users therefore have to click the links of several search results before finding the information they want, which reduces the efficiency of information queries.
In short, a technical problem that those skilled in the art urgently need to solve is how to improve the efficiency of information queries.
Summary of the invention
The technical problem to be solved by the present application is to provide a network search method and device that can improve the efficiency of information queries.
To solve the above problem, the present application discloses a network search method, comprising:
Searching for web pages that match an input query string, to obtain web page results;
Retrieving, in a knowledge base, entity results corresponding to the query string, wherein the knowledge base stores entity objects from across the whole network;
Analyzing and matching the entity results against the web page content corresponding to each web page result, to screen out the target entities corresponding to each web page result;
Displaying each target entity in correspondence with its matched web page result.
Preferably, the step of screening out the target entities corresponding to each web page result comprises: screening out, in the web page content corresponding to each web page result, the target content that matches the entity results, and taking the entity results corresponding to the target content as the target entities of that web page result.
Preferably, the step of screening out, in the web page content corresponding to each web page result, the target content that matches the entity results comprises:
Analyzing the main text of the web page corresponding to the web page result;
Extracting features from the main text according to the analysis result, the features comprising one or more of the title, subtitles, tables, summary and bold text;
Matching the extracted text against each entity result, to obtain the target content in the web page result that matches each entity result.
Preferably, the method further comprises:
Sorting the target entities, per web page result, according to the frequency and/or position at which the target content matching the entity results occurs in each web page result;
The step of displaying each target entity in correspondence with its matched web page result is then: displaying the identifiers of the target entities matched to each web page result in the order given by the sorting result.
Preferably, the displayed target entities carry corresponding hyperlinks, each hyperlink being used to go to the matched web page result;
The method then further comprises:
After a trigger on a target entity is received, positioning the web page result at the location that matches the target entity, and loading the web page content that matches the target entity.
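As a non-limiting illustration, the positioning step above may be approximated by a text lookup in the page content. The sketch below is an assumption made for explanation only: the function name `locate`, the `context` parameter and the sample text are hypothetical and are not part of the disclosure.

```python
def locate(content, entity, context=20):
    """Return (offset, snippet) for the first occurrence of entity, or None."""
    pos = content.find(entity)
    if pos < 0:
        return None  # the entity does not occur in this page content
    start = max(0, pos - context)
    end = min(len(content), pos + len(entity) + context)
    return pos, content[start:end]

# Hypothetical page text used only to exercise the function.
PAGE_TEXT = "Intro. Fragrant Hills is close to the city centre."
print(locate(PAGE_TEXT, "Fragrant Hills"))
```

A real implementation would map the offset to a scroll position or page anchor; that mapping is outside this sketch.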
Preferably, the step of retrieving, in the knowledge base, the entity results corresponding to the query string comprises:
Identifying the entity words and entity attribute words in the query string, and labeling them;
Performing grammar analysis on the query string, the resulting grammar analysis result comprising a grammar rule and the labeled entity words that satisfy the grammar rule;
Converting the grammar analysis result into a query statement described in a machine language;
Retrieving the corresponding entity information from the knowledge base according to the query statement, as the entity results.
Preferably, the method further comprises:
Preprocessing the query string, the preprocessing comprising one or more of error correction, stop-word removal and word segmentation;
The step of searching for web pages that match the input query string to obtain web page results is then: searching the whole network for web documents that match the preprocessed query string, to obtain the web page results corresponding to the query string;
The step of retrieving, in the knowledge base, the entity results corresponding to the query string is then: performing a structured query on the entity objects from across the whole network in the knowledge base according to the preprocessed query string, to obtain the entity results corresponding to the query string.
In another aspect, the present application also provides a network search device, comprising:
A web page search unit, configured to search for web pages that match an input query string, to obtain web page results;
An entity search unit, configured to retrieve, in a knowledge base, entity results corresponding to the query string, wherein the knowledge base stores entity objects from across the whole network;
An entity screening unit, configured to analyze and match the entity results against the web page content corresponding to each web page result, to screen out the target entities corresponding to each web page result; and
A display unit, configured to display each target entity in correspondence with its matched web page result.
Preferably, the entity screening unit is specifically configured to screen out, in the web page content corresponding to each web page result, the target content that matches the entity results, and to take the entity results corresponding to the target content as the target entities of that web page result.
Preferably, the entity screening unit comprises:
A web page analysis module, configured to analyze the main text of the web page corresponding to the web page result;
An extraction module, configured to extract features from the main text according to the analysis result, the features comprising one or more of the title, subtitles and bold text; and
A matching module, configured to match the extracted text against each entity result, to obtain the target content in the web page result that matches each entity result.
Compared with the prior art, the present application has the following advantages:
In addition to obtaining the web page results corresponding to the query string, the network search method of the present application uses the entity results retrieved from the knowledge base to screen the web page results, obtains the target entities corresponding to each web page result, and displays each target entity in correspondence with its matched web page result;
Each target entity displayed in the present application is a result within a web page result that is relevant to the query string: it corresponds to the query string and also matches the web page result. The target entities relevant to the query string are thus screened out of each web page result and presented to the user concisely and intuitively. The user can judge the relevance between the page content of a web page result and the query string without clicking through to check it, and can further judge the reliability of that page content. The present application therefore gives web search results more intuitive and richer information, gives the user more information and convenience for screening web page results, and improves the efficiency of information queries.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of a network search method of the present application;
Fig. 2 is a structural diagram of an embodiment of a network search device of the present application.
Detailed description
To make the above objects, features and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Existing network search methods obtain web page results by searching a web page database, mark in red font the words in the title and text summary of each web page result that match the segmented terms of the query string, and present the results to the user.
However, because of the length limit of the summary or the structural characteristics of the text, users cannot see in the web page results all of the content in a web page that is relevant to the query string, and must click through to the page to examine it. Users therefore have to click the links of several search results before finding the information they want, which reduces the efficiency of information queries.
In addition to obtaining the web page results corresponding to the query string by searching the web page database, the network search method of the embodiments of the present application uses the entity results retrieved from a knowledge base to screen the web page results, obtains the target entities corresponding to each web page result, and displays each target entity in correspondence with its matched web page result.
In the art, a knowledge base is a structured, easy-to-operate, easy-to-use, comprehensive and organized knowledge cluster in knowledge engineering. It is a set of interrelated knowledge pieces that are stored, organized, managed and used with one or more knowledge representation schemes in order to solve problems in one or more domains. These knowledge pieces may include theoretical knowledge related to the domain, factual data, heuristic knowledge obtained from expert experience (such as the definitions, theorems and algorithms relevant to a given domain), and common-sense knowledge.
The embodiments of the present application use a knowledge base based on the Resource Description Framework (RDF). RDF is a data model composed of "entity-attribute-value" triples. An entity can be regarded as an object; it may be an instance noun in any domain, such as a film, TV series, person, organization, place, author, book, publisher or hotel.
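For illustration only, the "entity-attribute-value" triple model may be pictured as a list of tuples indexed by entity and attribute. The sketch below is an assumption: the sample triples, the attribute names and the helper names `build_index` and `lookup` are hypothetical and do not describe the actual knowledge base.

```python
from collections import defaultdict

# Hypothetical toy triples in (entity, attribute, value) form.
TRIPLES = [
    ("Beijing", "type", "city"),
    ("Beijing", "nearby_mountain", "Fragrant Hills"),
    ("Beijing", "nearby_mountain", "Wuling Mountain"),
    ("Fragrant Hills", "type", "mountain"),
]

def build_index(triples):
    """Index triples by (entity, attribute) so values can be looked up directly."""
    index = defaultdict(list)
    for entity, attribute, value in triples:
        index[(entity, attribute)].append(value)
    return index

def lookup(index, entity, attribute):
    """Return all values stored for the given entity and attribute."""
    return index.get((entity, attribute), [])

index = build_index(TRIPLES)
print(lookup(index, "Beijing", "nearby_mountain"))
```

A production system would typically use a dedicated triple store; the in-memory dictionary here only shows the shape of the data.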
Each target entity displayed in the present application is a result within a web page result that is relevant to the query string: it corresponds to the query string and also matches the web page result. The target entities relevant to the query string are thus screened out of each web page result and presented to the user concisely and intuitively. The user can judge the relevance between the page content of a web page result and the query string without clicking through to check it, and can further judge the reliability of that page content. The present application therefore gives web search results more intuitive and richer information, gives the user more information and convenience for screening web page results, and improves the efficiency of information queries.
Referring to Fig. 1, a flowchart of an embodiment of a network search method of the present application is shown. The method may specifically include:
Step 101: search for web pages that match the input query string, to obtain web page results;
The present application can be applied to any scenario that provides a search service and displays information related to the user's search, for example search engines such as Baidu, Google, Yahoo and Sogou, or other scenarios that have a search function; for instance, an input box that serves another purpose may also offer search. The present application therefore does not limit the specific search scenario.
Taking a search engine as an example, the search engine can use known techniques to obtain the content corresponding to a query string on the search engine server. For example, it releases a large number of crawler programs to fetch web pages on the Internet, establishes a correspondence between each query string and all relevant web pages according to a web page relevance principle, and stores that correspondence in the database of the search engine server. When a user enters a query string in the search engine (for example, "good places to climb mountains in Beijing"), the matching web page results can then be found on the search engine server.
In a preferred embodiment of the present application, before the step of searching for web pages that match the input query string to obtain web page results, the method may further include: preprocessing the query string, where the preprocessing may include one or more of error correction, stop-word removal and word segmentation. Error correction corrects mistyped words in the query string, stop-word removal deletes stop words (such as modal particles and punctuation marks), and word segmentation splits the query string into terms.
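The three preprocessing operations may be sketched as follows. This is a minimal illustration under stated assumptions: the correction table and stop-word list are hypothetical, and splitting on word characters stands in for a real word segmenter, which a Chinese-language query would require.

```python
import re

# Hypothetical correction table and stop-word list, for illustration only.
CORRECTIONS = {"beijng": "beijing", "mountian": "mountain"}
STOP_WORDS = {"the", "a", "to", "in", "of"}

def preprocess(query):
    """Segment the query, correct known misspellings, and drop stop words."""
    tokens = re.findall(r"\w+", query.lower())          # segmentation; also drops punctuation
    tokens = [CORRECTIONS.get(t, t) for t in tokens]    # error correction
    return [t for t in tokens if t not in STOP_WORDS]   # stop-word removal

print(preprocess("Places to climb a mountian in Beijng!"))
```

The same preprocessed terms can then feed both the web page search and the knowledge base retrieval.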
The step of searching for web pages that match the query string to obtain web page results may then specifically be: searching the whole network for web documents that match the preprocessed query string, to obtain the web page results corresponding to the query string.
Suppose the segments obtained by splitting the query string are called terms. In a preferred embodiment of the present application, the step of searching for web pages that match the input query string to obtain web page results may specifically include: first, retrieving each distinct term in the inverted index of web pages; then, intersecting the web document lists of the terms to obtain a candidate set of web pages that contain every term; and finally, screening and sorting the candidate set according to a preset sorting method to obtain the web page results.
The inverted index of web pages can be built as follows: the text of each web page is analyzed in advance, an inverted index entry is created for each word, and the index is stored in a document database. Retrieval through an inverted index has advantages such as short query time, high efficiency and low resource usage. The sorting method may be based on a relevance parameter between the web page and the query keywords; the embodiments of the present application do not limit the specific method for sorting web page results.
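The retrieval described above (per-term lookup, intersection of document lists, then sorting) may be sketched as below. The documents, the whitespace tokenization and the term-frequency score are assumptions chosen for brevity; the application leaves the sorting method open.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, docs, terms):
    """Intersect the posting sets of all terms, then rank by total term frequency."""
    if not terms:
        return []
    postings = [index.get(t, set()) for t in terms]
    candidates = set.intersection(*postings)

    def score(doc_id):
        words = docs[doc_id].lower().split()
        return sum(words.count(t) for t in terms)

    # Sort by descending score; break ties by document id for a stable order.
    return sorted(candidates, key=lambda d: (-score(d), d))

# Hypothetical documents used only to exercise the functions.
DOCS = {
    "d1": "beijing mountain hiking guide mountain trails",
    "d2": "beijing food guide",
    "d3": "mountain hiking near beijing",
}
index = build_inverted_index(DOCS)
print(search(index, DOCS, ["beijing", "mountain"]))
```

Only documents containing every term survive the intersection, which is why "d2" is excluded for the query above.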
Step 102: retrieve, in the knowledge base, the entity results corresponding to the query string, wherein the knowledge base stores entity objects from across the whole network;
The principle of the knowledge base was introduced above. In a specific implementation, the knowledge base can be built by analyzing Internet web pages, extracting entities and their attribute knowledge, and adding them to the knowledge base. For example, entities and their attribute knowledge can be extracted from encyclopedia entries, films on Douban, foods on Meishijie, and the like. The embodiments of the present application do not limit the specific construction method of the knowledge base or the domains it covers.
In a preferred embodiment of the present application, before the step of retrieving, in the knowledge base, the entity results corresponding to the query string, the method may further include: preprocessing the query string, where the preprocessing may include one or more of error correction, stop-word removal and word segmentation.
The step of retrieving, in the knowledge base, the entity results corresponding to the query string may then specifically be: performing a structured query on the entity objects from across the whole network in the knowledge base according to the preprocessed query string, to obtain the entity results corresponding to the query string.
In a preferred embodiment of the present application, the step of retrieving, in the knowledge base, the entity results corresponding to the query string may specifically include:
Sub-step S101: identify the entity words and entity attribute words in the query string, and label them;
In a specific implementation, the entity words in the query string can be identified by means of a list of knowledge base entities prepared in advance. These entity words usually include entity instances from various domains, for example films, TV series, persons, organizations and places.
Sub-step S102: perform grammar analysis on the query string; the resulting grammar analysis result includes a grammar rule and the labeled entity words that satisfy the grammar rule;
A grammar is a set of formal rules that describe the syntactic structure of a language. Grammar analysis here can be used to understand the query string semantically, including its subject-verb-object structure.
In one application embodiment of the present application, a context-free grammar can be used for the grammar analysis of the query string. A context-free grammar is an important kind of transformational grammar in formal language theory; it describes context-free languages and is called a type-2 grammar in the Chomsky hierarchy. It is a self-defined set of grammar rules that can be used for syntactic analysis to obtain the sentence structure and the dependencies between sentence constituents.
The grammar rules of the context-free grammar in the embodiments of the present application can be established on the basis of the knowledge base. For example, "Liu Dehua" is an entity object of class "person" in the knowledge base, and "spouse" is an attribute of "person" in the knowledge base whose value is also an entity object of class "person". The following grammar rule can therefore be established: <entity_person><attribute_person_spouse> --> <entity_person>.
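One possible way to derive such rules from the knowledge base is to read them off a schema that records, for each attribute, the class it belongs to and the class of its value. The sketch below assumes such a schema exists; the `SCHEMA` rows and the function name `derive_rules` are hypothetical.

```python
# Hypothetical schema rows: (entity class, attribute, class of the attribute value).
SCHEMA = [
    ("person", "spouse", "person"),
    ("city", "nearby_mountain", "mountain"),
]

def derive_rules(schema):
    """Turn each schema row into a rule: <entity_C><attribute_C_A> --> <entity_V>."""
    rules = {}
    for cls, attribute, value_cls in schema:
        left = (f"entity_{cls}", f"attribute_{cls}_{attribute}")
        rules[left] = f"entity_{value_cls}"
    return rules

rules = derive_rules(SCHEMA)
print(rules[("entity_person", "attribute_person_spouse")])
```

A label pair that is absent from the derived table forms no rule and can be discarded during grammar analysis.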
Sub-step S103: convert the grammar analysis result into a query statement described in a machine language;
In the embodiments of the present application, the machine language may specifically include various query languages for a knowledge base built on the Resource Description Framework, such as Structured Query Language (SQL) or SPARQL (SPARQL Protocol and RDF Query Language).
Sub-step S104: retrieve the corresponding entity information from the knowledge base according to the query statement, as the entity results.
Taking the query string "good places to climb mountains in Beijing" as an example, the step of retrieving, in the knowledge base, the entity results corresponding to the query string may specifically include:
Sub-step S201: identify the entity words and entity attribute words in the query string and label them, obtaining:
Beijing: <entity_city> <entity_album>
good places to climb mountains: <attribute_city_nearby_mountain>;
Sub-step S202: establish a grammar rule in advance according to the knowledge base: <entity_mountain> <-- <entity_city><attribute_city_nearby_mountain>;
Sub-step S203: check, according to the grammar rule, whether the labels identified in the first step are valid. <entity_album> cannot form a grammar rule with <attribute_city_nearby_mountain> and is therefore discarded, whereas <entity_city> and <attribute_city_nearby_mountain> can form a grammar rule and are retained.
Sub-step S204: obtain the grammar analysis result that matches the user's query intent: the value of <attribute_city_nearby_mountain> for the <entity_city> "Beijing";
Sub-step S205: convert the grammar analysis result into an SQL statement:
"SELECT <attribute_city_nearby_mountain> FROM <entity>='Beijing'";
Sub-step S206: parse the SQL statement, optimize the query logic, query the knowledge base for the required entities and entity attribute information according to the operation specified by the SQL statement, and screen them as the corresponding entity results.
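Sub-steps S201 to S206 may be traced end to end with the toy sketch below. Everything in it is an assumption made for illustration: the lexicons, the rule table and the triples are hypothetical, substring matching stands in for real entity recognition, and a dictionary stands in for the SQL statement.

```python
# Hypothetical lexicons, rule table and triples; none of them come from the application.
ENTITY_LEXICON = {"Beijing": ["entity_city", "entity_album"]}
ATTRIBUTE_LEXICON = {"good places to climb mountains": "attribute_city_nearby_mountain"}
RULES = {("entity_city", "attribute_city_nearby_mountain"): "entity_mountain"}
TRIPLES = [
    ("Beijing", "attribute_city_nearby_mountain", "Fragrant Hills"),
    ("Beijing", "attribute_city_nearby_mountain", "Wuling Mountain"),
    ("Beijing", "attribute_city_nearby_mountain", "Jiankou Great Wall"),
]

def tag(query):
    """S201: find entity words and attribute words in the query and label them."""
    entities = [(w, t) for w, tags in ENTITY_LEXICON.items() if w in query for t in tags]
    attributes = [(w, t) for w, t in ATTRIBUTE_LEXICON.items() if w in query]
    return entities, attributes

def parse(entities, attributes):
    """S202-S204: keep only the (entity word, attribute) pairs that form a grammar rule."""
    return [(ew, at) for ew, et in entities for _, at in attributes if (et, at) in RULES]

def to_query(entity, attribute):
    """S205: express the parse result as a simple structured query."""
    return {"select": attribute, "where_entity": entity}

def execute(query):
    """S206: evaluate the structured query against the triple list."""
    return [v for e, a, v in TRIPLES if e == query["where_entity"] and a == query["select"]]

entities, attributes = tag("good places to climb mountains in Beijing")
parsed = parse(entities, attributes)
results = [execute(to_query(entity, attribute)) for entity, attribute in parsed]
print(results)
```

The album reading of "Beijing" is dropped in `parse` because no rule pairs it with the mountain attribute, which mirrors sub-step S203.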
Step 103: analyze and match the entity results against the web page content corresponding to each web page result, to screen out the target entities corresponding to each web page result;
The web page results obtained in step 101 that match the query terms are usually numerous. In the prior art, to determine whether the page content of a particular web page result is relevant to the query string, the user has to click through to that web page result and check the page content.
In the embodiments of the present application, the entity results retrieved in step 102 usually include at least one entity object in the knowledge base that is relevant to the query string. Such an object usually exists in a concise and intuitive form, such as an entry or a picture, and can represent professional knowledge in its domain. The embodiments of the present application analyze and match the entity results against the web page content corresponding to each web page result and screen out the target entities corresponding to each web page result. The target entities originate from the page content of each web page result and can provide richer and more intuitive identifying information than the title and text summary used in the prior art. Each target entity is a result within a web page result that is relevant to the query string: it corresponds to the query string and also matches the web page result. The target entities relevant to the query string are thus screened out of each web page result and presented to the user concisely and intuitively. The user can judge the relevance between the page content of a web page result and the query string without clicking through to check it, and can further judge the reliability of that page content. The present application therefore gives web search results more intuitive and richer information, gives the user more information and convenience for screening web page results, and improves the efficiency of information queries.
In a preferred embodiment of the present application, the step of analyzing and matching the entity results against the web page content corresponding to each web page result to screen out the target entities corresponding to each web page result may specifically be: screening out, in the web page content corresponding to each web page result, the target content that matches the entity results, and taking the entity results corresponding to the target content as the target entities of that web page result.
Suppose the query string is "good places to climb mountains in Beijing", and the entity results retrieved in step 102 include three entity objects: "Fragrant Hills", "Wuling Mountain" and "Jiankou Great Wall". By common sense, "Jiankou Great Wall" is clearly too steep and dangerous; it is a destination for field exploration by professional climbers and is not suitable for mountain-climbing exercise by the general public.
According to the matching result of step 103, among the web page results, entry 1 matches the knowledge base entity result "Fragrant Hills", while its content "Baihua Mountain" matches no entity result; entry 2 matches the knowledge base entity results "Fragrant Hills" and "Wuling Mountain", while its content "Eight Great Temples of the Western Hills" matches no entity result. The present application therefore screens out, in the web page content corresponding to each web page result, the target content that matches the entity results, that is, the target entities in each web page result that are relevant to the query string: "Fragrant Hills" is taken as the target entity of entry 1, and "Fragrant Hills" and "Wuling Mountain" are taken as the target entities of entry 2.
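The screening in this example may be sketched as a per-result filter over the entity results. The page texts below are invented to reproduce the example, and plain substring matching is an assumption standing in for the analysis and matching of step 103.

```python
def screen_target_entities(entity_results, web_results):
    """For each web result, keep the entity results that occur in its page content."""
    targets = {}
    for result_id, content in web_results.items():
        targets[result_id] = [e for e in entity_results if e in content]
    return targets

# Hypothetical entity results and page contents mirroring the example above.
ENTITY_RESULTS = ["Fragrant Hills", "Wuling Mountain", "Jiankou Great Wall"]
WEB_RESULTS = {
    "entry 1": "Weekend trips: Fragrant Hills and Baihua Mountain.",
    "entry 2": "Fragrant Hills, Wuling Mountain and the Eight Great Temples of the Western Hills.",
}
targets = screen_target_entities(ENTITY_RESULTS, WEB_RESULTS)
print(targets)
```

Content such as "Baihua Mountain" is ignored because it is not among the entity results, and "Jiankou Great Wall" is dropped because no page mentions it, so the screening works in both directions.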
In a preferred embodiment of the present application, the step of analyzing and matching the entity results against the web page content corresponding to each web page result to screen out the target entities corresponding to each web page result may specifically include:
Sub-step S301: analyze the main text of the web page corresponding to the web page result;
The main text here may specifically be the web page content with redundant information, such as advertisements, removed.
Sub-step S302: extract the features in the main text according to the analysis result; the features may specifically include one or more of the title, subtitles, tables, summary and bold text;
The main text of a web page is usually long, and matching all of it directly against each entity result would take considerable processing time. To reduce processing time and improve efficiency, this preferred embodiment extracts features from the main text and matches those against each entity result. One or more of the title, subtitles, tables, summary and bold text are merely preferred examples of features; other feature content is also feasible, such as the first paragraph or the first sentence.
Sub-step S303: match the extracted text against each entity result, to obtain the target content in the web page result that matches each entity result.
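Sub-steps S301 to S303 may be illustrated with Python's standard `html.parser` module. The choice of tags treated as features, the class name `FeatureExtractor` and the sample page are assumptions; the application does not prescribe how features are located in the page.

```python
from html.parser import HTMLParser

# Tags assumed to carry title, subtitle, table and bold-text features.
FEATURE_TAGS = {"title", "h1", "h2", "h3", "b", "strong", "th", "td"}

class FeatureExtractor(HTMLParser):
    """Collect the text found inside title, heading, table-cell and bold tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open feature tags
        self.features = []

    def handle_starttag(self, tag, attrs):
        if tag in FEATURE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in FEATURE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.features.append(data.strip())

def match_entities(html, entity_results):
    """Return the entity results that appear in the extracted feature text."""
    extractor = FeatureExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.features)
    return [e for e in entity_results if e in text]

# Hypothetical page used only to exercise the functions.
PAGE = ("<html><head><title>Hiking near Beijing</title></head><body>"
        "<h2>Fragrant Hills</h2><p>An easy walk. Jiankou Great Wall is not covered.</p>"
        "<p>Also try <b>Wuling Mountain</b>.</p></body></html>")
matched = match_entities(PAGE, ["Fragrant Hills", "Wuling Mountain", "Jiankou Great Wall"])
print(matched)
```

Because only feature text is matched, an entity mentioned solely in an ordinary paragraph is not selected, which trades some recall for shorter processing time.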
In practice, the target content in a web page result that matches an entity result may be either the formal name or an alias of the entity result; where the target content is an alias, the present application can normalize it to the formal name.
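Alias handling may be sketched with a table that maps each formal name to its aliases. The alias table below is hypothetical and only illustrates the normalization; how aliases are stored in the knowledge base is not specified in the application.

```python
def match_with_aliases(text, alias_table):
    """Return formal entity names whose formal name or any alias occurs in the text."""
    matched = []
    for formal, aliases in alias_table.items():
        if any(name in text for name in [formal, *aliases]):
            matched.append(formal)  # always report the formal name
    return matched

# Hypothetical alias table: formal name -> list of aliases.
ALIAS_TABLE = {"Fragrant Hills": ["Xiangshan"], "Wuling Mountain": ["Wulingshan"]}
print(match_with_aliases("Autumn leaves at Xiangshan", ALIAS_TABLE))
```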
Because there are many web page result entries, not every web page result is relevant to the query string; and because the entity results drawn from the knowledge base are strongly knowledge-oriented, they are not necessarily suitable for the general public. The process of analyzing and matching the entity results against the web page content corresponding to each web page result and screening out the target entities corresponding to each web page result is therefore a mutual screening of the web page results and the entity results. The target entities screened out for a web page result are both relevant to the query string and present in a popular web page result, so they represent the popular information in that web page result that is relevant to the query string.
Step 104: display each target entity in correspondence with its matched web result.
In a preferred embodiment of the present application, the entity information may specifically include a picture of the entity object and/or the entity title.
The step of displaying the target entity matched by a web result may then specifically be: displaying, as a hyperlink in the region surrounding a web result, the target entity matched by that web result.
Here, the surrounding region may be the area above, below, to the left of, or to the right of the web result. By clicking a target entity displayed as a hyperlink, the user enters the web search results page for the query string corresponding to that target entity.
In a preferred embodiment of the present application, the entity picture and entity title for each web result may be displayed below that result's title, so that the user can quickly locate the web results of interest directly from the entity pictures.
Continuing the above example, the application may display the picture of "Fragrance Hill" below the title of entry 1, and the pictures of "Fragrance Hill" and "Wuling Mountain" below the title of entry 2. The user can compare the pictures with the title and summary of the web result to judge whether the web page content of the current web result is relevant to the query string.
In a preferred embodiment of the present application, before the step of displaying each target entity in correspondence with its matched web result, the method may further include: sorting the target entities, per web result, according to the frequency and/or position at which the target content matching the entity results occurs in each web result.
The step of displaying each target entity in correspondence with its matched web result may then specifically be: displaying the identifiers of the target entities matched by each web result in the order given by the sorting result.
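The sorting step can be sketched as follows. The application only states that frequency and/or position are used; ordering by descending frequency and breaking ties by earliest first occurrence is an assumption made for this example, as is the function name.

```python
def rank_entities(page_text, entity_names):
    """Order the entities matched in one web result.

    Assumed rule: more frequent entities first; ties are broken by
    the earliest first occurrence in the page text.
    """
    scored = []
    for name in entity_names:
        count = page_text.count(name)
        if count == 0:
            continue  # entity does not occur in this web result
        first_position = page_text.find(name)
        scored.append((-count, first_position, name))
    return [name for _, _, name in sorted(scored)]
```

The ranking is computed separately for each web result, so the same entity can appear in different positions under different results.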
In summary, the application can screen out the target entities in each web result that are relevant to the query string and present them to the user concisely and intuitively. Without clicking through to view the page content, the user can judge how relevant the page of the current web result is to the query string, and in turn how reliable that page content is. The application therefore provides more intuitive and richer information for web search results, gives the user more basis and convenience for screening web results, and improves the efficiency of information queries.
In a preferred embodiment of the invention, each displayed target entity carries a corresponding hyperlink, and the hyperlink leads to the matched web result.
The method may then further include: after receiving a trigger on a target entity, positioning the web result at the location that matches the target entity, and thereby loading the web page content that matches the target entity.
Continuing the above example, suppose the application displays the entity picture of "Fragrance Hill" below the title of entry 1 of the web results and the entity pictures of "Fragrance Hill" and "Wuling Mountain" below the title of entry 2, and each displayed entity picture carries a corresponding hyperlink that leads to the matched web result.
Then, after the user clicks the "Wuling Mountain" entity picture below entry 2, this preferred embodiment may load the web page corresponding to entry 2, analyze that page to determine the position at which it introduces "Wuling Mountain", and automatically scroll the browser displaying the page to the vicinity of that position, so that the user can directly view the content in this web result that corresponds to the triggered target entity.
When target entities are displayed in correspondence with their matched web results, a user's click on the target entity matched by a web result indicates interest in that target entity. This preferred embodiment loads the web page matched by the target entity and positions the page at the content that matches the target entity, so it can directly present the content of interest within the matched web result. The user does not have to scroll the browser again to find the matching content in the page, which further improves the efficiency of information queries.
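The application does not say how the position is determined or how the scrolling is carried out. As one possible way to reach a similar effect, and explicitly a substitute technique rather than the method described here, the hyperlink could use a URL text fragment (the `#:~:text=` directive), which some browsers use to scroll to the first occurrence of a phrase. The function name and example URL below are hypothetical.

```python
from urllib.parse import quote


def link_to_entity(page_url, entity_name):
    """Build a link that asks the browser to scroll to the entity name.

    Uses a URL text fragment instead of the page analysis described
    in the text; browser support for text fragments varies.
    """
    # quote() leaves "-" unescaped, but "-" is a delimiter inside a
    # text directive, so it is percent-encoded by hand.
    encoded = quote(entity_name, safe="").replace("-", "%2D")
    base = page_url.split("#", 1)[0]  # drop any existing fragment
    return f"{base}#:~:text={encoded}"
```

With this approach the page itself is not analyzed on the server side; the browser locates the phrase when it loads the page.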
In other embodiments, after receiving a trigger on a target entity, the entity content corresponding to the target entity in the knowledge base may be loaded directly, showing the details of the target entity matched with the web result and improving the efficiency of information queries. Alternatively, in other embodiments, after receiving a trigger on a target entity, the user may be given the search results obtained by searching with the target entity as the query string, and so on. In short, the application does not limit what the link of a displayed target entity points to.
Corresponding to the foregoing method embodiments, the present application also provides a network search device. Referring to the structural diagram shown in Fig. 2, it may specifically include:
Web page search unit 201, configured to search for web pages matching the input query string and obtain web results;
Entity search unit 202, configured to retrieve, from a knowledge base, the entity results corresponding to the query string, where the knowledge base stores entity objects from across the entire web;
Entity screening unit 203, configured to analyze and match the entity results in the web page content corresponding to each web result and filter out the target entity corresponding to each web result; and
Display unit 204, configured to display each target entity in correspondence with its matched web result.
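How the four units fit together can be sketched as a single function that takes the first three units as callables and returns the data the display unit would render. The function signature and the dictionary layout are assumptions made for illustration.

```python
def network_search(query, search_web, search_entities, screen_entities):
    """Wire units 201 to 204 together (sketch).

    search_web(query)                -> list of web results   (unit 201)
    search_entities(query)           -> list of entity names  (unit 202)
    screen_entities(result, entities) -> entities found in
                                         that result          (unit 203)
    The returned list is what a display unit (204) would render.
    """
    web_results = search_web(query)
    entity_results = search_entities(query)
    display_items = []
    for result in web_results:
        targets = screen_entities(result, entity_results)
        display_items.append({"result": result, "entities": targets})
    return display_items
```

Each web result keeps its own list of target entities, which matches the per-result display described for Step 104.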
In a preferred embodiment of the present application, the entity screening unit 203 may be specifically configured to filter out, from the web page content corresponding to each web result, the target content that matches the entity results, and to take the entity result corresponding to that target content as the target entity corresponding to the web result.
In a preferred embodiment of the present application, the entity search unit 202 may specifically include:
Identification and labeling module, configured to identify the entity words and entity attribute words in the query string and label them;
Syntax analysis module, configured to perform syntax analysis on the query string, where the resulting syntax analysis result includes a grammar rule and the labeled entity words that satisfy the grammar rule;
Conversion module, configured to convert the syntax analysis result into a query statement described in a machine language; and
Retrieval module, configured to retrieve the corresponding entity information from the knowledge base according to the query statement, as the entity result.
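The four modules of the entity search unit can be sketched with a toy lexicon, a single grammar rule, and an in-memory knowledge base. All of the data, the rule "ATTRIBUTE in ENTITY", and the dictionary-shaped query statement are hypothetical stand-ins; the application does not specify the grammar or the query language.

```python
# Hypothetical lexicons and knowledge base used only for this sketch.
ENTITY_WORDS = {"Beijing"}
ATTRIBUTE_WORDS = {"mountains": "mountain"}
KNOWLEDGE_BASE = [
    {"name": "Fragrance Hill", "type": "mountain", "located_in": "Beijing"},
    {"name": "Mount Tai", "type": "mountain", "located_in": "Shandong"},
]


def label(query):
    """Identification and labeling: tag each token of the query."""
    labeled = []
    for token in query.split():
        if token in ENTITY_WORDS:
            labeled.append((token, "ENTITY"))
        elif token in ATTRIBUTE_WORDS:
            labeled.append((token, "ATTRIBUTE"))
        else:
            labeled.append((token, "OTHER"))
    return labeled


def parse(labeled):
    """Syntax analysis: apply the single toy rule 'ATTRIBUTE in ENTITY'."""
    if (len(labeled) == 3 and labeled[0][1] == "ATTRIBUTE"
            and labeled[1][0] == "in" and labeled[2][1] == "ENTITY"):
        return {"rule": "ATTRIBUTE in ENTITY",
                "attribute": labeled[0][0], "entity": labeled[2][0]}
    return None


def to_query_statement(parsed):
    """Conversion: turn the parse result into a structured query."""
    return {"type": ATTRIBUTE_WORDS[parsed["attribute"]],
            "located_in": parsed["entity"]}


def retrieve(statement):
    """Retrieval: return names of entities matching every condition."""
    return [entity["name"] for entity in KNOWLEDGE_BASE
            if all(entity.get(key) == value for key, value in statement.items())]


def search_entities(query):
    parsed = parse(label(query))
    return retrieve(to_query_statement(parsed)) if parsed else []
```

A query that satisfies no grammar rule yields no entity results in this sketch, so the web results would be shown without entity annotations.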
In another preferred embodiment of the application, the entity screening unit 203 may specifically include:
Web page analysis module, configured to analyze the web page text corresponding to the web result;
Extraction module, configured to extract features from the web page text according to the analysis result for the web page text, where the features may specifically include one or more of title, subtitle, table, summary, and bold text; and
Matching module, configured to match the extracted text against each entity result and obtain the target content in the web result that matches each entity result.
In the embodiments of the present application, preferably, the device may further include an entity sorting unit, configured to sort the target entities, per web result, according to the frequency and/or position at which the target content matching the entity results occurs in each web result, before the operation of displaying each target entity in correspondence with its matched web result.
The display unit 204 may then be specifically configured to display the identifiers of the target entities matched by each web result in the order given by the sorting result.
In the embodiments of the present application, preferably, each displayed target entity carries a corresponding hyperlink, and the hyperlink leads to the matched web result.
The device may then further include a loading unit, configured to, after receiving a trigger on a target entity, position the web result at the location that matches the target entity and thereby load the web page content that matches the target entity.
In the embodiments of the present application, preferably, the device may further include a preprocessing unit, configured to perform a preprocessing operation on the query string, where the preprocessing operation includes one or more of error correction, word removal, and word segmentation.
The web page search unit 201 may then be specifically configured to search the entire web for web documents matching the preprocessed query string and obtain the web results corresponding to the query string.
The entity search unit 202 may be specifically configured to perform a structured query over the web-wide entity objects in the knowledge base according to the preprocessed query string and obtain the entity results corresponding to the query string.
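The preprocessing operations can be sketched as below. The correction table, the list of removable words, and the use of whitespace splitting in place of a real word segmenter are all assumptions; the sketch also segments first so that the other two operations can work on tokens, an ordering the application does not prescribe.

```python
# Hypothetical resources for this sketch.
CORRECTIONS = {"Bejing": "Beijing"}        # misspelling -> correction
REMOVABLE_WORDS = {"the", "a", "please"}   # words dropped from the query


def preprocess(query):
    """Apply segmentation, error correction, and word removal to a query."""
    tokens = query.split()                                 # segmentation (stand-in)
    tokens = [CORRECTIONS.get(token, token) for token in tokens]  # error correction
    return [token for token in tokens
            if token.lower() not in REMOVABLE_WORDS]       # word removal
```

The same preprocessed tokens would then feed both the web page search and the knowledge base query, so both units work from one cleaned query.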
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and for parts that are identical or similar the embodiments may be referred to one another. Because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The network search method and device provided by the application have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the application, and the description of the above embodiments is only intended to help in understanding the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the application.

Claims (10)

1. A network search method, characterized by comprising:
searching for web pages matching an input query string to obtain web results;
retrieving, from a knowledge base, entity results corresponding to the query string, wherein the knowledge base stores entity objects from across the entire web;
analyzing and matching the entity results in the web page content corresponding to each web result, and filtering out the target entity corresponding to each web result; and
displaying each target entity in correspondence with its matched web result.
2. The method of claim 1, characterized in that the step of filtering out the target entity corresponding to each web result comprises: filtering out, from the web page content corresponding to each web result, the target content that matches the entity results, and taking the entity result corresponding to the target content as the target entity corresponding to the web result.
3. The method of claim 2, characterized in that the step of filtering out, from the web page content corresponding to each web result, the target content that matches the entity results comprises:
analyzing the web page text corresponding to the web result;
extracting features from the web page text according to the analysis result for the web page text, wherein the features comprise one or more of title, subtitle, table, summary, and bold text; and
matching the extracted text against each entity result to obtain the target content in the web result that matches each entity result.
4. The method of claim 1, 2 or 3, characterized in that the method further comprises:
sorting the target entities, per web result, according to the frequency and/or position at which the target content matching the entity results occurs in each web result;
wherein the step of displaying each target entity in correspondence with its matched web result is: displaying the identifiers of the target entities matched by each web result in the order given by the sorting result.
5. The method of claim 1, 2 or 3, characterized in that each displayed target entity carries a corresponding hyperlink, and the hyperlink leads to the matched web result;
and the method further comprises:
after receiving a trigger on a target entity, positioning the web result at the location that matches the target entity, thereby loading the web page content that matches the target entity.
6. The method of claim 1, characterized in that the step of retrieving, from the knowledge base, the entity results corresponding to the query string comprises:
identifying the entity words and entity attribute words in the query string and labeling them;
performing syntax analysis on the query string, wherein the resulting syntax analysis result comprises a grammar rule and the labeled entity words that satisfy the grammar rule;
converting the syntax analysis result into a query statement described in a machine language; and
retrieving the corresponding entity information from the knowledge base according to the query statement, as the entity result.
7. The method of claim 1, characterized in that the method further comprises:
performing a preprocessing operation on the query string, wherein the preprocessing operation comprises one or more of error correction, word removal, and word segmentation;
wherein the step of searching for web pages matching the input query string to obtain web results is: searching the entire web for web documents matching the preprocessed query string to obtain the web results corresponding to the query string;
and the step of retrieving, from the knowledge base, the entity results corresponding to the query string is: performing a structured query over the web-wide entity objects in the knowledge base according to the preprocessed query string to obtain the entity results corresponding to the query string.
8. A network search device, characterized by comprising:
a web page search unit, configured to search for web pages matching an input query string and obtain web results;
an entity search unit, configured to retrieve, from a knowledge base, entity results corresponding to the query string, wherein the knowledge base stores entity objects from across the entire web;
an entity screening unit, configured to analyze and match the entity results in the web page content corresponding to each web result and filter out the target entity corresponding to each web result; and
a display unit, configured to display each target entity in correspondence with its matched web result.
9. The device of claim 8, characterized in that the entity screening unit is specifically configured to filter out, from the web page content corresponding to each web result, the target content that matches the entity results, and to take the entity result corresponding to the target content as the target entity corresponding to the web result.
10. The device of claim 9, characterized in that the entity screening unit comprises:
a web page analysis module, configured to analyze the web page text corresponding to the web result;
an extraction module, configured to extract features from the web page text according to the analysis result for the web page text, wherein the features comprise one or more of title, subtitle, and bold text; and
a matching module, configured to match the extracted text against each entity result and obtain the target content in the web result that matches each entity result.
CN201310633696.7A 2013-11-29 2013-11-29 A kind of network search method and device Active CN104679783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310633696.7A CN104679783B (en) 2013-11-29 2013-11-29 A kind of network search method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310633696.7A CN104679783B (en) 2013-11-29 2013-11-29 A kind of network search method and device

Publications (2)

Publication Number Publication Date
CN104679783A true CN104679783A (en) 2015-06-03
CN104679783B CN104679783B (en) 2019-08-02

Family

ID=53314838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310633696.7A Active CN104679783B (en) 2013-11-29 2013-11-29 A kind of network search method and device

Country Status (1)

Country Link
CN (1) CN104679783B (en)

Also Published As

Publication number Publication date
CN104679783B (en) 2019-08-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant