CN104077388A - Summary information extraction method and device based on search engine and search engine - Google Patents

Info

Publication number
CN104077388A
Authority
CN
China
Prior art keywords
page
summary info
web page
resources
search engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410302674.7A
Other languages
Chinese (zh)
Inventor
董毅
张前川
陈营营
张川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201410302674.7A (patent CN104077388A)
Publication of CN104077388A
Priority to PCT/CN2015/080676 (patent WO2015196910A1)
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a summary information extraction method and device based on a search engine, and a search engine. The method comprises: obtaining matching webpage resources based on a search string received by the search engine; identifying the webpage types of the webpage resources; extracting corresponding summary information from the webpage resources according to the webpage types; and outputting the summary information. The method and device reduce how often a user must click through to the pages behind search results to find the required information, thereby improving retrieval speed, reducing the number of interactions with the search engine, and improving data processing speed.

Description

Summary information extraction method and device based on a search engine, and search engine
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a summary information extraction method based on a search engine, a summary information extraction device based on a search engine, and a search engine.
Background art
In the current era of extremely abundant network information, search engines have become an indispensable tool for users retrieving from massive resources.
To improve how search results are presented, the results provided by a search engine can include, besides the web page title and URL, a summary taken from the web page. At present, search engines generate summaries in one of the following two ways:
The first is the static mode, which is independent of the query. At the preprocessing stage, some text is extracted from the web page content in advance according to certain rules, for example by taking the first 512 bytes of the page text (corresponding to 256 Chinese characters) or by concatenating the first sentence of each paragraph. The summary formed in this way is stored in the query subsystem; once the relevant document is selected as matching the query terms, the summary is simply read out and returned to the user. This mode is clearly the easiest for the query subsystem, which needs no further processing. Its biggest shortcoming, however, is that the summary is unrelated to the query.
Users want the words that directly correspond to the query to be highlighted in the summary, and want the summary to contain sentences related to the words they care about. The dynamic summary mode arose for this reason: when responding to a query, the text surrounding the query words is extracted according to the positions of the query words in the document, and the query words are highlighted when the summary is displayed. This is the mode adopted by most search engines.
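The two modes described above can be contrasted with a minimal sketch. It is an illustration only, not the method of any particular search engine: the function names, the `<em>` highlight marker, the `window` size, and the use of UTF-8 (where a Chinese character usually takes 3 bytes rather than 2) are all assumptions.

```python
def static_summary(text: str, max_bytes: int = 512) -> str:
    """Query-independent summary: the first max_bytes bytes of the page text."""
    raw = text.encode("utf-8")[:max_bytes]
    # If the cut split a multi-byte character, the partial bytes are dropped.
    return raw.decode("utf-8", errors="ignore")


def dynamic_summary(text: str, query: str, window: int = 20) -> str:
    """Query-dependent summary: text around the first hit, with the hit marked."""
    pos = text.find(query)  # case-sensitive match, kept simple on purpose
    if pos < 0:
        return static_summary(text)  # no hit: fall back to the static mode
    start = max(0, pos - window)
    end = min(len(text), pos + len(query) + window)
    hit = text[pos:pos + len(query)]
    return text[start:pos] + "<em>" + hit + "</em>" + text[pos + len(query):end]
```

The static result never changes with the query, while the dynamic result depends on where the query word occurs, which is exactly the trade-off the background section describes.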
Although a dynamic summary contains the user's query words, the extracted sentences cannot express the meaning of the whole web document. In other words, a user reading the summary returned by the search engine cannot determine whether the information being sought is contained in the page. The user then has to click the search result and check the corresponding web page for the desired information; these repeated interactions consume bandwidth and make searching inefficient.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a summary information extraction method based on a search engine, a corresponding summary information extraction device based on a search engine, and a search engine, which overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a summary information extraction method based on a search engine is provided, comprising:
Obtaining matching web page resources based on a search string received by the search engine;
Identifying the page type of the web page resources;
Extracting, according to the page type, corresponding summary information from the web page resources;
Outputting the summary information.
Optionally, the step of identifying the page type of the web page resources comprises:
Extracting the page framework of the web page resources and calculating a page framework ID;
If the number of page frameworks sharing the same page framework ID is greater than a preset threshold, calculating a page framework mode;
Matching the page framework mode against the page framework modes in a pre-generated database to identify the page type.
Optionally, the web page resources comprise web page source code, the page type comprises single page, and the step of extracting, according to the page type, corresponding summary information from the web page resources comprises:
For the single page, extracting one or more pieces of key element information from the web page source code as the summary information.
Optionally, the single page comprises one or a combination of the following: a download text page, an audio/video playback page, a novel reading page, a question-and-answer page, a news photo-gallery page, and a topic page.
Optionally, the web page resources comprise web page source code, the page type comprises original list, and the step of extracting, according to the page type, corresponding summary information from the web page resources comprises:
For the original list, extracting from the web page source code, as the summary information, the one or more pieces of element information that rank highest by the click rate counted by the web page resources.
Optionally, the original list comprises an audio/video original list.
Optionally, the step of extracting, according to the page type, corresponding summary information from the web page resources comprises:
According to the page type, sending a first query request to the website object corresponding to the web page resources;
Receiving the history access record, corresponding to the first query request, that is sent by the website object, where the website object obtains the history access record according to cookie information after acquiring that cookie information from the current terminal;
Obtaining from the history access record, as the summary information, the element information whose number of accesses in the web page resources is greater than a first threshold.
Optionally, the step of extracting, according to the page type, corresponding summary information from the web page resources comprises:
According to the page type, sending a second query request to the browser of the current terminal, where the second query request includes the website object identifier of the web page resources;
Receiving the history access record, related to the website object identifier in the current terminal, that is returned by the browser, where the browser of the current terminal obtains the history access record after acquiring the cookie information related to the website object;
Obtaining from the history access record, as the summary information, the element information whose number of accesses in the web page resources is greater than a first threshold.
Optionally, the method further comprises:
Adding a specific tag (TAG) to the summary information.
Optionally, the step of extracting, according to the page type, corresponding summary information from the web page resources is:
According to the page type, looking up the summary information corresponding to the web page resources in a pre-generated summary database, where the summary database stores web page resources and their corresponding summary information.
Optionally, the summary information comprises at least one or a combination of the following for one or more pieces of element information: element URL, element identifier, element picture, and element text description.
According to another aspect of the invention, a summary information extraction device based on a search engine is provided, comprising:
A web page resource acquisition module, adapted to obtain matching web page resources based on a search string received by the search engine;
A page type identification module, adapted to identify the page type of the web page resources;
A summary information extraction module, adapted to extract, according to the page type, corresponding summary information from the web page resources;
An information output module, adapted to output the summary information.
Optionally, the page type identification module is further adapted to:
Extract the page framework of the web page resources and calculate a page framework ID;
If the number of page frameworks sharing the same page framework ID is greater than a preset threshold, calculate a page framework mode;
Match the page framework mode against the page framework modes in a pre-generated database to identify the page type.
Optionally, the web page resources comprise web page source code, the page type comprises single page, and the summary information extraction module is further adapted to:
For the single page, extract one or more pieces of key element information from the web page source code as the summary information.
Optionally, the single page comprises one or a combination of the following: a download text page, an audio/video playback page, a novel reading page, a question-and-answer page, a news photo-gallery page, and a topic page.
Optionally, the web page resources comprise web page source code, the page type comprises original list, and the summary information extraction module is further adapted to:
For the original list, extract from the web page source code, as the summary information, the one or more pieces of element information that rank highest by the click rate counted by the web page resources.
Optionally, the original list comprises an audio/video original list.
Optionally, the summary information extraction module is further adapted to:
According to the page type, send a first query request to the website object corresponding to the web page resources;
Receive the history access record, corresponding to the first query request, that is sent by the website object, where the website object obtains the history access record according to cookie information after acquiring that cookie information from the current terminal;
Obtain from the history access record, as the summary information, the element information whose number of accesses in the web page resources is greater than a first threshold.
Optionally, the summary information extraction module is further adapted to:
According to the page type, send a second query request to the browser of the current terminal, where the second query request includes the website object identifier of the web page resources;
Receive the history access record, related to the website object identifier in the current terminal, that is returned by the browser, where the browser of the current terminal obtains the history access record after acquiring the cookie information related to the website object;
Obtain from the history access record, as the summary information, the element information whose number of accesses in the web page resources is greater than a first threshold.
Optionally, the device further comprises:
A tag adding module, adapted to add a specific tag (TAG) to the summary information.
Optionally, the summary information extraction module is further adapted to:
According to the page type, look up the summary information corresponding to the web page resources in a pre-generated summary database, where the summary database stores web page resources and their corresponding summary information.
Optionally, the summary information comprises at least one or a combination of the following for one or more pieces of element information: element URL, element identifier, element picture, and element text description.
According to another aspect of the invention, a search engine is provided, comprising:
A web page resource acquisition module, adapted to obtain matching web page resources based on a received search string;
A page type identification module, adapted to identify the page type of the web page resources;
A summary information extraction module, adapted to extract, according to the page type, corresponding summary information from the web page resources;
An information output module, adapted to output the summary information.
In the embodiments of the present invention, after the search engine receives a search string entered by the user, it looks up all web page resources that contain the search string as the matching web page resources. The summary information output in the search results is obtained by identifying the page type of each web page resource and then extracting from it according to that page type. As a result, the summary information shown in the search results expresses the meaning of the whole page document more accurately and is more valuable to the user, who can obtain the desired information directly from the summary. This reduces how often users must click through to the pages behind search results to find what they need, which in turn improves retrieval speed, reduces the number of interactions with the search engine, and improves the data processing rate.
In addition, in the embodiments of the present invention, after the matching web page resources are obtained, the corresponding cookie information is obtained according to the web page resources, the user's history access record is obtained according to the cookie information, and the element information whose number of accesses in the web page resources is greater than a first threshold is taken from the history access record as the summary information. The summary information shown in the search results is thus personalized for each user. This improves the user experience and makes the information offered in the summary more valuable: the user can obtain the desired information directly from the summary, which reduces how often users must click through to the pages behind search results, and in turn improves retrieval speed, reduces the number of interactions with the search engine, and improves the data processing rate.
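The personalized selection described above can be sketched as a simple count-and-filter step. This is a minimal illustration under assumptions: the history access record is taken to be a flat list of visited element URLs, and the function name and the tie-breaking order are hypothetical, since the text does not specify them.

```python
from collections import Counter
from typing import List


def personalized_elements(history: List[str], first_threshold: int) -> List[str]:
    """Return element URLs visited more than first_threshold times, most visited first."""
    counts = Counter(history)
    frequent = [(url, n) for url, n in counts.items() if n > first_threshold]
    # Sort by descending access count; ties are ordered by URL for determinism.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [url for url, _ in frequent]
```

How the record is obtained (from the website object or from the browser of the current terminal) is outside this sketch; only the threshold rule is shown.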
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of the steps of Embodiment 1 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 2 shows a flowchart of the steps of Embodiment 2 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 2-a shows a schematic diagram of a download text page in Embodiment 2 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 2-b shows a schematic diagram of the first output result in Embodiment 2 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 3 shows a flowchart of the steps of Embodiment 3 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 3-a shows a schematic diagram of a video website homepage in Embodiment 3 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 3-b shows a schematic diagram of the second output result in Embodiment 3 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 4 shows a flowchart of the steps of Embodiment 4 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 4-a shows a schematic diagram of a video website homepage in Embodiment 4 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 4-b shows a schematic diagram of the third output result in Embodiment 4 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 5 shows a flowchart of the steps of Embodiment 5 of the summary information extraction method based on a search engine according to an embodiment of the invention;
Fig. 6 shows a structural block diagram of an embodiment of the summary information extraction device based on a search engine according to an embodiment of the invention;
Fig. 7 shows a structural block diagram of an embodiment of the search engine according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, which shows a flowchart of the steps of Embodiment 1 of the summary information extraction method based on a search engine according to an embodiment of the invention, the embodiment can comprise the following steps:
Step 101, obtain matching web page resources based on a search string received by the search engine;
Step 102, identify the page type of the web page resources;
Step 103, according to the page type, extract corresponding summary information from the web page resources;
Step 104, output the summary information.
In the embodiments of the present invention, after the search engine receives a search string entered by the user, it looks up all web page resources that contain the search string as the matching web page resources. The summary information output in the search results is obtained by identifying the page type of each web page resource and then extracting from it according to that page type. As a result, the summary information shown in the search results expresses the meaning of the whole page document more accurately and is more valuable to the user, who can obtain the desired information directly from the summary. This reduces how often users must click through to the pages behind search results to find what they need, which in turn improves retrieval speed, reduces the number of interactions with the search engine, and improves the data processing rate.
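Steps 101 to 104 can be read as a small pipeline. The sketch below shows only that control flow; the `Page` record layout and the three callables passed in (`retrieve`, `identify_type`, and the per-type `extractors`) are hypothetical placeholders for components that the text describes separately.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Page = Dict[str, str]  # hypothetical page record: {"url": ..., "source": ...}


def summarize_results(
    query: str,
    retrieve: Callable[[str], Iterable[Page]],
    identify_type: Callable[[Page], str],
    extractors: Dict[str, Callable[[Page], str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Run steps 101-104 for one query and return (url, page_type, summary) rows."""
    rows = []
    for page in retrieve(query):                      # step 101: matching resources
        page_type = identify_type(page)               # step 102: page type
        extract = extractors.get(page_type)
        summary = extract(page) if extract else None  # step 103: type-specific extraction
        rows.append((page["url"], page_type, summary))
    return rows                                       # step 104: rows ready for output
```

Keeping the extractors in a dictionary keyed by page type mirrors the idea that each page type has its own extraction rule; a page type without a registered extractor simply yields no summary in this sketch.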
Referring to Fig. 2, which shows a flowchart of the steps of Embodiment 2 of the summary information extraction method based on a search engine according to an embodiment of the invention, the embodiment can comprise the following steps:
Step 201, obtain matching web page resources based on a search string received by the search engine, where the web page resources comprise web page source code;
The search string (query) is the search information that the user enters in the search engine interface to express the user's intent and to request a search for associated web page resources.
After receiving the search string entered by the user, the search engine processes it (word segmentation, stop-word removal, typo detection, and so on) and then looks up, in a pre-built index database, all web page resources that contain the search string as the matching web page resources. A web page resource can include information such as the body text of the web page, the URL of the web page, the source code that makes up the web page, and the inbound and outbound links of the web page.
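The lookup described above can be illustrated with a toy inverted index. This is a sketch under stated assumptions: whitespace splitting stands in for real word segmentation (which Chinese text would need), the stop-word list is hypothetical, typo detection is omitted, and all function names are illustrative.

```python
from typing import Dict, List, Set

STOP_WORDS = {"the", "a", "of", "for"}  # hypothetical stop-word list


def tokenize(query: str) -> List[str]:
    """Split the query into terms and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build term -> set of URLs from a {url: body text} mapping."""
    index: Dict[str, Set[str]] = {}
    for url, text in pages.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(url)
    return index


def match(query: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the URLs whose text contains every remaining query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))
```

The index is built ahead of time, so answering a query only intersects a few precomputed sets, which is the point of searching "from the index database set up in advance".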
Step 202, identify the page type of the web page resources, where the page type includes single page;
After the web page resources are obtained, the corresponding page type can be identified from them. In a preferred embodiment of the present invention, step 202 can comprise the following sub-steps:
Sub-step S11, extract the page framework of the web page resources and calculate a page framework ID;
In a specific implementation, the page framework of a web page resource can be extracted from the html tags in the web page source code. During extraction, only the framework-type tags among the html tags, such as frame and table, are retained, together with their id, name, and class attributes; all other attributes are removed. Alternatively, the body text of the web page can be identified by its punctuation and removed, leaving the page framework.
After the page framework is extracted, a hash value of the page framework can be calculated from its attributes with a hash algorithm; this hash value is the page framework ID. For example, the framework-type tags such as frame and table and the id, name, and class attributes are run through the hash algorithm, and the result is the page framework ID. Because the same hash function is used, identical page frameworks yield identical page framework IDs.
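A minimal sketch of this sub-step follows, using Python's standard `html.parser` and `hashlib`. Several details are assumptions rather than part of the described method: the set of tags treated as framework tags goes beyond the two the text names (frame, table), SHA-256 is an arbitrary choice because the text does not name a hash algorithm, and the class and function names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

STRUCTURAL_TAGS = {"frame", "table", "div", "ul", "ol", "form"}  # assumed tag set
KEPT_ATTRS = ("id", "name", "class")  # the attributes the text says to retain


class FrameworkExtractor(HTMLParser):
    """Collects structural tags with only their id/name/class attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRUCTURAL_TAGS:
            return
        kept = {k: (v or "") for k, v in attrs if k in KEPT_ATTRS}
        attr_text = "".join(f' {k}="{kept[k]}"' for k in KEPT_ATTRS if k in kept)
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in STRUCTURAL_TAGS:
            self.parts.append(f"</{tag}>")


def page_framework_id(html: str) -> str:
    """Hash the stripped-down framework; identical frameworks give identical IDs."""
    parser = FrameworkExtractor()
    parser.feed(html)
    parser.close()
    framework = "".join(parser.parts)
    return hashlib.sha256(framework.encode("utf-8")).hexdigest()
```

Two pages generated from the same template differ in text and incidental attributes but share the same skeleton, so they map to the same ID, which is what makes counting pages per ID meaningful in the next sub-step.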
Sub-step S12, if the number of page frameworks sharing the same page framework ID is greater than a preset threshold, calculate a page framework mode;
In practice, when the page framework mode is calculated, the title, time, and body text parts are calculated separately. The calculation can use an automatic machine learning mechanism, for example a support vector machine (SVM). During learning, the extracted page frameworks are fed to the SVM, which matches the closing tags of the html tags in the page frameworks. Because the closing tags in page frameworks with the same ID can match completely, the SVM outputs the page framework mode of the corresponding page framework once it has learned the preset threshold number of page frameworks with the same ID.
Sub-step S13, match the page framework mode against the page framework modes in a pre-generated database to identify the page type.
The pre-generated database stores page framework modes of known types and, for each mode, the weight of each web page feature. For each matched feature, the corresponding weight is added to the page framework under the relevant category; the category with the highest accumulated weight is taken as the page type of the page.
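The weight accumulation described above can be sketched as follows. The feature names, the weights, and the contents of `TYPE_WEIGHTS` are hypothetical sample data standing in for the pre-generated database, and the rule of returning no type when nothing matches is an assumption.

```python
from typing import Dict, Optional, Set

# Hypothetical database content: page type -> {feature: weight}.
TYPE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "download text page": {"class:downBtn": 3.0, "class:toolInfo": 2.0},
    "audio/video original list": {"class:rankList": 3.0, "class:playBtn": 1.0},
}


def identify_page_type(
    features: Set[str],
    type_weights: Dict[str, Dict[str, float]] = TYPE_WEIGHTS,
) -> Optional[str]:
    """Add up the weights of matched features per type and pick the highest total."""
    scores = {
        page_type: sum(w for feature, w in weights.items() if feature in features)
        for page_type, weights in type_weights.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no matched feature: type unknown
```

A page that carries a few features of another type is still classified by the larger total, so an occasional stray feature does not flip the result.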
The page type in the embodiments of the present invention can comprise single page and/or original list. The single page is a page whose page elements are relatively uniform, and can comprise one or a combination of the following: a download text page, an audio/video playback page, a novel reading page, a question-and-answer page, a news photo-gallery page, and a topic page. The original list (that is, a list page) can comprise an audio/video original list.
Step 203, for the single page, extract one or more pieces of key element information from the web page source code as the summary information;
The summary information can comprise at least one or a combination of the following for one or more pieces of element information: element URL, element identifier, element picture, and element text description.
In a specific implementation, if the page type of the web page resource matching the search string is single page, one or more pieces of key element information can be extracted according to the content of the html tags in the web page source code. The html tags can include the <a> tag (defines a hyperlink; its href attribute indicates the link target), the <meta> tag (provides meta-information about the page, such as a description for search engines, the update frequency, and keywords), the <span> tag (groups inline elements), the <div> tag, the <p> tag, the <script> tag, the class attribute, and so on. For example, for a download text page, the corresponding element information can be obtained from the following code as the summary information:
<div class="toolBottom">
<div class="txtLogo"></div>
<p class="toolInfo">56.6M| update date 2014/01/03</p>
<p class="roundIcon"><a href="intro.shtml" target="_blank" class="link" title="feature animation demo">feature animation demo</a></p>
<a href="http://dldir1.XX.com/XXfile/XX/XX2013/XX2013SP6/9305/XX2013SP6.exe" class="downBtn" title="download now" onclick="tcssClick && tcssClick('downXX')">download now</a>
</div>
Here XX is the corresponding download object. The corresponding element information, that is, the summary information, is: 56.6M| update date 2014/01/03; the download address is: http://dldir1.XX.com/XXfile/XX/XX2013/XX2013SP6/9305/XX2013SP6.exe.
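The extraction from the snippet above can be sketched with Python's standard `html.parser`. The class names `toolInfo` and `downBtn` come from the snippet; treating them as the key elements for every download text page, and the names `DownloadSummaryParser` and `extract_download_summary`, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class DownloadSummaryParser(HTMLParser):
    """Collects the text of <p class="toolInfo"> and the href of <a class="downBtn">."""

    def __init__(self):
        super().__init__()
        self.summary = {}
        self._in_info = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "p" and "toolInfo" in classes:
            self._in_info = True
        elif tag == "a" and "downBtn" in classes:
            self.summary["download_url"] = attr_map.get("href")

    def handle_data(self, data):
        if self._in_info:
            self.summary["info"] = self.summary.get("info", "") + data

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_info = False


def extract_download_summary(html: str) -> dict:
    """Return the key element information of a download page as a dict."""
    parser = DownloadSummaryParser()
    parser.feed(html)
    parser.close()
    if "info" in parser.summary:
        parser.summary["info"] = parser.summary["info"].strip()
    return parser.summary
```

Splitting the class attribute on whitespace makes the match tolerant of extra spaces or additional class names on the same element.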
Step 204, output the summary information.
After the summary information corresponding to the web page resources is obtained, it can be output at a preset position of the corresponding search result when the search results are output.
For example, in the download text page schematic diagram shown in Fig. 2-a, the download text page 200 contains information such as a download object identifier 210, a download object description 220, download address 1 (230), and download address 2 (240). The download object identifier can be, for example, the official release of XX software, and the download object description can include information such as software size, update time, software language, provider, software license, software rating, application platform, and a brief introduction to the software's functions. On this download text page 200 the user's main need is the download address, so the download address links in the page can be extracted by step 203 and shown in the summary information of the search result. The user can then obtain the download address directly from the summary information and download the object without entering the page behind the search result to look for the address. The output summary information is shown in the first output result schematic diagram of Fig. 2-b.
In the embodiments of the present invention, after the search engine receives a search string entered by the user, it looks up all web page resources that contain the search string as the matching web page resources, identifies the page type of the web page resources, and, for web page resources of the single page type, extracts the corresponding summary information from the source code. As a result, the summary information shown in the search results expresses the meaning of the whole page document more accurately and is more valuable to the user, who can obtain the desired information directly from the summary. This reduces how often users must click through to the pages behind search results to find what they need, which in turn improves retrieval speed, reduces the number of interactions with the search engine, and improves the data processing rate.
Referring to Fig. 3, which shows a flowchart of the steps of Embodiment 3 of the summary information extraction method based on a search engine according to an embodiment of the invention, the embodiment can comprise the following steps:
Step 301, obtain matching web page resources based on a search string received by the search engine, where the web page resources comprise web page source code;
Step 302, identify the page type of the web page resources, where the page type includes original list;
In a preferred embodiment of the present invention, step 302 can comprise the following sub-steps:
Sub-step S21, extract the page framework of the web page resources and calculate a page framework ID;
Sub-step S22, if the number of page frameworks sharing the same page framework ID is greater than a preset threshold, calculate a page framework mode;
Sub-step S23, match the page framework mode against the page framework modes in a pre-generated database to identify the page type.
Page type in the embodiment of the present invention can comprise single page, and/or original list.Wherein, described original list is the many pages of page elements, can comprise the original lists such as audio frequency and video homepage.
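Sub-steps S21 to S23 can be sketched as follows. The text does not specify how the page framework, the page framework ID or the page framework mode are computed, so this sketch makes three loudly labeled assumptions: the framework is the sequence of opening tags, the ID is a SHA-256 hash of that sequence, and the mode is the framework string shared by the pages. All function names and the known_modes lookup table are hypothetical.

```python
import hashlib
from collections import Counter
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records the sequence of opening tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def page_framework(source):
    # Assumption: the "page framework" is the sequence of opening tags.
    collector = _TagCollector()
    collector.feed(source)
    return "/".join(collector.tags)


def page_framework_id(source):
    # Assumption: the framework ID is a hash of the framework string.
    return hashlib.sha256(page_framework(source).encode("utf-8")).hexdigest()


def identify_page_types(pages, known_modes, threshold=1):
    """pages maps URL -> source; known_modes maps framework mode -> page type."""
    ids = {url: page_framework_id(src) for url, src in pages.items()}
    counts = Counter(ids.values())
    types = {}
    for url, src in pages.items():
        # S22: only frameworks shared by more than `threshold` pages qualify.
        if counts[ids[url]] > threshold:
            mode = page_framework(src)  # assumption: mode = shared framework
            # S23: match against the modes stored in advance.
            types[url] = known_modes.get(mode, "unknown")
    return types


pages = {
    "a": '<html><body><div><a href="1">x</a></div></body></html>',
    "b": '<html><body><div><a href="2">y</a></div></body></html>',
    "c": "<html><body><p>text</p></body></html>",
}
known_modes = {"html/body/div/a": "original list"}
print(identify_page_types(pages, known_modes))
```

Pages "a" and "b" share one framework and are matched against the stored modes, while page "c" has a framework seen only once and is left unclassified in this sketch.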
Step 303: for the original list, extract from the web page source code the one or more element information items that rank highest in the click rate counted by the web page resources, as the summary information;
The summary information may comprise at least one or a combination of the following for the one or more element information items: element URL, element identifier, element picture and element text description.
In a specific implementation, if the page type of the web page resources matching the search string is an original list, the click rate data counted by the web page (such as a video ranking list) can be obtained from the content of the HTML tags in the web page source code, and the one or more highest-ranked element information items are then extracted from the click rate data as the summary information. The HTML tags may include the <a> tag (which defines a hyperlink, whose href attribute indicates the link target), the <meta> tag (which provides meta-information about the page, such as a description and keywords for search engines and the update frequency), the <span> tag (which groups inline elements), the <div> tag, the <p> tag, the <script> tag, the class attribute, and so on. For example, for the home page of a video website, the corresponding element information can be obtained from the following code as the summary information:
<div class="item">
<label class="hot">1</label>
<a class="name" target="_blank" href="http://v.youku.com/v_show/id_XNzIxNzc0NTUy.html" data-from="1-1">Sharp XX DVD version</a>
</div>
The summary information shows that the element ranked first is "Sharp XX DVD version". In practice, each element information item may comprise at least one or more of the following attributes: element URL, element identifier, element picture and element text description. For the example above, the summary information can therefore provide the playback URL, title, picture and other information of "Sharp XX DVD version".
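The extraction of top-ranked elements from markup of the kind shown above can be sketched with Python's standard html.parser module. The class RankingExtractor, the function top_ranked and the example.com addresses are hypothetical names chosen for illustration; the sketch also assumes that every ranking entry is a div containing a label with class "hot" (the rank) and a link with class "name" (the title and URL), as in the snippet above.

```python
from html.parser import HTMLParser


class RankingExtractor(HTMLParser):
    """Collects rank, title and URL from ranking entries in a list page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None   # which field the next text node belongs to
        self._current = {}   # entry being assembled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "label" and "hot" in classes:
            self._field = "rank"
        elif tag == "a" and "name" in classes:
            self._field = "title"
            self._current["url"] = attrs.get("href")

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("label", "a"):
            self._field = None
        elif tag == "div" and "rank" in self._current and "title" in self._current:
            self.items.append(self._current)
            self._current = {}


def top_ranked(page_source, n=2):
    """Returns the n entries with the smallest rank number."""
    parser = RankingExtractor()
    parser.feed(page_source)
    return sorted(parser.items, key=lambda item: int(item["rank"]))[:n]


sample = """
<div class="item">
<label class="hot">2</label>
<a class="name" target="_blank" href="http://example.com/video-b" data-from="1-2">Video B</a>
</div>
<div class="item">
<label class="hot">1</label>
<a class="name" target="_blank" href="http://example.com/video-a" data-from="1-1">Video A</a>
</div>
"""
print(top_ranked(sample, n=1))
```

Sorting by the rank label rather than by document order keeps the result correct even when the page lists the entries out of order, as the sample does.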
Step 304: output the summary information.
It should be noted that, when the summary information is output, the one or more element information items may be displayed in the search results in the form of a carousel.
For example, Fig. 3-a is a schematic diagram of a video website home page. The video website home page 300 may contain information such as a video category list 310, the videos of each video category and the corresponding ranking lists (such as the category 1 ranking list 320). The video category list may include TV series, films, variety shows, music, animation, travel and so on. If category 1 (330) is TV series, video A to video F are individual TV series, and the category 1 ranking list may be, in order, video A, video B, video D, video F and so on. Through step 303, the top n videos in the ranking list of each category on the video website 300 (for example the top 2; the specific number can be set as required and is not limited by the embodiments of the present invention) can be presented in the summary, as shown in the second output result schematic diagram of Fig. 3-b. Video A, video B and the other videos shown in the summary information may include the title, playback URL, picture and/or text description of the corresponding video.
In this embodiment of the present invention, after the search engine receives the search string entered by the user, it finds all web page resources containing the search string as the matching web page resources, identifies the page type of those web page resources and, for web page resources that are original lists, extracts the corresponding summary information from the source code. The summary information shown in the search results therefore expresses the meaning of the whole page document more accurately and is more valuable to the user, who can obtain the desired information from the summary information itself. This reduces the cases in which the user has to click repeatedly into the pages behind the search results to find the needed information, which in turn improves retrieval speed, reduces the number of interactions with the search engine and improves data processing speed.
With reference to Fig. 4, a flow chart of the steps of Embodiment 4 of a search-engine-based summary information extraction method according to an embodiment of the present invention is shown. The embodiment may comprise the following steps:
Step 401: based on the search string received in the search engine, obtain the matching web page resources;
Step 402: identify the page type of the web page resources;
In a preferred embodiment of the present invention, step 402 may comprise the following sub-steps:
Sub-step S31: extract the page framework of the web page resources and calculate a page framework ID;
Sub-step S32: if the number of page frameworks having the same page framework ID is greater than a preset threshold, calculate the page framework mode;
Sub-step S33: match the page framework mode against the page framework modes in a database generated in advance, and thereby identify the page type.
Step 403: for the page type, extract the corresponding summary information from the web page resources;
The embodiment of the present invention can present, in the summary information, element information related to the user's history access record for the matching web page resources, specifically as follows:
In a preferred embodiment of the present invention, step 403 may comprise the following sub-steps:
Sub-step S41: for the page type, send a first query request to the website object corresponding to the web page resources;
Sub-step S42: receive the history access record corresponding to the first query request sent by the website object, the history access record being a record that the website object obtains according to cookies information after obtaining the cookies information from the current terminal;
Sub-step S43: obtain from the history access record the element information in the web page resources whose number of accesses is greater than a first threshold, as the summary information.
Specifically, if the web page resources matching the search string (query) belong to a certain website object, the search engine can send a first query request to that website object; the first query request informs the website object that a user is making a query. After receiving the first query request, the website object obtains the corresponding cookies information from the current terminal, obtains the current user's history access record according to the cookies information and feeds it back to the search engine. According to the received history access record, the search engine obtains the element information in the web page resources whose number of accesses is greater than the first threshold as the summary information, thereby providing personalized summary information for the user. The first threshold may be 1 or another integer value, which is not limited by the embodiments of the present invention.
In another preferred embodiment of the present invention, step 403 may comprise the following sub-steps:
Sub-step S51: for the page type, send a second query request to the browser of the current terminal, the second query request comprising the website object identifier of the web page resources;
Sub-step S52: receive the history access record in the current terminal that relates to the website object identifier and is returned by the browser, the history access record being obtained by the browser of the current terminal after it obtains the cookies information related to the website object;
Sub-step S53: obtain from the history access record the element information in the web page resources whose number of accesses is greater than the first threshold, as the summary information.
Specifically, if the web page resources matching the search string (query) belong to a certain website object, the search engine can send a second query request to the browser of the current terminal, asking the browser to retrieve the cookies information recorded when the user accessed that website object. After receiving the second query request, the browser of the current terminal obtains from the current terminal the cookies information corresponding to the website object identifier, obtains the current user's history access record according to the cookies information and feeds it back to the search engine. According to the received history access record, the search engine obtains the element information in the web page resources whose number of accesses is greater than the first threshold as the summary information, thereby providing personalized summary information for the user.
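Whichever route supplies the history access record (the website object or the browser of the current terminal), the final filtering in sub-steps S43 and S53 can be sketched as below. The function name personalized_elements and the representation of the history access record as a plain list of accessed element names are assumptions made for illustration; the text does not define the record format.

```python
from collections import Counter


def personalized_elements(history, page_elements, first_threshold=1):
    """Keeps the page elements accessed more than first_threshold times.

    history: element names, one entry per recorded access (assumed format).
    page_elements: elements present in the matching web page resource.
    """
    visits = Counter(history)
    return [e for e in page_elements if visits[e] > first_threshold]


history = ["video E", "video F", "video E", "video A"]
page_elements = ["video A", "video B", "video C", "video D", "video E", "video F"]

print(personalized_elements(history, page_elements))
print(personalized_elements(history, page_elements, first_threshold=0))
```

With a first threshold of 1, only elements accessed at least twice are kept; a threshold of 0 keeps everything the user has accessed at all.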
Step 404: add a specific tag (TAG) to the summary information;
In this embodiment of the present invention, after the personalized summary information has been extracted according to the user's history access record, a specific tag (TAG) can also be added to it, for example by stamping the personalized summary information with a recommendation mark.
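Step 404 can be sketched as attaching a label to each personalized summary item before output. The SummaryItem type, its fields and the default tag text "recommended" are hypothetical; the text only says that a specific tag, such as a recommendation mark, is added.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SummaryItem:
    title: str
    url: str
    tag: Optional[str] = None  # specific tag shown next to the item, if any


def add_tag(items, tag="recommended"):
    """Returns copies of the items carrying the given tag."""
    return [replace(item, tag=tag) for item in items]


items = [
    SummaryItem("Video E", "http://example.com/video-e"),
    SummaryItem("Video F", "http://example.com/video-f"),
]
tagged = add_tag(items)
print(tagged)
```

Returning tagged copies instead of mutating the items keeps the untagged summary available, for instance for users without a history access record.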
Step 405: output the summary information to which the specific tag (TAG) has been added.
In a specific implementation, the summary information comprises at least one or a combination of the following for the one or more element information items: element URL, element identifier, element picture and element text description.
For example, Fig. 4-a is a schematic diagram of a video website home page. The video website home page 400 may contain information such as a video category list 410, the videos of each video category and the corresponding ranking lists (such as the category 1 ranking list 420). The video category list may include TV series, films, variety shows, music, animation, travel and so on. If category 1 (430) is TV series, video A to video F are individual TV series, and the category 1 ranking list may be, in order, video A, video B, video D, video F and so on. Through step 403, the user's history access record for the video website 400 can be obtained. If, for example, the videos the user has viewed on this video website are video E and video F, the viewed videos are stamped with a mark such as "excellent" (the specific tag content can be set as required and is not limited by the embodiments of the present invention) and presented in the summary, as shown in the third output result schematic diagram of Fig. 4-b. The videos shown in the summary information may include the title, playback URL, picture and/or text description of the corresponding video.
In this embodiment of the present invention, after the search engine receives the search string entered by the user, it finds all web page resources containing the search string as the matching web page resources, identifies the page type of those web page resources and, for the different page types, obtains the corresponding cookies information according to the web page resources, obtains the user's history access record according to the cookies information, and obtains from the history access record the element information in the web page resources whose number of accesses is greater than the first threshold, as the summary information. The summary information shown in the search results is therefore personalized for each user and more valuable to the user, who can obtain the desired information from the summary information itself. This reduces the cases in which the user has to click repeatedly into the pages behind the search results to find the needed information, which in turn improves retrieval speed, reduces the number of interactions with the search engine and improves data processing speed.
With reference to Fig. 5, a flow chart of the steps of Embodiment 5 of a search-engine-based summary information extraction method according to an embodiment of the present invention is shown. The embodiment may comprise the following steps:
Step 501: based on the search string received in the search engine, obtain the matching web page resources;
Step 502: identify the page type of the web page resources;
In a preferred embodiment of the present invention, step 502 may comprise the following sub-steps:
Sub-step S61: extract the page framework of the web page resources and calculate a page framework ID;
Sub-step S62: if the number of page frameworks having the same page framework ID is greater than a preset threshold, calculate the page framework mode;
Sub-step S63: match the page framework mode against the page framework modes in a database generated in advance, and thereby identify the page type.
Step 503: for the page type, look up the summary information corresponding to the web page resources in a summary database generated in advance, the summary database storing web page resources and their corresponding summary information;
Specifically, besides obtaining the summary information of each hit web page resource in real time as described in Embodiments 1 to 4 above, the embodiment of the present invention can also extract the summary information of each web page resource in advance, when the spider crawls the web pages, store it in a summary database and update the summary information in the summary database at preset intervals. When a certain web page resource is hit, the summary information corresponding to that web page resource is obtained from the summary database.
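The pre-generated summary database of step 503 can be sketched as a small in-memory store that records when each summary was extracted, so that a crawler can refresh entries older than the preset interval. The class SummaryDatabase, its method names, the injected extract function and the injectable clock are all hypothetical; a production system would use persistent storage.

```python
import time


class SummaryDatabase:
    """Maps a web page URL to its pre-extracted summary and storage time."""

    def __init__(self, extract, refresh_seconds=3600.0, clock=time.time):
        self._extract = extract          # summary extraction function (assumed)
        self._refresh = refresh_seconds  # the preset update interval
        self._clock = clock
        self._store = {}                 # url -> (summary, stored_at)

    def store(self, url, page_source):
        """Called at crawl time: extract and save the summary."""
        self._store[url] = (self._extract(page_source), self._clock())

    def is_stale(self, url):
        """True if the entry is missing or older than the update interval."""
        entry = self._store.get(url)
        return entry is None or self._clock() - entry[1] >= self._refresh

    def lookup(self, url):
        """Called at query time: return the stored summary, or None."""
        entry = self._store.get(url)
        return entry[0] if entry else None


now = [0.0]  # fake clock so the example is deterministic
db = SummaryDatabase(extract=str.upper, refresh_seconds=10, clock=lambda: now[0])
db.store("http://example.com/page", "top videos: a, b")
print(db.lookup("http://example.com/page"))
```

Query-time work then reduces to a dictionary lookup, which matches the stated goal of improving search speed.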
Step 504: output the summary information.
The summary information comprises at least one or a combination of the following for the one or more element information items: element URL, element identifier, element picture and element text description.
In this embodiment of the present invention, after the search engine receives the search string entered by the user, it finds all web page resources containing the search string as the matching web page resources, looks up the summary information corresponding to those web page resources in the summary database generated in advance and outputs it in the search results. This improves search speed, and the summary information shown in the search results expresses the meaning of the whole page document more accurately and is more valuable to the user, who can obtain the desired information from the summary information itself. This reduces the cases in which the user has to click repeatedly into the pages behind the search results to find the needed information, which in turn reduces the number of interactions with the search engine and improves data processing speed.
For simplicity of description, the method embodiments are all expressed as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
With reference to Fig. 6, a structural block diagram of an embodiment of a search-engine-based summary information extraction device according to one embodiment of the present invention is shown. The device may comprise the following modules:
Web page resource acquisition module 601, adapted to obtain the matching web page resources based on the search string received in the search engine;
Page type identification module 602, adapted to identify the page type of the web page resources;
Summary information extraction module 603, adapted to extract, for the page type, the corresponding summary information from the web page resources;
Information output module 604, adapted to output the summary information.
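How the four modules cooperate can be sketched as a small pipeline in which each module is a plain callable. The class SummaryExtractionDevice, its constructor arguments and the stub functions in the usage example are hypothetical stand-ins, not the structure of any actual implementation.

```python
class SummaryExtractionDevice:
    """Wires together stand-ins for modules 601 to 604."""

    def __init__(self, acquire, identify, extractors, output):
        self.acquire = acquire        # module 601: search string -> resources
        self.identify = identify      # module 602: resource -> page type
        self.extractors = extractors  # module 603: page type -> extractor
        self.output = output          # module 604: summary -> displayed form

    def handle(self, search_string):
        shown = []
        for resource in self.acquire(search_string):
            extractor = self.extractors.get(self.identify(resource))
            if extractor is not None:  # skip page types with no extractor
                shown.append(self.output(extractor(resource)))
        return shown


# Stub modules, purely for illustration.
device = SummaryExtractionDevice(
    acquire=lambda q: [p for p in ["xx download", "xx videos", "about"] if q in p],
    identify=lambda r: "single page" if "download" in r else "original list",
    extractors={"single page": lambda r: "links of " + r},
    output=str.upper,
)
print(device.handle("xx"))
```

Keeping the extractors in a mapping keyed by page type mirrors the way the device extracts summary information "for the page type": supporting a new page type means registering one more extractor.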
In a preferred embodiment of the present invention, the page type identification module 602 is further adapted to:
extract the page framework of the web page resources and calculate a page framework ID;
if the number of page frameworks having the same page framework ID is greater than a preset threshold, calculate the page framework mode;
match the page framework mode against the page framework modes in a database generated in advance, and thereby identify the page type.
In a preferred embodiment of the present invention, the web page resources comprise web page source code, the page type comprises a single page, and the summary information extraction module 603 is further adapted to:
for the single page, extract one or more key element information items from the web page source code as the summary information.
As a preferred example of the embodiment of the present invention, the single page may comprise one or a combination of the following: download text page, audio/video playback page, novel reading page, question-and-answer page, news photo gallery page and topic page.
In a preferred embodiment of the present invention, the web page resources comprise web page source code, the page type comprises an original list, and the summary information extraction module 603 is further adapted to:
for the original list, extract from the web page source code the one or more element information items that rank highest in the click rate counted by the web page resources, as the summary information.
As a preferred example of the embodiment of the present invention, the original list may comprise an audio/video original list.
In a preferred embodiment of the present invention, the summary information extraction module 603 is further adapted to:
for the page type, send a first query request to the website object corresponding to the web page resources;
receive the history access record corresponding to the first query request sent by the website object, the history access record being a record that the website object obtains according to cookies information after obtaining the cookies information from the current terminal;
obtain from the history access record the element information in the web page resources whose number of accesses is greater than a first threshold, as the summary information.
In a preferred embodiment of the present invention, the summary information extraction module 603 is further adapted to:
for the page type, send a second query request to the browser of the current terminal, the second query request comprising the website object identifier of the web page resources;
receive the history access record in the current terminal that relates to the website object identifier and is returned by the browser, the history access record being obtained by the browser of the current terminal after it obtains the cookies information related to the website object;
obtain from the history access record the element information in the web page resources whose number of accesses is greater than the first threshold, as the summary information.
In a preferred embodiment of the present invention, the device may further comprise:
a tag adding module, adapted to add a specific tag (TAG) to the summary information.
In a preferred embodiment of the present invention, the summary information extraction module 603 is further adapted to:
for the page type, look up the summary information corresponding to the web page resources in a summary database generated in advance, the summary database storing web page resources and their corresponding summary information.
As a preferred example of the embodiment of the present invention, the summary information may comprise at least one or a combination of the following for the one or more element information items: element URL, element identifier, element picture and element text description.
With reference to Fig. 7, a structural block diagram of an embodiment of a search engine according to one embodiment of the present invention is shown. The search engine may comprise the following modules:
Web page resource acquisition module 701, adapted to obtain the matching web page resources based on the received search string;
Page type identification module 702, adapted to identify the page type of the web page resources;
Summary information extraction module 703, adapted to extract, for the page type, the corresponding summary information from the web page resources;
Information output module 704, adapted to output the summary information.
In a preferred embodiment of the present invention, the page type identification module 702 is further adapted to:
extract the page framework of the web page resources and calculate a page framework ID;
if the number of page frameworks having the same page framework ID is greater than a preset threshold, calculate the page framework mode;
match the page framework mode against the page framework modes in a database generated in advance, and thereby identify the page type.
In a preferred embodiment of the present invention, the web page resources comprise web page source code, the page type comprises a single page, and the summary information extraction module 703 is further adapted to:
for the single page, extract one or more key element information items from the web page source code as the summary information.
As a preferred example of the embodiment of the present invention, the single page may comprise one or a combination of the following: download text page, audio/video playback page, novel reading page, question-and-answer page, news photo gallery page and topic page.
In a preferred embodiment of the present invention, the web page resources comprise web page source code, the page type comprises an original list, and the summary information extraction module 703 is further adapted to:
for the original list, extract from the web page source code the one or more element information items that rank highest in the click rate counted by the web page resources, as the summary information.
As a preferred example of the embodiment of the present invention, the original list may comprise an audio/video original list.
In a preferred embodiment of the present invention, the summary information extraction module 703 is further adapted to:
for the page type, send a first query request to the website object corresponding to the web page resources;
receive the history access record corresponding to the first query request sent by the website object, the history access record being a record that the website object obtains according to cookies information after obtaining the cookies information from the current terminal;
obtain from the history access record the element information in the web page resources whose number of accesses is greater than a first threshold, as the summary information.
In a preferred embodiment of the present invention, the summary information extraction module 703 is further adapted to:
for the page type, send a second query request to the browser of the current terminal, the second query request comprising the website object identifier of the web page resources;
receive the history access record in the current terminal that relates to the website object identifier and is returned by the browser, the history access record being obtained by the browser of the current terminal after it obtains the cookies information related to the website object;
obtain from the history access record the element information in the web page resources whose number of accesses is greater than the first threshold, as the summary information.
In a preferred embodiment of the present invention, the search engine may further comprise:
a tag adding module, adapted to add a specific tag (TAG) to the summary information.
In a preferred embodiment of the present invention, the summary information extraction module 703 is further adapted to:
for the page type, look up the summary information corresponding to the web page resources in a summary database generated in advance, the summary database storing web page resources and their corresponding summary information.
As a preferred example of the embodiment of the present invention, the summary information may comprise at least one or a combination of the following for the one or more element information items: element URL, element identifier, element picture and element text description.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device and search engine embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein can be implemented in a variety of programming languages, and that the description of a specific language above is given in order to disclose the best mode of the present invention.
Numerous specific details are set forth in the specification provided herein. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the features of the invention are sometimes grouped together in a single embodiment, figure or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in an embodiment may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
In addition, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search-engine-based summary information extraction device according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. the summary info extracting method based on search engine, comprising:
Search string based on receiving in search engine, obtains the web page resources of coupling;
Identify the page type of described web page resources;
For described page type, from described web page resources, extract corresponding summary info;
Export described summary info.
2. the method for claim 1, is characterized in that, the step of the page type of the described web page resources of described identification comprises:
Extract the page framework of described web page resources, calculate page framework ID;
If the quantity of the page framework of same page framework ID is greater than predetermined threshold value, calculate page framework mode;
Described page framework mode is mated with the page framework mode in the database generating in advance, identify page type.
3. The method of claim 1 or 2, characterized in that the web page resources comprise webpage source code, the page type comprises a single page, and the step of extracting, for the page type, corresponding summary info from the web page resources comprises:
for the single page, extracting one or more pieces of key element information from the webpage source code as the summary info.
4. The method of any one of claims 1-3, characterized in that the single page comprises one or more of the following: a text download page, an audio/video playback page, a novel reading page, a question-and-answer page, a news photo gallery page, a topic page.
5. The method of any one of claims 1-4, characterized in that the web page resources comprise webpage source code, the page type comprises an original list, and the step of extracting, for the page type, corresponding summary info from the web page resources comprises:
for the original list, extracting from the webpage source code the one or more pieces of element information ranked highest by the click-through rate counted for the web page resources, as the summary info.
6. A summary info extraction device based on a search engine, comprising:
a web page resources acquisition module, adapted to obtain matching web page resources based on a search string received in the search engine;
a page type identification module, adapted to identify the page type of the web page resources;
a summary info extraction module, adapted to extract, for the page type, corresponding summary info from the web page resources;
an information output module, adapted to output the summary info.
7. The device of claim 6, characterized in that the page type identification module is further adapted to:
extract the page framework of the web page resources and calculate a page framework ID;
if the number of page frameworks sharing the same page framework ID is greater than a preset threshold, calculate a page framework mode;
match the page framework mode against the page framework modes in a pre-generated database to identify the page type.
8. The device of claim 6 or 7, characterized in that the web page resources comprise webpage source code, the page type comprises a single page, and the summary info extraction module is further adapted to:
for the single page, extract one or more pieces of key element information from the webpage source code as the summary info.
9. The device of any one of claims 6-8, characterized in that the summary info extraction module is further adapted to:
for the page type, send a first query request to the website object corresponding to the web page resources;
receive the history access record corresponding to the first query request sent by the website object, the history access record being a record that the website object obtains according to cookie information after obtaining that cookie information from the current terminal;
obtain, from the history access record, the element information in the web page resources whose access count is greater than a first threshold, as the summary info.
10. The device of any one of claims 6-9, characterized in that the summary info extraction module is further adapted to:
for the page type, send a second query request to the browser of the current terminal, the second query request comprising the website object identity of the web page resources;
receive the history access record in the current terminal that is related to the website object identity and returned by the browser, the history access record being obtained by the browser of the current terminal after it obtains the cookie information related to the website object;
obtain, from the history access record, the element information in the web page resources whose access count is greater than a first threshold, as the summary info.
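
The page type identification recited in claims 2 and 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claims do not define how a page framework, a page framework ID, or a page framework mode is computed. The sketch assumes that the framework is the sequence of opening HTML tags, that the ID is a hash of that sequence, and that the mode is the tag sequence itself. The function names, the `known_modes` dictionary (standing in for the pre-generated database), and the threshold value are all hypothetical.

```python
import hashlib
from collections import Counter
from html.parser import HTMLParser


class _TagSkeleton(HTMLParser):
    """Collects the sequence of opening tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def page_framework(html: str) -> str:
    """Assumed notion of 'page framework': the tag skeleton of the page."""
    parser = _TagSkeleton()
    parser.feed(html)
    parser.close()
    return "/".join(parser.tags)


def framework_id(html: str) -> str:
    """Assumed 'page framework ID': a hash of the skeleton, so that pages
    built from the same template share one ID."""
    return hashlib.sha256(page_framework(html).encode("utf-8")).hexdigest()


def identify_page_types(pages, known_modes, threshold=2):
    """Return {url: page type or None}.

    pages       -- dict mapping URL to webpage source code
    known_modes -- hypothetical stand-in for the pre-generated database,
                   mapping a framework mode to a page type
    threshold   -- a mode is only computed when more than this many pages
                   share the same framework ID
    """
    ids = {url: framework_id(html) for url, html in pages.items()}
    counts = Counter(ids.values())
    result = {}
    for url, html in pages.items():
        if counts[ids[url]] > threshold:
            mode = page_framework(html)  # assumption: mode == skeleton
            result[url] = known_modes.get(mode)
        else:
            result[url] = None  # too few pages share this framework
    return result
```

Under these assumptions, three pages generated from the same template share one framework ID and exceed a threshold of 2, so their mode is looked up in `known_modes`. A page with a unique structure is left unclassified.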
CN201410302674.7A 2014-06-27 2014-06-27 Summary information extraction method and device based on search engine and search engine Pending CN104077388A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410302674.7A CN104077388A (en) 2014-06-27 2014-06-27 Summary information extraction method and device based on search engine and search engine
PCT/CN2015/080676 WO2015196910A1 (en) 2014-06-27 2015-06-03 Search engine-based summary information extraction method, apparatus and search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410302674.7A CN104077388A (en) 2014-06-27 2014-06-27 Summary information extraction method and device based on search engine and search engine

Publications (1)

Publication Number Publication Date
CN104077388A true CN104077388A (en) 2014-10-01

Family

ID=51598642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410302674.7A Pending CN104077388A (en) 2014-06-27 2014-06-27 Summary information extraction method and device based on search engine and search engine

Country Status (2)

Country Link
CN (1) CN104077388A (en)
WO (1) WO2015196910A1 (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317930A (en) * 2014-10-31 2015-01-28 北京奇虎科技有限公司 Method and device for optimizing presentation of terminal search
CN104699840A (en) * 2015-03-31 2015-06-10 北京奇虎科技有限公司 Method and device used for providing mobile terminal searching results
CN104866592A (en) * 2015-05-29 2015-08-26 百度在线网络技术(北京)有限公司 Method and apparatus for displaying abstract in search engine
WO2015196910A1 (en) * 2014-06-27 2015-12-30 北京奇虎科技有限公司 Search engine-based summary information extraction method, apparatus and search engine
CN105786841A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating smart abstract of news webpage
CN105786854A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating video play webpage abstract in search result
CN105786847A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying structured abstracts of commodity web page in e-commerce website
CN105786840A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Display method and system for structured abstract of music webpage
CN105786836A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating structured abstract of video webpage
CN105786849A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating document web page custom abstract
CN105786835A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying user-defined abstract of picture webpage in search result
CN105786837A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating intelligent abstract of novel webpage
CN105786848A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying search intelligent abstract on basis of software downloading requirements
CN105786853A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Display method and system for smart abstract of forum post
CN105808561A (en) * 2014-12-30 2016-07-27 北京奇虎科技有限公司 Method and device for extracting abstract from webpage
CN105808562A (en) * 2014-12-30 2016-07-27 北京奇虎科技有限公司 Method and device for extracting webpage abstract based on weight
CN106055595A (en) * 2016-05-23 2016-10-26 北京金山安全软件有限公司 Method and device for displaying value added service information and electronic equipment
CN108090111A (en) * 2016-11-23 2018-05-29 谷歌有限责任公司 Animated excerpts for search results
CN108090043A (en) * 2017-11-30 2018-05-29 北京百度网讯科技有限公司 Error correction report processing method, device and readable medium based on artificial intelligence
CN110020108A (en) * 2017-09-12 2019-07-16 腾讯科技(深圳)有限公司 Network resource recommended method, device, computer equipment and storage medium
CN110162617A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 Method, apparatus, language processing engine and medium for extracting summary information
CN110532112A (en) * 2019-08-29 2019-12-03 维沃移动通信有限公司 Object extraction method and mobile terminal
CN110825870A (en) * 2019-10-31 2020-02-21 腾讯科技(深圳)有限公司 Document abstract acquisition method and device, storage medium and electronic device
CN113924565A (en) * 2019-06-13 2022-01-11 微软技术许可有限责任公司 Screen reader summary with popular links
CN115130022A (en) * 2022-07-04 2022-09-30 北京字跳网络技术有限公司 Content search method, device, equipment and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895568B (en) * 2018-09-13 2023-07-21 阿里巴巴集团控股有限公司 Method and system for processing court trial records
CN114422309B (en) * 2021-12-03 2023-08-11 中国电子科技集团公司第二十八研究所 Service message transmission effect analysis method based on abstract return comparison mode
CN114372160B (en) * 2022-01-12 2023-08-15 抖音视界有限公司 Search request processing method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163229A (en) * 2011-04-13 2011-08-24 北京百度网讯科技有限公司 Method and equipment for generating abstracts of searching results
CN102169501A (en) * 2011-04-26 2011-08-31 北京百度网讯科技有限公司 Method and device for generating abstract based on type information of document corresponding with searching result
CN103136359A (en) * 2013-03-07 2013-06-05 宁波成电泰克电子信息技术发展有限公司 Generation method of single document summaries

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020078019A1 (en) * 2000-10-02 2002-06-20 Lawton Scott S. Method and system for organizing search results into a single page showing two levels of detail
CN101452470B (en) * 2007-10-18 2012-06-06 广州索答信息科技有限公司 Summary-style network search engine system and search method and uses
CN102591971B (en) * 2011-12-31 2015-03-18 北京百度网讯科技有限公司 Method and device for extracting webpage information
CN103761231A (en) * 2013-10-17 2014-04-30 北京奇虎科技有限公司 Method and device for providing media content information of page by search engine
CN104077388A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Summary information extraction method and device based on search engine and search engine

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015196910A1 (en) * 2014-06-27 2015-12-30 北京奇虎科技有限公司 Search engine-based summary information extraction method, apparatus and search engine
CN104317930A (en) * 2014-10-31 2015-01-28 北京奇虎科技有限公司 Method and device for optimizing presentation of terminal search
CN105786848A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying search intelligent abstract on basis of software downloading requirements
CN105786853A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Display method and system for smart abstract of forum post
CN105786841A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating smart abstract of news webpage
CN105786854A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating video play webpage abstract in search result
CN105786847A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying structured abstracts of commodity web page in e-commerce website
CN105786840A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Display method and system for structured abstract of music webpage
CN105786836A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating structured abstract of video webpage
CN105786849A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating document web page custom abstract
CN105786835A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for displaying user-defined abstract of picture webpage in search result
CN105786837A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating intelligent abstract of novel webpage
CN105808561A (en) * 2014-12-30 2016-07-27 北京奇虎科技有限公司 Method and device for extracting abstract from webpage
CN105808562A (en) * 2014-12-30 2016-07-27 北京奇虎科技有限公司 Method and device for extracting webpage abstract based on weight
CN104699840A (en) * 2015-03-31 2015-06-10 北京奇虎科技有限公司 Method and device used for providing mobile terminal searching results
CN104699840B (en) * 2015-03-31 2016-10-19 北京奇虎科技有限公司 Method and device used for providing mobile terminal searching results
CN104866592B (en) * 2015-05-29 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for displaying abstract in search engine
CN104866592A (en) * 2015-05-29 2015-08-26 百度在线网络技术(北京)有限公司 Method and apparatus for displaying abstract in search engine
CN106055595B (en) * 2016-05-23 2019-10-29 北京金山安全软件有限公司 Method and device for displaying value added service information and electronic equipment
CN106055595A (en) * 2016-05-23 2016-10-26 北京金山安全软件有限公司 Method and device for displaying value added service information and electronic equipment
CN108090111A (en) * 2016-11-23 2018-05-29 谷歌有限责任公司 Animated excerpts for search results
CN108090111B (en) * 2016-11-23 2022-04-01 谷歌有限责任公司 Animated excerpts for search results
CN110020108A (en) * 2017-09-12 2019-07-16 腾讯科技(深圳)有限公司 Network resource recommended method, device, computer equipment and storage medium
CN108090043A (en) * 2017-11-30 2018-05-29 北京百度网讯科技有限公司 Error correction report processing method, device and readable medium based on artificial intelligence
CN110162617A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 Method, apparatus, language processing engine and medium for extracting summary information
CN110162617B (en) * 2018-09-29 2022-11-04 腾讯科技(深圳)有限公司 Method, apparatus, language processing engine and medium for extracting summary information
CN113924565A (en) * 2019-06-13 2022-01-11 微软技术许可有限责任公司 Screen reader summary with popular links
CN110532112A (en) * 2019-08-29 2019-12-03 维沃移动通信有限公司 Object extraction method and mobile terminal
CN110825870A (en) * 2019-10-31 2020-02-21 腾讯科技(深圳)有限公司 Document abstract acquisition method and device, storage medium and electronic device
CN110825870B (en) * 2019-10-31 2023-07-14 腾讯科技(深圳)有限公司 Method and device for acquiring document abstract, storage medium and electronic device
CN115130022A (en) * 2022-07-04 2022-09-30 北京字跳网络技术有限公司 Content search method, device, equipment and medium

Also Published As

Publication number Publication date
WO2015196910A1 (en) 2015-12-30

Similar Documents

Publication Publication Date Title
CN104077388A (en) Summary information extraction method and device based on search engine and search engine
US11669579B2 (en) Method and apparatus for providing search results
US11151177B2 (en) Search method and apparatus based on artificial intelligence
CN100476830C (en) Network resource searching method and system
US10558754B2 (en) Method and system for automating training of named entity recognition in natural language processing
CN101918945B (en) Automatic expanded language search
US20090240638A1 (en) Syntactic and/or semantic analysis of uniform resource identifiers
CN103221951A (en) Predictive query suggestion caching
JP2015204103A (en) Interactive search and recommendation method and device thereof
US20200265074A1 (en) Searching multilingual documents based on document structure extraction
CN104036038A (en) News recommendation method and system
CN102915380A (en) Method and system for carrying out searching on data
CN102930054A (en) Data search method and data search system
CN104063476A (en) Social network-based content recommending method and system
CN103164542A (en) Method of data searching and client-side
CN104268185A (en) Method and device for searching application on application distribution platform
US10467536B1 (en) Domain name generation and ranking
CN103514282A (en) Method and device for displaying search results of videos
CN103942264A (en) Method and device for pushing webpages containing news information
CN102982118A (en) Searching method and device based on favorites
CN103631889A (en) Image recognizing method and device
US20200073925A1 (en) Method and system for generating a website from collected content
CN105630950A (en) Guidance type search method and system
CN103530389A (en) Method and device for improving stopword searching effectiveness
CN102902784A (en) Web page classification storage system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141001