CN105808569A - Method and device for providing abstract searching service - Google Patents

Publication number
CN105808569A
Authority
CN
China
Prior art keywords
web page
html tag
label
original web
webpages
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410844411.9A
Other languages
Chinese (zh)
Inventor
雷鹏
文维东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-30
Filing date
2014-12-30
Publication date
2016-07-27
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd

Abstract

The invention discloses a method and a device for providing a search summary service. The method comprises: obtaining original web pages; simplifying each original web page to obtain a simplified web page; and storing each simplified web page together with its corresponding URL in a storage system, for use when the search summary service is provided. By simplifying the original web pages, the technical solution removes redundant data that is irrelevant to the search summary service, and it establishes and stores the correspondence between simplified web pages and URLs to support the service, thereby improving the efficiency and relevance of the search summary service.

Description

A method and device for providing a search summary service
Technical field
The present invention relates to the technical field of data mining, and in particular to a method and device for providing a search summary service.
Background
With the rapid development of Internet technology, the network has become an important channel and means for people to obtain information. The massive amount of information on the network brings convenience but also many problems: to find useful information, people often spend a great deal of time searching, browsing, and looking things up. The various search services offered by search engines have therefore attracted growing attention in recent years. Among them, the search summary service displays a summary of each web page in the search results window, so that the user can see at a glance whether a page meets the search need without opening it.
In the prior art, the search summary services offered by search engines mostly generate summaries statically. That is, the summary is independent of the query: at the preprocessing stage, some text is extracted in advance from the web page content according to fixed rules, for example by taking the first 160 bytes of the page body (corresponding to 80 Chinese characters) or by concatenating the first sentence of each paragraph. The summary formed in this way is stored in the query subsystem, and once the document of a relevant web page is selected as matching the query terms, the pre-stored summary is shown to the user. This approach is clearly the easiest for the search engine, since no further processing is required. Its biggest drawback is that the summary is unrelated to the search word entered by the user and therefore does not meet the user's search need.
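The two static rules mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the choice of the GBK encoding is an assumption made only because it maps each Chinese character to 2 bytes, which matches the "160 bytes, 80 Chinese characters" figure; the source does not name an encoding or a sentence-splitting rule.

```python
import re


def static_summary_prefix(text: str, max_bytes: int = 160, encoding: str = "gbk") -> str:
    """Return the longest prefix of `text` that fits in `max_bytes` once encoded."""
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode(encoding, errors="replace"))
        if used + size > max_bytes:
            break
        kept.append(ch)
        used += size
    return "".join(kept)


def static_summary_first_sentences(paragraphs: list[str]) -> str:
    """Concatenate the first sentence of each non-empty paragraph."""
    firsts = []
    for paragraph in paragraphs:
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A sentence is assumed to end at the first Chinese or Latin terminator.
        match = re.match(r".*?[。！？.!?]", paragraph, flags=re.S)
        firsts.append(match.group(0) if match else paragraph)
    return "".join(firsts)
```

Neither function looks at the query, which is exactly the limitation the background section points out.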
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and device for providing a search summary service that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for providing a search summary service is provided. The method includes:
obtaining original web pages;
simplifying each original web page to obtain a simplified web page;
storing each simplified web page together with its corresponding URL in a storage system, for use when the search summary service is provided.
Optionally, obtaining the original web pages includes:
crawling the original web pages with a web crawler.
Optionally, simplifying each original web page includes:
removing the JS and CSS code from each original web page.
Optionally, simplifying each original web page to obtain a simplified web page includes:
for an original web page, classifying the HTML tags in the page;
removing the HTML tags that belong to specified categories and retaining the HTML tags that do not;
for each retained HTML tag, analyzing its attributes and retaining one or more specified attributes;
converting the content of each retained HTML tag into text content and placing it in the corresponding simplified web page.
Optionally, simplifying each original web page to obtain a simplified web page includes:
analyzing the HTML tags in an original web page one by one using a state machine, the state machine including the following states:
Initial state: analysis proceeds byte by byte from the current position; at the start, the current position is the beginning of the web page content;
Tag-start state: when the start of an HTML tag is found, determine whether the tag belongs to a specified category; if so, skip the tag and return to the initial state; if not, enter the attribute state;
Attribute state: analyze the attributes of the HTML tag, retain one or more specified attributes, and enter the text state;
Text state: convert the retained content of the HTML tag into text content, place it in the corresponding simplified web page, and enter the tag-end state;
Tag-end state: entered when the end of the HTML tag is found; the state machine then returns to the initial state.
Optionally, the HTML tags of the specified categories include one or more of the following:
script tags, noscript tags, iframe tags, single tags, comment tags, and tags that contain display:none.
Optionally, the storage system is a distributed storage system.
According to another aspect of the present invention, a device for providing a search summary service is provided. The device includes:
an acquiring unit, adapted to obtain original web pages;
a simplifying unit, adapted to simplify each original web page to obtain a simplified web page;
a storage unit, adapted to store each simplified web page together with its corresponding URL in a storage system, for use when the search summary service is provided.
Optionally, the acquiring unit is adapted to crawl the original web pages with a web crawler.
Optionally, the simplifying unit is adapted to remove the JS and CSS code from each original web page.
Optionally, the simplifying unit is adapted to: for an original web page, classify the HTML tags in the page; remove the HTML tags that belong to specified categories and retain the HTML tags that do not; for each retained HTML tag, analyze its attributes and retain one or more specified attributes; and convert the content of each retained HTML tag into text content and place it in the corresponding simplified web page.
Optionally, the simplifying unit is adapted to analyze the HTML tags in an original web page one by one using a state machine that includes the following states:
Initial state: analysis proceeds byte by byte from the current position; at the start, the current position is the beginning of the web page content;
Tag-start state: when the start of an HTML tag is found, determine whether the tag belongs to a specified category; if so, skip the tag and return to the initial state; if not, enter the attribute state;
Attribute state: analyze the attributes of the HTML tag, retain one or more specified attributes, and enter the text state;
Text state: convert the retained content of the HTML tag into text content, place it in the corresponding simplified web page, and enter the tag-end state;
Tag-end state: entered when the end of the HTML tag is found; the state machine then returns to the initial state.
Optionally, the HTML tags of the specified categories include one or more of the following:
script tags, noscript tags, iframe tags, single tags, comment tags, and tags that contain display:none.
Optionally, the storage unit is adapted to store each simplified web page together with its corresponding URL in a distributed storage system.
As can be seen from the above, the technical solution provided by the present invention simplifies the original web pages, removes the redundant data in them that is irrelevant to providing the search summary service, and establishes and stores the correspondence between simplified web pages and URLs to support the service, thereby further improving the efficiency and relevance of the search summary service.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flowchart of a method for providing a search summary service according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a device for providing a search summary service according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flowchart of a method for providing a search summary service according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S110: obtain the original web pages.
In one embodiment of the present invention, obtaining the original web pages in step S110 includes crawling the original web pages with a web crawler.
Step S120: simplify each original web page to obtain a simplified web page.
In this step, junk information and other information irrelevant to summary extraction are removed from the original web page.
Step S130: store each simplified web page together with its corresponding URL in a storage system, for use when the search summary service is provided.
It can be seen that the method shown in Fig. 1 simplifies the original web pages, removes the redundant data in them that is irrelevant to providing the search summary service, and establishes and stores the correspondence between simplified web pages and URLs to support the service, thereby further improving the efficiency and relevance of the search summary service.
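Steps S110 to S130 can be read as a small pipeline. The following Python sketch is only an illustration under stated assumptions: `build_simplified_store`, `fetch`, and `simplify` are hypothetical names, the crawler and the simplification routine are passed in as plain callables, and an in-memory dictionary stands in for the storage system.

```python
from typing import Callable, Dict, Iterable


def build_simplified_store(
    urls: Iterable[str],
    fetch: Callable[[str], str],      # stands in for the web crawler (S110)
    simplify: Callable[[str], str],   # stands in for the simplification step (S120)
) -> Dict[str, str]:
    """Return a mapping from URL to simplified page (S130)."""
    store: Dict[str, str] = {}
    for url in urls:
        original_page = fetch(url)
        store[url] = simplify(original_page)
    return store


# Usage with fake data in place of a real crawler:
fake_web = {"http://example.com/a": "  <p>Hello</p>  "}
pages = build_simplified_store(fake_web, fake_web.__getitem__, str.strip)
```

Keeping the crawler and the simplifier as parameters mirrors the way the description treats them as separate, optional refinements of the same three steps.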
The simplified web page library obtained by the method of Fig. 1 can be applied in an actual search scenario as follows: obtain a search word and the URL list of the search results corresponding to that search word; obtain from the simplified web page library, which stores each URL together with its simplified web page, the simplified web page corresponding to each URL in the URL list; and extract the summary corresponding to each URL from its simplified web page according to the search word. This can improve the efficiency and accuracy of summary extraction. The extracted summaries are shown to the user as part of the search results.
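A query-time lookup of this kind might be sketched as follows. The source does not say how the summary is extracted from the simplified page, so the window-around-first-match rule below is purely an assumption, as are the names `extract_summary`, `summaries_for_results`, and the `width` parameter.

```python
from typing import Dict, List


def extract_summary(page_text: str, search_word: str, width: int = 40) -> str:
    """Return a text window around the first occurrence of the search word.

    Falls back to the start of the page when the word does not occur.
    """
    pos = page_text.find(search_word)
    if pos < 0:
        return page_text[: 2 * width]
    start = max(0, pos - width)
    end = min(len(page_text), pos + len(search_word) + width)
    return page_text[start:end]


def summaries_for_results(
    search_word: str, urls: List[str], library: Dict[str, str]
) -> Dict[str, str]:
    """Build one query-dependent summary per result URL found in the library."""
    return {
        url: extract_summary(library[url], search_word)
        for url in urls
        if url in library
    }
```

Unlike a pre-stored static summary, the text returned here depends on the search word, which is the point of keeping the simplified pages available at query time.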
The JS code in a web page is mainly used to control interactive operations in the page and to beautify the page, and the CSS code is mainly used for page layout design; neither is relevant to providing the search summary service. Therefore, in one embodiment of the present invention, simplifying each original web page in step S120 of the method shown in Fig. 1 may consist of removing the JS and CSS code from each original web page.
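As a rough illustration of this embodiment, the sketch below removes script and style elements with regular expressions. This is an assumption about how the removal could be done, not the method described in the source; a regex-based approach is not robust against every HTML edge case and does not touch inline style attributes or externally linked files.

```python
import re

# Match whole <script>...</script> and <style>...</style> elements,
# case-insensitively and across line breaks.
_SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.I | re.S)
_STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.I | re.S)


def strip_js_and_css(html: str) -> str:
    """Return `html` with embedded JS and CSS blocks removed."""
    without_js = _SCRIPT_RE.sub("", html)
    return _STYLE_RE.sub("", without_js)
```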
In another embodiment of the present invention, the process of simplifying each original web page to obtain a simplified web page in step S120 of the method shown in Fig. 1 includes the following steps:
Step S121: for an original web page, classify the HTML tags in the page.
Step S122: remove the HTML tags that belong to specified categories and retain the HTML tags that do not.
In this step, the HTML tags of the specified categories are tags that are irrelevant to providing the search summary service, and include one or more of the following: script tags, noscript tags, iframe tags, single tags, comment tags, and tags that contain display:none. The script tag defines a client-side script, such as JavaScript, used for image manipulation, form validation, and dynamic content updates. The noscript tag defines alternative content for when scripts are not executed, for browsers that recognize the script tag but cannot run the script in it. The iframe element creates an inline frame that contains another document. A single tag, such as the br tag, which inserts a simple line break, has no closing tag. A comment tag inserts a comment in the source code. A tag that contains display:none hides an element on the web page.
Step S123: for each HTML tag that does not belong to a specified category, analyze its attributes and retain one or more specified attributes.
In this step, the attributes of an HTML tag are specified in its start tag and describe the properties and characteristics of the tag. In some embodiments, to simplify the web page content as far as possible, most attributes are skipped and only the id attribute is retained, which makes it easier to locate problems.
Step S124: convert the content of each retained HTML tag into text content and place it in the corresponding simplified web page.
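Steps S121 to S124 can be sketched with Python's standard `html.parser` module. Several points are assumptions rather than details from the source: the class and function names are hypothetical, the list of single tags is only a sample, the output format (one retained tag or text fragment per line) is one possible reading of "simplified web page", and the input is assumed to be well nested.

```python
from html import escape
from html.parser import HTMLParser

# Assumed tag sets; the source names the categories but gives no full lists.
REMOVED_ELEMENTS = {"script", "noscript", "iframe"}
SINGLE_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PageSimplifier(HTMLParser):
    """Drops the specified tag categories and keeps only the id attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.dropped = []  # one flag per open element: True if it is removed

    def handle_starttag(self, tag, attrs):
        if tag in SINGLE_TAGS:
            return  # single tags are removed outright
        attr_map = dict(attrs)
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        drop = (
            (bool(self.dropped) and self.dropped[-1])  # inside a removed element
            or tag in REMOVED_ELEMENTS
            or "display:none" in style
        )
        self.dropped.append(drop)
        if not drop:
            tag_id = attr_map.get("id")
            self.parts.append(f'<{tag} id="{escape(tag_id)}">' if tag_id else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SINGLE_TAGS or not self.dropped:
            return
        if not self.dropped.pop():
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.dropped and self.dropped[-1]:
            return  # text inside a removed element
        text = data.strip()
        if text:
            self.parts.append(text)

    # Comments are removed because handle_comment is left as a no-op.


def simplify_page(html: str) -> str:
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

A parser library is used here only to keep the sketch short; it is not the byte-by-byte approach that the description turns to next.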
In a specific embodiment, the process of steps S121 to S124 above, in which each original web page is simplified to obtain a simplified web page, can be implemented with a state machine. The state machine analyzes the HTML tags in an original web page one by one and includes the following states:
Initial state: analysis proceeds byte by byte from the current position; at the start, the current position is the beginning of the web page content.
Tag-start state: when the start of an HTML tag is found, determine whether the tag belongs to a specified category; if so, skip the tag and return to the initial state; if not, enter the attribute state. Here, the HTML tags of the specified categories are tags that are irrelevant to providing the search summary service, and include one or more of the following: script tags, noscript tags, iframe tags, single tags, comment tags, and tags that contain display:none.
Attribute state: analyze the attributes of the HTML tag, retain one or more specified attributes, and enter the text state. In one embodiment of the present invention, to simplify the web page content as far as possible, most attributes are skipped and only the id attribute is retained.
Text state: convert the retained content of the HTML tag into text content, place it in the corresponding simplified web page, and enter the tag-end state.
Tag-end state: entered when the end of the HTML tag is found; the state machine then returns to the initial state.
This continues until all HTML tags in the original web page have been analyzed, which yields the final simplified web page.
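The five states can be sketched as a hand-written scanner in Python. This is an illustrative reading, not the implementation from the source: the names, the sample tag sets, and the line-per-fragment output are assumptions, and the sketch scans characters of a decoded string rather than raw bytes. Known simplifications: a removed element is skipped up to the first matching closing tag (nested elements of the same name are not tracked), a `>` inside an attribute value ends the tag early, and a stray `<` in text is dropped.

```python
import re
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    TAG_START = auto()
    ATTRIBUTE = auto()
    TEXT = auto()
    TAG_END = auto()


# Assumed tag sets; the source names the categories but gives no full lists.
SKIPPED_ELEMENTS = {"script", "noscript", "iframe"}
SINGLE_TAGS = {"br", "hr", "img", "input", "meta", "link"}
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
ID_RE = re.compile(r"""(?<![\w-])id\s*=\s*["']([^"']*)["']""", re.I)


def simplify_with_state_machine(html: str) -> str:
    out = []
    state = State.INITIAL
    i, n = 0, len(html)
    name, attrs, tag_close = "", "", 0
    while i < n:
        if state is State.INITIAL:
            if html.startswith("<!--", i):          # comment tag: skip it whole
                end = html.find("-->", i + 4)
                i = n if end < 0 else end + 3
            elif html.startswith(("<!", "<?"), i):  # doctype or processing instruction
                end = html.find(">", i)
                i = n if end < 0 else end + 1
            elif html.startswith("</", i):
                state = State.TAG_END
            elif html[i] == "<":
                if NAME_RE.match(html, i + 1):
                    state = State.TAG_START
                else:
                    i += 1                          # stray "<": dropped in this sketch
            else:
                state = State.TEXT
        elif state is State.TAG_START:
            name = NAME_RE.match(html, i + 1).group(0).lower()
            tag_close = html.find(">", i)
            if tag_close < 0:
                break                               # unterminated tag: stop scanning
            attrs = html[i + 1 + len(name):tag_close]
            hidden = "display:none" in attrs.replace(" ", "").lower()
            if name in SINGLE_TAGS:                 # skip just the tag itself
                i = tag_close + 1
                state = State.INITIAL
            elif name in SKIPPED_ELEMENTS or hidden:
                closing = re.compile(r"</%s\s*>" % name, re.I).search(html, tag_close)
                i = n if closing is None else closing.end()
                state = State.INITIAL
            else:
                state = State.ATTRIBUTE
        elif state is State.ATTRIBUTE:
            found = ID_RE.search(attrs)             # keep only the id attribute
            out.append(f'<{name} id="{found.group(1)}">' if found else f"<{name}>")
            i = tag_close + 1
            state = State.TEXT
        elif state is State.TEXT:
            end = html.find("<", i)
            end = n if end < 0 else end
            text = html[i:end].strip()
            if text:
                out.append(text)
            i = end
            state = State.TAG_END if html.startswith("</", i) else State.INITIAL
        else:  # State.TAG_END
            end = html.find(">", i)
            if end < 0:
                break
            found = NAME_RE.match(html, i + 2)
            if found:
                out.append(f"</{found.group(0).lower()}>")
            i = end + 1
            state = State.INITIAL
    return "\n".join(out)
```

A single pass with explicit states avoids building a document tree, which is one plausible reason for preferring a state machine when a large number of crawled pages has to be processed.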
In one embodiment of the present invention, in step S130 of the method shown in Fig. 1, each simplified web page is stored together with its corresponding URL in a storage system that is a distributed storage system, which improves storage efficiency.
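The source does not name a particular distributed storage system. As a hedged illustration of the URL-to-page mapping only, the sketch below spreads entries over several shards by hashing the URL; `ShardedPageStore` is a hypothetical class, and each in-memory dictionary merely stands in for a separate storage node.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedPageStore:
    """Toy stand-in for a distributed store keyed by URL."""

    def __init__(self, num_shards: int = 4):
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, url: str) -> Dict[str, str]:
        # A stable hash keeps the same URL on the same shard across runs.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, url: str, simplified_page: str) -> None:
        self._shard_for(url)[url] = simplified_page

    def get(self, url: str) -> Optional[str]:
        return self._shard_for(url).get(url)
```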
Fig. 2 shows a schematic diagram of a device for providing a search summary service according to an embodiment of the present invention. As shown in Fig. 2, the device 200 for providing a search summary service includes:
an acquiring unit 210, adapted to obtain original web pages;
a simplifying unit 220, adapted to simplify each original web page to obtain a simplified web page;
a storage unit 230, adapted to store each simplified web page together with its corresponding URL in a storage system, for use when the search summary service is provided.
It can be seen that in the device shown in Fig. 2 the units cooperate to simplify the original web pages, remove the redundant data in them that is irrelevant to providing the search summary service, and establish and store the correspondence between simplified web pages and URLs to support the service, thereby further improving the efficiency and relevance of the search summary service.
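The three-unit structure of Fig. 2 could be mirrored in software roughly as follows. This is a sketch under assumptions: `SummaryServiceDevice` and its field names are hypothetical, the acquiring and simplifying units are modelled as plain callables, and a dictionary stands in for the storage system behind the storage unit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class SummaryServiceDevice:
    acquire: Callable[[str], str]     # acquiring unit 210: URL -> original page
    simplify: Callable[[str], str]    # simplifying unit 220: original -> simplified page
    storage: Dict[str, str] = field(default_factory=dict)  # behind storage unit 230

    def process(self, urls: Iterable[str]) -> None:
        """Run the three units in order for each URL."""
        for url in urls:
            self.storage[url] = self.simplify(self.acquire(url))


# Usage with stand-in units:
device = SummaryServiceDevice(acquire=lambda url: f"<p>{url}</p>", simplify=str.upper)
device.process(["a", "b"])
```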
In one embodiment of the present invention, the acquiring unit 210 of the device shown in Fig. 2 is adapted to crawl the original web pages with a web crawler.
The JS code in a web page is mainly used to control interactive operations in the page and to beautify the page, and the CSS code is mainly used for page layout design; neither is relevant to providing the search summary service. Therefore, in one embodiment of the present invention, the simplifying unit 220 of the device shown in Fig. 2 is adapted to remove the JS and CSS code from each original web page in order to simplify it.
In one embodiment of the present invention, the simplifying unit 220 of the device shown in Fig. 2 is adapted to: for an original web page, classify the HTML tags in the page; remove the HTML tags that belong to specified categories and retain the HTML tags that do not; for each retained HTML tag, analyze its attributes and retain one or more specified attributes; and convert the content of each retained HTML tag into text content and place it in the corresponding simplified web page.
Here, the HTML tags of the specified categories are tags that are irrelevant to providing the search summary service, and include one or more of the following: script tags, noscript tags, iframe tags, single tags, comment tags, and tags that contain display:none. The script tag defines a client-side script, such as JavaScript, used for image manipulation, form validation, and dynamic content updates. The noscript tag defines alternative content for when scripts are not executed, for browsers that recognize the script tag but cannot run the script in it. The iframe element creates an inline frame that contains another document. A single tag, such as the br tag, which inserts a simple line break, has no closing tag. A comment tag inserts a comment in the source code. A tag that contains display:none hides an element on the web page. The attributes of an HTML tag are specified in its start tag and describe the properties and characteristics of the tag. In some embodiments, to simplify the web page content as far as possible, most attributes are skipped and only the id attribute is retained, which makes it easier to locate problems.
In a specific embodiment, the simplifying unit 220 of the device shown in Fig. 2 can use a state machine to simplify each original web page and obtain a simplified web page. The simplifying unit 220 is adapted to analyze the HTML tags in an original web page one by one using a state machine that includes the following states:
Initial state: analysis proceeds byte by byte from the current position; at the start, the current position is the beginning of the web page content.
Tag-start state: when the start of an HTML tag is found, determine whether the tag belongs to a specified category; if so, skip the tag and return to the initial state; if not, enter the attribute state.
Attribute state: analyze the attributes of the HTML tag, retain one or more specified attributes, and enter the text state.
Text state: convert the retained content of the HTML tag into text content, place it in the corresponding simplified web page, and enter the tag-end state.
Tag-end state: entered when the end of the HTML tag is found; the state machine then returns to the initial state.
In one embodiment of the present invention, the storage unit 230 of the device shown in Fig. 2 is adapted to store each simplified web page together with its corresponding URL in a distributed storage system, to obtain higher storage efficiency.
In summary, the technical solution provided by the present invention simplifies the original web pages, removes the redundant data in them that is irrelevant to providing the search summary service, and establishes and stores the correspondence between simplified web pages and URLs to support the service, thereby further improving the efficiency and relevance of the search summary service.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given in order to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of these. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a device for providing a search summary service according to embodiments of the present invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (10)

1. the method that search summary service is provided, wherein, the method includes:
Obtain each original web page;
Described each original web page is simplified, it is thus achieved that each condensed webpages;
URL correspondence corresponding for each condensed webpages is saved in storage system, in order to providing search summary to use when servicing.
2. the method for claim 1, wherein described acquisition original web page includes:
Web crawlers is utilized to crawl each original web page.
3. the method as described in any one of claim 1-2, described each original web page is simplified includes:
JS and CSS code in each original web page is removed.
4. the method as described in any one of claim 1-3, wherein, described simplifies each original web page, it is thus achieved that each condensed webpages includes:
For an original web page, the html tag in this webpage is classified;
Remove and belong to the html tag specifying classification, retain the html tag being not belonging to specify classification;
For being not belonging to specify the html tag of classification, analyze its attribute, retain the one or more attributes specified;
It is put into after content of text in the condensed webpages of correspondence by the Content Transformation of the html tag of reservation.
5. the method as described in any one of claim 1-4, wherein, described simplifies each original web page, it is thus achieved that each condensed webpages includes:
The mode adopting state machine analyzes the html tag in an original web page one by one, and it is one or more that described state machine includes in following state:
Original state: start byte-by-byte being analyzed from current location, time initial, current location is the original position of web page contents;
Label starts state: be the discovery that when html tag starts, it is judged that whether this html tag belongs to the html tag specifying classification, if it is skips this html tag, transfers to for original state, if not then transferring attribute status to;
Attribute status: analyze the attribute of this html tag, retains the one or more attributes specified, and enters text status:
Text status: be put into after content of text in the condensed webpages of correspondence by the Content Transformation that retains of this html tag, enters label done state:
Label done state: be the discovery that when html tag terminates, enters label done state, then turns to original state.
6. the method as described in claim 1-5, wherein, the html tag of described appointment classification includes one or more in following label:
Script label, noscript label, iframe label, and single label, comment tag and comprise the label of display:none.
7. the method as described in any one of claim 1-6, wherein,
Described storage system is distributed memory system.
8. A device for providing a search summary service, wherein the device comprises:
An acquiring unit, adapted to obtain each original web page;
A simplifying unit, adapted to simplify each original web page to obtain each condensed webpage;
A storage unit, adapted to store each condensed webpage in correspondence with its URL in a storage system, for use when providing the search summary service.
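The division into three units can be sketched in Python as three small functions chained together. Everything named here (`acquire`, `simplify_all`, `store`, `summary_for`, `crude_simplify`) is hypothetical, the fetcher and simplifier are passed in as callables so that no real network access is assumed, and a plain dictionary stands in for the storage system.

```python
import re


def acquire(urls, fetch):
    """Acquiring unit: obtain each original web page via the supplied fetcher."""
    return {url: fetch(url) for url in urls}


def simplify_all(originals, simplify):
    """Simplifying unit: turn each original page into a condensed page."""
    return {url: simplify(html) for url, html in originals.items()}


def store(condensed, storage):
    """Storage unit: save each condensed page under its URL."""
    for url, page in condensed.items():
        storage[url] = page
    return storage


def summary_for(url, storage, length=40):
    """Serve a summary from the stored condensed page (simple prefix here)."""
    return storage[url][:length]


def crude_simplify(html):
    """Placeholder simplifier: drop anything that looks like a tag."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())
```

For example, with an in-memory stand-in for the web such as `fake_web = {"http://example.com/a": "<p>Hello <b>world</b></p>"}`, calling `store(simplify_all(acquire(fake_web, fake_web.get), crude_simplify), {})` fills the dictionary with the condensed text keyed by URL, which `summary_for` can then read without re-parsing the original page.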
9. The device as claimed in claim 8, wherein,
The acquiring unit is adapted to crawl each original web page using a web crawler.
10. The device as described in any one of claims 8-9, wherein,
The simplifying unit is adapted to remove JS and CSS code from each original web page.
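Removing JS and CSS code can be approximated with a few regular expressions, as in the Python sketch below. This is a rough illustration and not the device's actual implementation: the function name `strip_js_and_css` is hypothetical, only `<script>` blocks, `<style>` blocks and double-quoted inline `style` attributes are handled, and regular expressions are not a robust way to parse arbitrary HTML.

```python
import re

_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
_STYLE = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
_INLINE_STYLE = re.compile(r'\sstyle\s*=\s*"[^"]*"', re.IGNORECASE)


def strip_js_and_css(html):
    """Remove script blocks, style blocks and inline style attributes."""
    html = _SCRIPT.sub("", html)   # JS code and its enclosing tags
    html = _STYLE.sub("", html)    # CSS rules and their enclosing tags
    return _INLINE_STYLE.sub("", html)
```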
CN201410844411.9A 2014-12-30 2014-12-30 Method and device for providing abstract searching service Pending CN105808569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410844411.9A CN105808569A (en) 2014-12-30 2014-12-30 Method and device for providing abstract searching service

Publications (1)

Publication Number Publication Date
CN105808569A true CN105808569A (en) 2016-07-27

Family

ID=56420014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410844411.9A Pending CN105808569A (en) 2014-12-30 2014-12-30 Method and device for providing abstract searching service

Country Status (1)

Country Link
CN (1) CN105808569A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203405A (en) * 2017-06-23 2017-09-26 郑州云海信息技术有限公司 A kind of method and apparatus for checking multilingual definition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009073219A2 (en) * 2007-12-04 2009-06-11 Yahoo! Inc. Third-party information overlay on search results
CN102479181A (en) * 2010-11-22 2012-05-30 中国电信股份有限公司 Method and device for extracting webpage text based on DIV (Division) position
CN102779169A (en) * 2012-06-27 2012-11-14 江苏新瑞峰信息科技有限公司 Extracting method and device for webpage content based on HTML (Hypertext Markup Language) label
CN103678487A (en) * 2013-11-08 2014-03-26 北京奇虎科技有限公司 Method and device for generating web page snapshot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
沈怡涛: "基于视觉特征和文本结构分析的中文网页自动摘要技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Similar Documents

Publication Publication Date Title
Asakawa et al. Transcoding
US9734261B2 (en) Context aware query selection
US20150295942A1 (en) Method and server for performing cloud detection for malicious information
WO2018106974A1 (en) Content validation and coding for search engine optimization
US20160364373A1 (en) Method and apparatus for extracting webpage information
CN103605688A (en) Intercept method and intercept device for homepage advertisements and browser
WO2015021199A1 (en) Access and management of entity-augmented content
CN104239298A (en) Text message recommendation method, server, browser and system
EP3851981A1 (en) Page processing method and apparatus, electronic device and computer readable medium
CN104077273A (en) Method and device for extracting webpage contents
CN104331438A (en) Method and device for selectively extracting content of novel webpage
CN104090869B (en) A kind of method and translation system for translating the network information
CN103729178A (en) Method and system for processing multiple tabs of browsers
CN105786836A (en) Method and system for generating structured abstract of video webpage
CN105808561A (en) Method and device for extracting abstract from webpage
CN102902784A (en) Web page classification storage system and method
CN104699836A (en) Multi-keyword search prompting method and multi-keyword search prompting device
CN105447191A (en) Intelligent abstracting method for providing graphic guidance steps and corresponding device
CN105938496A (en) Webpage content extraction method and apparatus
CN109558123A (en) The method of webpage conversion electrons book, electronic equipment, storage medium
CN104778232A (en) Searching result optimizing method and device based on long query
CN107077499B (en) Generation of mapping definitions for content management systems
CN103064943A (en) Customer premises equipment
CN105808569A (en) Method and device for providing abstract searching service
CN105224552A (en) The disposal route of the network information, device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727