CN103838797A - Method for optimizing mobile search engine - Google Patents
Method for optimizing mobile search engine
- Publication number
- CN103838797A (application CN201210491498.7A)
- Authority
- CN
- China
- Prior art keywords
- wml
- search engine
- stu
- mobile
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method for optimizing a mobile search engine. The method comprises the following steps: designing a mobile search engine framework, building a URL list, building a translator, and designing a WAP interface. In view of the current state of mobile search engines, a mobile module is added to an existing Internet search engine framework, which provides a way to build a mobile search engine from HTML resources. In this approach, the HTML web pages captured by web spiders are processed centrally: topic information is extracted from them, converted into WML pages that a mobile phone can recognize, and stored in a WML snapshot library. When a user clicks a result entry to view a specific web page, the system does not link directly to that page on the Internet but to the WML snapshot corresponding to it, which meets the user's mobile search needs. In practice, this approach has been used to build a mobile search engine for the life-services field that covers catering, entertainment and yellow-page information for nearly 40 cities in China.
Description
Technical field
The present invention relates to mobile Internet technology, and in particular to a method for optimizing a mobile search engine.
Background art
A search engine is a system that gathers information from the Internet with dedicated computer programs according to a certain strategy, organizes and processes that information, provides a retrieval service, and presents the information relevant to a user's query to the user. With advances in wireless communication and the spread of mobile phones, mobile Internet access is becoming the trend. To let users look up everyday information (food, clothing, housing and transport) anytime and anywhere, how to build a mobile search engine has become a focus of mobile network applications.
Mobile Internet access is constrained by the handset and by transmission bandwidth. Only a minority of smartphones support plain HTML; most mobile phones recognize only the markup languages of the WAP protocol, such as WML or xHTML. Network information, however, is mainly expressed in HTML, and WAP resources are limited, so a mobile search engine that crawls only WAP pages as its information source cannot provide enough information. How to break through this limit, so that mobile phone clients can also search the massive amount of information that originates in HTML, has therefore become one of the main problems of mobile search.
The usual way to browse HTML pages on a mobile phone is to add a WAP gateway: when the phone sends a request to browse an HTML page, the gateway first reads the page, converts it into the corresponding WML, and then sends it to the phone. This is also the path currently used to extend general-purpose search engines into mobile search engines. Real-time translation of this kind, however, clearly places high demands on the gateway's performance and bandwidth.
In view of the current state of mobile search engines, the present invention adds a mobile module to the framework of an existing Internet search engine and proposes a way to build a mobile search engine from HTML resources. The HTML pages captured by the web spider are processed centrally and translated into web page snapshots in WML, which meets users' mobile search needs. A mobile search engine built with this technique needs no real-time translation gateway and can easily extend an existing search engine system. In practice, the approach has been used to build a mobile search engine for the life-services field that covers catering, entertainment and yellow-page information for nearly 40 cities nationwide.
Summary of the invention
In view of the current state of mobile search engines, a method for optimizing a mobile search engine is proposed, comprising the following steps:
A. Design the mobile search engine framework
The search engine framework consists of four parts (crawler, indexer, searcher, and user interface) and additionally has a mobile module, which makes it a mobile search engine.
The mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML web page snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone;
B. Build the URL list
Store the captured web pages in the web page library, and store all hyperlinks on those pages in the URL list;
C. Build the translator
The translator has three parts: page filtering, topic information filtering, and translation;
C.1 Page filtering
Directory pages are filtered out first and are not translated. The nature of a page is judged from the ratio of its number of text paragraphs to its number of links, and the result is stored in the index database;
C.2 Topic information filtering
Extract the topic-related part of the web page, using the STU-DOM tree model, which does not depend on the information source.
The table, tr, div and tbody tag nodes of the page serve as block nodes, and whether a block is kept is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity). Local correlativity is determined by the links and content within the block, and its formula can be expressed as:
[formula not preserved in this text]
where ContentLength and LinkCount denote the number of words and the number of links in a block, and a further symbol, also not preserved, denotes the j-th sub-block;
Contextual correlativity is determined by the links within the block and the content of the parent block, and its formula can be expressed as:
[formula not preserved in this text]
where $STU_{pi}$ denotes the parent node of $STU_i$;
The design sets the local correlativity threshold to 2 and the contextual correlativity threshold to 70;
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot process, such as the style, front and script tags, are removed first. A mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read.
Text that cannot be shown on a single mobile phone screen must be paginated, and the result is stored in the WML snapshot library;
D. Design the WAP interface
The WAP interface is the human-computer query interface carried by the mobile phone. It is designed in WML or xHTML. Content on the WAP interface is kept as terse as possible: a results list page holds at most ten entries.
Compared with the prior art, the present invention has the following beneficial effects:
1. The invention breaks through the limit of WAP-only sources, so that mobile phone clients can also search the massive amount of information that originates in HTML, which gives mobile search a much wider range of information.
2. A mobile search engine built with this technique needs no real-time translation gateway, avoids the high performance and bandwidth demands placed on such a gateway, and can easily extend an existing search engine system.
Brief description of the drawings
The invention has two accompanying drawings:
Fig. 1 is a frame diagram of the mobile search engine system.
Fig. 2 is a schematic diagram of the mobile search interface.
Embodiment
A. Design the mobile search engine framework
Like a general search engine system, this framework consists of four parts: crawler, indexer, searcher, and user interface. A mobile module is added to turn it into an extended mobile search engine. The mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML web page snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone.
The basic framework is shown in Fig. 1.
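The three parts of the mobile module can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (`SnapshotStore`, `MobileModule`, `ingest`, `snapshot_for`) are hypothetical, the snapshot library is an in-memory dictionary, and the translator is passed in as a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SnapshotStore:
    """In-memory stand-in for the WML snapshot library (hypothetical)."""
    pages: Dict[str, str] = field(default_factory=dict)

    def save(self, url: str, wml: str) -> None:
        self.pages[url] = wml

    def load(self, url: str) -> Optional[str]:
        return self.pages.get(url)


@dataclass
class MobileModule:
    """Translator + snapshot library + the lookup a WAP interface would call."""
    translate: Callable[[str], str]  # HTML text -> WML text
    store: SnapshotStore = field(default_factory=SnapshotStore)

    def ingest(self, url: str, html: str) -> None:
        # Runs offline, once per page the spider has captured.
        self.store.save(url, self.translate(html))

    def snapshot_for(self, url: str) -> Optional[str]:
        # Runs at query time, when a user opens a search result.
        return self.store.load(url)
```

Because translation happens in `ingest` rather than in `snapshot_for`, no conversion work is done while the user waits, which is the contrast with a real-time translation gateway.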
B. Build the URL list
The process is started by the web spider, which starts automatically at regular intervals and crawls Internet sites, stores the captured pages in the web page library, and stores all hyperlinks on those pages in the URL list.
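A minimal sketch of this step, using only the Python standard library. The function name `store_page`, the use of a dictionary as the web page library, and the `seen` set are assumptions for illustration; resolving relative links, dropping fragments, skipping non-HTTP links and de-duplicating are also choices of this sketch, not details given in the text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute target of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)


def store_page(url, html, page_library, url_list, seen):
    """Save a captured page and queue the hyperlinks found on it."""
    page_library[url] = html
    parser = LinkExtractor(url)
    parser.feed(html)
    parser.close()
    for link in parser.links:
        # Assumption: only unseen http(s) links are queued.
        if link.startswith(("http://", "https://")) and link not in seen:
            seen.add(link)
            url_list.append(link)
```

A `deque` works well as the URL list because the spider can append newly found links at one end and take the next URL to crawl from the other.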
C. Build the translator
A mobile search engine must return query information to the user quickly, directly and concisely. However, some of the pages captured by the spider have no topic at all, and even pages that do have a topic usually carry a large amount of information unrelated to it. Translating pages directly is therefore unsuitable. In line with the characteristics of mobile search, the translator is designed in three parts: page filtering, topic information filtering, and translation.
C.1 Page filtering
Directory pages are filtered out first and are not translated. The nature of a page is judged from the ratio of its number of text paragraphs to its number of links, and the result is stored in the index database. The indexer performs word segmentation on the captured web documents, computes weights from the position and frequency with which each word occurs in a page, and then stores the segmentation results in the index database.
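The ratio test for directory pages might look like the sketch below. The text gives neither a definition of a text paragraph nor a cut-off value, so both are assumptions here: a paragraph is any run of text outside links, scripts and styles with at least `min_chars` characters, and a page counts as a directory page when the ratio falls below the hypothetical `min_ratio`.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts text paragraphs and links on one page."""
    SKIP = {"script", "style"}

    def __init__(self, min_chars=20):
        super().__init__()
        self.min_chars = min_chars
        self.paragraphs = 0
        self.links = 0
        self._in_link = 0
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
            self._in_link += 1
        elif tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Link captions and script/style bodies do not count as text.
        if self._in_link or self._skip:
            return
        if len(data.strip()) >= self.min_chars:
            self.paragraphs += 1


def is_directory_page(html, min_ratio=0.5):
    """True when text paragraphs are scarce relative to links."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    ratio = stats.paragraphs / max(stats.links, 1)
    return ratio < min_ratio
```

Pages for which `is_directory_page` returns true would be skipped by the translator; both thresholds would need tuning against real crawled pages.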
C.2 Topic information filtering
Extract the topic-related part of the web page, using the STU-DOM tree model, which does not depend on the information source. Tag nodes such as <table>, <tr>, <div> and <tbody> serve as block nodes, and whether a block is kept is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity). Local correlativity is determined by the links and content within the block, and its formula can be expressed as:
[formula not preserved in this text]
where ContentLength and LinkCount denote the number of words and the number of links in a block, and a further symbol, also not preserved, denotes the j-th sub-block.
Contextual correlativity is determined by the links within the block and the content of the parent block, and its formula can be expressed as:
[formula not preserved in this text]
where $STU_{pi}$ denotes the parent node of $STU_i$.
The design sets the local correlativity threshold to 2 and the contextual correlativity threshold to 70. Topic information is extracted from the HTML pages on this basis.
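The inputs to the two measures can be gathered as in the sketch below, which splits a page into block nodes and records ContentLength and LinkCount for each. Since the two correlativity formulas are not available in this text, the scoring functions are left as caller-supplied parameters rather than guessed. Several points are assumptions: ContentLength is counted as non-whitespace characters, nested blocks contribute to their ancestors, script and style text is not excluded, and a block is kept only when both scores reach their thresholds.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

BLOCK_TAGS = {"table", "tr", "div", "tbody"}


@dataclass(eq=False)
class Block:
    tag: str
    parent: Optional["Block"] = None
    children: List["Block"] = field(default_factory=list)
    content_length: int = 0  # non-whitespace characters, nested blocks included
    link_count: int = 0      # <a> elements, nested blocks included


class BlockTreeBuilder(HTMLParser):
    """Builds a tree of block nodes under a synthetic 'document' root."""

    def __init__(self):
        super().__init__()
        self.root = Block("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            block = Block(tag, parent=self._stack[-1])
            self._stack[-1].children.append(block)
            self._stack.append(block)
        elif tag == "a":
            for block in self._stack:  # credit every enclosing block
                block.link_count += 1

    def handle_endtag(self, tag):
        # Close up to the innermost open block with this tag; never pop the root.
        if tag in BLOCK_TAGS and any(b.tag == tag for b in self._stack[1:]):
            while self._stack.pop().tag != tag:
                pass

    def handle_data(self, data):
        size = len("".join(data.split()))
        for block in self._stack:
            block.content_length += size


def walk(block):
    """Yield every descendant block in document order."""
    for child in block.children:
        yield child
        yield from walk(child)


def select_blocks(root, local_score, context_score,
                  local_threshold=2, context_threshold=70):
    """Keep blocks whose two scores both reach the stated thresholds."""
    return [b for b in walk(root)
            if local_score(b) >= local_threshold
            and context_score(b) >= context_threshold]
```

`local_score` and `context_score` each take a `Block`; the contextual one can reach the parent block through `block.parent`, matching the role of the parent node in the description above.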
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot process, such as the <style>, <front> and <script> tags, are removed first. A mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read, so that the topic information becomes a WML page the phone can recognize. Long text that cannot be shown on a single mobile phone screen must also be paginated, and the result is stored in the WML snapshot library.
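The conversion and paging steps can be sketched as follows. The mapping table `TAG_MAP`, the page budget `max_chars`, and the function name `to_wml_decks` are hypothetical; the text does not list the actual tag mappings or the screen size. In this sketch `<style>` and `<script>` are dropped with their contents, any other unmapped tag (for example `<font>`) is unwrapped so that its text is kept, and each page becomes one WML deck with a single card.

```python
from html import escape
from html.parser import HTMLParser

DROP_WITH_CONTENT = {"style", "script"}
# Hypothetical mapping table: HTML tag -> WML markup for one paragraph.
TAG_MAP = {
    "h1": "<p><big>{}</big></p>",
    "h2": "<p><b>{}</b></p>",
    "p": "<p>{}</p>",
    "li": "<p>- {}</p>",
    "td": "<p>{}</p>",
}
DEFAULT = "<p>{}</p>"
WML_HEADER = ('<?xml version="1.0"?>\n<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
              '"http://www.wapforum.org/DTD/wml_1.1.xml">\n')


def wml_text(text):
    # Escape XML specials; a literal "$" must be doubled in WML.
    return escape(text).replace("$", "$$")


class WmlExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.paragraphs = []  # (wml_markup, text_length) pairs
        self._open = []       # mapped HTML tags currently open
        self._buf = []
        self._skip = 0

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            template = TAG_MAP.get(self._open[-1] if self._open else "", DEFAULT)
            self.paragraphs.append((template.format(wml_text(text)), len(text)))

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip += 1
        elif tag in TAG_MAP:
            self._flush()
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in TAG_MAP and tag in self._open:
            self._flush()
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def to_wml_decks(html, title, max_chars=300):
    """Return one WML deck per page; a single long paragraph is not split."""
    parser = WmlExtractor()
    parser.feed(html)
    parser.close()
    pages, current, used = [], [], 0
    for markup, size in parser.paragraphs:
        if current and used + size > max_chars:
            pages.append(current)
            current, used = [], 0
        current.append(markup)
        used += size
    if current:
        pages.append(current)
    return [WML_HEADER + '<wml><card id="p{}" title="{}">{}</card></wml>'.format(
                n, wml_text(title), "".join(body))
            for n, body in enumerate(pages, 1)]
```

Each returned string would be stored in the snapshot library under the source URL plus its page number, so the WAP interface can link from one page of a snapshot to the next.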
D. Design the WAP interface
The WAP interface is the human-computer query interface carried by the mobile phone. It is designed in WML or xHTML. Content on the WAP interface is kept as terse as possible: a results list page holds at most ten entries. When a user queries through the WAP interface, the searcher first performs word segmentation on the user's input and retrieves all records that contain the query terms. It then sorts the matching records by computed page weight and relevance, applies set operations, and finally extracts a summary of each page and returns it to the user. When the user clicks a record to view a specific page, however, the system behaves differently from an Internet search engine: it does not link directly to the page on the Internet but to the WML snapshot corresponding to that page.
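The query flow can be sketched as below. This is an illustration only: whitespace splitting stands in for the word segmentation step, summed term frequency stands in for the page weight and relevance computation (which the text does not specify), the set operation is taken to be an intersection over the query terms, and the class name `SnapshotIndex` and the `/snapshot/` link prefix are hypothetical.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 10  # a results list page holds at most ten entries


def tokenize(text):
    # Stand-in for word segmentation: lower-case and split on whitespace.
    return text.lower().split()


class SnapshotIndex:
    def __init__(self):
        self.postings = defaultdict(Counter)  # term -> {doc_id: term frequency}
        self.docs = {}                        # doc_id -> (source_url, text)

    def add(self, doc_id, source_url, text):
        self.docs[doc_id] = (source_url, text)
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query, page=1, snapshot_base="/snapshot/"):
        terms = tokenize(query)
        if not terms:
            return []
        # Set operation: keep only documents that contain every query term.
        hits = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            hits &= set(self.postings.get(term, {}))
        # Stand-in ranking: higher summed term frequency first, then doc_id.
        ranked = sorted(
            hits, key=lambda d: (-sum(self.postings[t][d] for t in terms), d))
        start = (page - 1) * PAGE_SIZE
        return [{"summary": self.docs[d][1][:60],
                 "link": snapshot_base + str(d),  # WML snapshot, not the source URL
                 "source": self.docs[d][0]}
                for d in ranked[start:start + PAGE_SIZE]]
```

The `link` field is what distinguishes this from an ordinary result list: it points at the stored snapshot, while `source` is kept only for reference.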
Following the design method of the invention, a mobile search engine for the life-services field, www.zhaocha.mobi, was developed. It is an improvement built on the original Internet search engine www.zhaocha.com.cn, and the result is shown in Fig. 2.
Claims (1)
1. A method for optimizing a mobile search engine, characterized by comprising the following steps:
A. Design the mobile search engine framework
The search engine framework consists of four parts (crawler, indexer, searcher, and user interface) and additionally has a mobile module, which makes it a mobile search engine;
the mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML web page snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone;
B. Build the URL list
Store the captured web pages in the web page library, and store all hyperlinks on those pages in the URL list;
C. Build the translator
The translator has three parts: page filtering, topic information filtering, and translation;
C.1 Page filtering
Directory pages are filtered out first and are not translated; the nature of a page is judged from the ratio of its number of text paragraphs to its number of links, and the result is stored in the index database;
C.2 Topic information filtering
Extract the topic-related part of the web page, using the STU-DOM tree model, which does not depend on the information source;
the table, tr, div and tbody tag nodes of the page serve as block nodes, and whether a block is kept is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity); local correlativity is determined by the links and content within the block, and its formula can be expressed as:
[formula not preserved in this text]
where ContentLength and LinkCount denote the number of words and the number of links in a block, and a further symbol, also not preserved, denotes the j-th sub-block;
contextual correlativity is determined by the links within the block and the content of the parent block, and its formula can be expressed as:
[formula not preserved in this text]
where $STU_{pi}$ denotes the parent node of $STU_i$;
the design sets the local correlativity threshold to 2 and the contextual correlativity threshold to 70;
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot process, such as the style, front and script tags, are removed first; a mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read;
text that cannot be shown on a single mobile phone screen must be paginated, and the result is stored in the WML snapshot library;
D. Design the WAP interface
The WAP interface is the human-computer query interface carried by the mobile phone; it is designed in WML or xHTML; content on the WAP interface is kept as terse as possible: a results list page holds at most ten entries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210491498.7A CN103838797A (en) | 2012-11-27 | 2012-11-27 | Method for optimizing mobile search engine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210491498.7A CN103838797A (en) | 2012-11-27 | 2012-11-27 | Method for optimizing mobile search engine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103838797A (en) | 2014-06-04 |
Family
ID=50802306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210491498.7A Pending CN103838797A (en) | 2012-11-27 | 2012-11-27 | Method for optimizing mobile search engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103838797A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908071A (en) * | 2010-08-10 | 2010-12-08 | 厦门市美亚柏科信息股份有限公司 | Method and device thereof for improving search efficiency of search engine |
CN102156742A (en) * | 2011-04-19 | 2011-08-17 | 北京神州数码思特奇信息技术股份有限公司 | Method and middleware for supporting structured document display with own browser of mobile phone |
CN102325225A (en) * | 2011-09-20 | 2012-01-18 | 北京鹏润鸿途科技有限公司 | Method and device for playing video of mobile phone website |
Non-Patent Citations (1)
Title |
---|
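Ji Ye (汲业) et al.: "Design and Implementation of a Mobile Search Engine" (《一种移动搜索引擎设计与实现》), Computer Applications and Software (《计算机应用与软件》) * |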
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107807937A (en) * | 2016-09-09 | 2018-03-16 | 阿里巴巴集团控股有限公司 | A kind of website SEO processing methods, apparatus and system |
CN107807937B (en) * | 2016-09-09 | 2021-11-30 | 阿里巴巴集团控股有限公司 | Website SEO processing method, device and system |
CN108062338A (en) * | 2016-11-09 | 2018-05-22 | 北京国双科技有限公司 | A kind of method and device of the homing capability of the evaluation function page |
CN108062338B (en) * | 2016-11-09 | 2020-06-19 | 北京国双科技有限公司 | Method and device for evaluating navigation capability of function page |
CN106802914A (en) * | 2016-12-06 | 2017-06-06 | 中国电子科技集团公司第三十二研究所 | Heuristic multi-feature rule set webpage blocking method |
CN113641884A (en) * | 2021-08-10 | 2021-11-12 | 南方电网数字电网研究院有限公司 | Semantic-based power metering data processing method and device and computer equipment |
CN113835740A (en) * | 2021-11-29 | 2021-12-24 | 山东捷瑞数字科技股份有限公司 | Search engine optimization-oriented automatic front-end code repairing method |
CN113835740B (en) * | 2021-11-29 | 2022-02-22 | 山东捷瑞数字科技股份有限公司 | Search engine optimization-oriented automatic front-end code repairing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102930059B (en) | Method for designing focused crawler | |
RU2522103C2 (en) | Update notification method and browser | |
CN102043834B (en) | Method for realizing searching by utilizing client and search client | |
CN102708174B (en) | Method and device for displaying rich media information in browser | |
CN100476830C (en) | Network resource searching method and system | |
CN102521251A (en) | Method for directly realizing personalized search, device for realizing method, and search server | |
CN102760151B (en) | Implementation method of open source software acquisition and searching system | |
CN104063454A (en) | Search push method and device for mining user demands | |
CN103428076A (en) | Method and device for transmitting information to multi-type terminals or applications | |
CN101291304A (en) | Transplantable network information sharing method | |
CN102521232B (en) | Distributed acquisition and processing system and method of internet metadata | |
CN103309884A (en) | User behavior data collecting method and system | |
CN103838797A (en) | Method for optimizing mobile search engine | |
CN102117331B (en) | Video search method and system | |
CN102193798B (en) | Method for automatically acquiring Open application programming interface (API) based on Internet | |
CN102750352A (en) | Method and device for classified collection of historical access records in browser | |
CN102722501A (en) | Search engine and realization method thereof | |
CN102722499A (en) | Search engine and implementation method thereof | |
CN104252348A (en) | Webpage access statistics method and device based on browser | |
CN103389972A (en) | Method and device for obtaining text based on really simple syndication (RSS) | |
CN104090923A (en) | Method and device for displaying rich media information in browser | |
CN103970800A (en) | Method and system for extracting and processing webpage related keywords | |
CN100504877C (en) | Method and device for collecting web page action | |
CN101008946A (en) | Search method of Chinese mobile communication information and device thereof | |
CN101133415A (en) | Server, method and system for providing information search service by using sheaf of pages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140604 |