CN101887421A - Technology for converting unformatted data into formatted data in web website - Google Patents

Technology for converting unformatted data into formatted data in web website

Info

Publication number
CN101887421A
Authority
CN
China
Prior art keywords
data
web page
web
formatted data
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2009100840662A
Other languages
Chinese (zh)
Inventor
贾鹏
汤海京
朱红军
蒋海涛
田耘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING BOYUESHIJI TECHNOLOGY Co Ltd
Original Assignee
BEIJING BOYUESHIJI TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2009-05-13
Filing date
2009-05-13
Publication date
2010-11-17
Application filed by BEIJING BOYUESHIJI TECHNOLOGY Co Ltd
Priority to CN2009100840662A
Publication of CN101887421A
Legal status: Pending

Abstract

The invention discloses a technique for converting unformatted data on a web site into formatted data. The technique comprises the following steps: establishing a web page content aggregation service and fetching the specified web pages; establishing an intelligent web page parsing service and a web page semantic pattern library; and purifying the heterogeneous web page data into standard formatted data. By uniformly converting the many kinds of heterogeneous information on a web site into formatted data that is convenient to store and easy to filter, the technique greatly reduces enterprise costs and improves the reliability of porting web page information.

Description

A technique for converting unformatted data on a web site into formatted data
Technical field
The present invention relates to a technique for converting unformatted data on a web site into formatted data, and in particular to a technique that uses intelligent web page crawling to uniformly convert the various kinds of heterogeneous information in web pages into formatted data.
Background art
With the explosive growth in the number of mobile phone users in China and the rapid development of the mobile Internet industry, more and more people access the Internet from their phones. Since the 3G licences were issued, all three major operators have been positioning themselves in the mobile Internet business, and WAP sites of every kind have sprung up like bamboo shoots after spring rain. Many traditional Internet sites want to take advantage of the growth in mobile network users by porting their existing content, so that they can continue to develop on the mobile Internet. At present, building a WAP site mostly depends on setting up a dedicated project team to develop WAP programs. This requires solving many complex problems, including data integration, content publishing and management, real-time updating, and system operation and maintenance. Only at relatively high cost can the WAP site be built and the content of the original WEB site integrated. The present invention automatically converts heterogeneous web page data into data of a unified format without introducing any other development program, which makes the data convenient to store and filter.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method that uniformly converts the various kinds of heterogeneous information in web pages into formatted data by means of intelligent web page crawling.
The technique of the present invention for converting unformatted data on a web site into formatted data comprises at least the following steps:
Step 1: Establish a web page content aggregation service and fetch the specified web pages.
Step 2: Establish an intelligent web page parsing service and a web page semantic pattern library.
Step 3: Purify the heterogeneous web page data into standard formatted data.
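The three steps can be pictured as a small pipeline. The following is a minimal sketch only: the names `FormattedRecord` and `run_pipeline`, the record fields, and the idea of passing each stage in as a callable are assumptions made for illustration and are not specified by the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FormattedRecord:
    """Hypothetical unified record produced for one web page."""
    url: str
    kind: str               # semantic pattern the page matched, e.g. "text" or "index"
    fields: Dict[str, str]  # purified data, e.g. {"title": ..., "time": ...}


def run_pipeline(
    urls: List[str],
    fetch: Callable[[str], str],                   # Step 1: content aggregation
    classify: Callable[[str], str],                # Step 2: semantic pattern lookup
    purify: Callable[[str, str], Dict[str, str]],  # Step 3: purification
) -> List[FormattedRecord]:
    """Run the three stages in order for every specified URL."""
    records = []
    for url in urls:
        html = fetch(url)
        kind = classify(html)
        records.append(FormattedRecord(url, kind, purify(html, kind)))
    return records
```

Passing the stages in as functions keeps the sketch independent of any particular crawler, pattern format, or storage back end.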
In the technique of the present invention, establishing the web page content aggregation service and fetching the specified web pages comprises the following steps:
Step 11: Establish a web page content aggregation program that can download the content of a specified web page.
Step 12: Comprehensively fetch all kinds of heterogeneous information on the target Internet web pages.
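Steps 11 and 12 might look like the sketch below, which uses Python's standard `urllib.request`. The function names `download` and `aggregate`, the timeout value, and the choice to skip pages that fail to download are assumptions; the patent does not describe how its aggregation program is built.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def download(url: str, timeout: float = 10.0) -> str:
    """Step 11: download one page, decoding with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def aggregate(
    urls: Iterable[str],
    fetch: Callable[[str], str] = download,
) -> Dict[str, str]:
    """Step 12: fetch every specified page and return a url -> raw HTML mapping.

    Pages whose download raises OSError (which includes urllib's URLError)
    are skipped rather than aborting the whole run.
    """
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError:
            continue
    return pages
```

The `fetch` parameter exists so that a different downloader, or a stub for offline testing, can be substituted.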
In the technique of the present invention, establishing the intelligent web page parsing service and the web page semantic pattern library comprises the following steps:
Step 21: Establish the web page semantic pattern library, which stores the semantic patterns of the specified web pages.
Step 22: Classify the fetched information according to the semantic patterns and store it in the library.
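The patent does not say what a semantic pattern looks like. Purely as an illustration, the sketch below assumes that a pattern is a page kind plus a list of regular expressions that must all match the page's HTML; `SemanticPattern`, `PatternLibrary`, and their methods are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SemanticPattern:
    """Hypothetical pattern: a page kind plus regexes that must all match."""
    kind: str
    signatures: List[str]

    def matches(self, html: str) -> bool:
        flags = re.IGNORECASE | re.DOTALL
        return all(re.search(sig, html, flags) for sig in self.signatures)


class PatternLibrary:
    """Step 21 stores patterns; Step 22 classifies pages and files them by kind."""

    def __init__(self) -> None:
        self.patterns: List[SemanticPattern] = []
        self.store: Dict[str, List[str]] = {}  # kind -> URLs filed under it

    def register(self, pattern: SemanticPattern) -> None:
        self.patterns.append(pattern)

    def classify(self, html: str) -> Optional[str]:
        """Return the kind of the first registered pattern that matches."""
        for pattern in self.patterns:
            if pattern.matches(html):
                return pattern.kind
        return None

    def file(self, url: str, html: str) -> Optional[str]:
        """Classify a fetched page and record its URL under the matching kind."""
        kind = self.classify(html)
        if kind is not None:
            self.store.setdefault(kind, []).append(url)
        return kind
```

A real pattern library would likely need something richer than regular expressions, such as per-site extraction rules, but the register, classify, and file sequence mirrors Steps 21 and 22.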
In the technique of the present invention, purifying the heterogeneous web page data into standard formatted data comprises the following steps:
Step 31: Establish the intelligent web page parsing service and apply WEB purification to the web pages obtained by content aggregation.
Step 32: For an index page, the data obtained after purification is a list of web page entries, and the body text of each entry must then be aggregated in turn. For a text (article) page, the data obtained after purification is the page's title, time, body text, and related pictures.
Step 33: Save the purified formatted data to the content library.
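One way to picture the purification in Steps 31 and 32 is the sketch below, built on the standard `html.parser` module. The function name `purify`, the `"index"` and `"text"` kind labels, the output keys, and the date format recognised by `DATE` are all assumptions; the patent does not specify the purification algorithm.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

SKIP = {"script", "style"}  # content that is never part of the visible page
DATE = re.compile(r"\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2})?")  # assumed time format


class _Collector(HTMLParser):
    """Collects the title, links, image sources, and visible text of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[Dict[str, str]] = []
        self.images: List[str] = []
        self.text: List[str] = []
        self._in_title = False
        self._skip_depth = 0
        self._href: Optional[str] = None
        self._anchor_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in SKIP:
            self._skip_depth += 1
        elif tag == "img" and attr.get("src"):
            self.images.append(attr["src"])
        elif tag == "a" and attr.get("href"):
            self._href = attr["href"]
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href is not None:
            title = "".join(self._anchor_text).strip()
            self.links.append({"url": self._href, "title": title})
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
            return
        if self._href is not None:
            self._anchor_text.append(data)
        if data.strip():
            self.text.append(data.strip())


def purify(html: str, kind: str) -> Dict[str, object]:
    """Index pages yield a list of entries; text pages yield title, time, text, images."""
    collector = _Collector()
    collector.feed(html)
    collector.close()
    if kind == "index":
        return {"entries": [link for link in collector.links if link["title"]]}
    body = " ".join(collector.text)
    stamp = DATE.search(body)
    return {
        "title": collector.title.strip(),
        "time": stamp.group(0) if stamp else "",
        "text": body,
        "images": collector.images,
    }
```

For an index page, the caller would fetch each entry's `url` in turn and purify the result as a text page, which corresponds to continuing to aggregate the body text of each entry in Step 32.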
The present invention relates to a technique for converting unformatted data on a web site into formatted data. It uniformly converts the various kinds of heterogeneous information in web pages into formatted data that is convenient to store and easy to filter, which greatly reduces enterprise costs and improves the reliability of porting web page information.
Embodiment
The main steps of the present invention are as follows:
Step 1: Establish a web page content aggregation service and fetch the specified web pages.
Step 2: Establish an intelligent web page parsing service and a web page semantic pattern library.
Step 3: Purify the heterogeneous web page data into standard formatted data.
The specific steps of Step 1 above, establishing the web page content aggregation service and fetching the specified web pages, are as follows:
Step 11: Establish a web page content aggregation program that can download the content of a specified web page.
Step 12: Comprehensively fetch all kinds of heterogeneous information on the target Internet web pages.
The specific steps of Step 2 above, establishing the intelligent web page parsing service and the web page semantic pattern library, are as follows:
Step 21: Establish the web page semantic pattern library, which stores the semantic patterns of the specified web pages.
Step 22: Classify the fetched information according to the semantic patterns and store it in the library.
The specific steps of Step 3 above, purifying the heterogeneous web page data into standard formatted data, are as follows:
Step 31: Establish the intelligent web page parsing service and apply WEB purification to the web pages obtained by content aggregation.
Step 32: For an index page, the data obtained after purification is a list of web page entries, and the body text of each entry must then be aggregated in turn. For a text (article) page, the data obtained after purification is the page's title, time, body text, and related pictures.
Step 33: Save the purified formatted data to the content library.
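As an illustration of Step 33, the sketch below stores purified text-page records in a SQLite table and shows a simple filter query. The use of SQLite, the table and column names, and the functions `open_content_library`, `save_record`, and `find_by_title` are assumptions; the patent says only that the formatted data is saved to a content library.

```python
import json
import sqlite3
from typing import Dict, List


def open_content_library(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the content library; image URLs are stored as a JSON list."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS content (
               url       TEXT PRIMARY KEY,
               title     TEXT NOT NULL,
               published TEXT NOT NULL,
               body      TEXT NOT NULL,
               images    TEXT NOT NULL
           )"""
    )
    return conn


def save_record(conn: sqlite3.Connection, url: str, record: Dict[str, object]) -> None:
    """Insert the formatted record for one page, overwriting any earlier version."""
    conn.execute(
        "INSERT OR REPLACE INTO content (url, title, published, body, images) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            url,
            record["title"],
            record["time"],
            record["text"],
            json.dumps(record["images"]),
        ),
    )
    conn.commit()


def find_by_title(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Filtering example: URLs of stored pages whose title contains the keyword."""
    rows = conn.execute(
        "SELECT url FROM content WHERE title LIKE ? ORDER BY url",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]
```

Once every page shares one schema, filtering reduces to an ordinary query, which is the sense in which uniformly formatted data is easy to store and filter.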
When implemented, the above technical solution uniformly converts the various kinds of heterogeneous information in web pages into formatted data that is convenient to store and easy to filter, which greatly reduces enterprise costs and improves the reliability of porting web page information.
Finally, it should be noted that the above embodiment serves only to illustrate, and not to limit, the technical solution described in the invention. Therefore, although this specification has described the invention in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced; all technical solutions and improvements thereof that do not depart from the spirit and scope of the invention shall fall within the scope of its claims.

Claims (4)

1. A technique for converting unformatted data on a web site into formatted data, characterized in that the method comprises at least the following steps:
Step 1: Establish a web page content aggregation service and fetch the specified web pages.
Step 2: Establish an intelligent web page parsing service and a web page semantic pattern library.
Step 3: Purify the heterogeneous web page data into standard formatted data.
2. The technique for converting unformatted data on a web site into formatted data according to claim 1, characterized in that establishing the web page content aggregation service and fetching the specified web pages comprises the following steps:
Step 11: Establish a web page content aggregation program that can download the content of a specified web page.
Step 12: Comprehensively fetch all kinds of heterogeneous information on the target Internet web pages.
3. The technique for converting unformatted data on a web site into formatted data according to claim 1, characterized in that establishing the intelligent web page parsing service and the web page semantic pattern library comprises the following steps:
Step 21: Establish the web page semantic pattern library, which stores the semantic patterns of the specified web pages.
Step 22: Classify the fetched information according to the semantic patterns and store it in the library.
4. The technique for converting unformatted data on a web site into formatted data according to claim 1, characterized in that purifying the heterogeneous web page data into standard formatted data comprises the following steps:
Step 31: Establish the intelligent web page parsing service and apply WEB purification to the web pages obtained by content aggregation.
Step 32: For an index page, the data obtained after purification is a list of web page entries, and the body text of each entry must then be aggregated in turn. For a text (article) page, the data obtained after purification is the page's title, time, body text, and related pictures.
Step 33: Save the purified formatted data to the content library.
CN2009100840662A 2009-05-13 2009-05-13 Technology for converting unformatted data into formatted data in web website Pending CN101887421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100840662A CN101887421A (en) 2009-05-13 2009-05-13 Technology for converting unformatted data into formatted data in web website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100840662A CN101887421A (en) 2009-05-13 2009-05-13 Technology for converting unformatted data into formatted data in web website

Publications (1)

Publication Number Publication Date
CN101887421A true CN101887421A (en) 2010-11-17

Family

ID=43073347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100840662A Pending CN101887421A (en) 2009-05-13 2009-05-13 Technology for converting unformatted data into formatted data in web website

Country Status (1)

Country Link
CN (1) CN101887421A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577482A (en) * 2012-08-07 2014-02-12 腾讯科技(深圳)有限公司 Web page collecting method and device as well as browser
CN112860520A (en) * 2021-02-23 2021-05-28 合肥大多数信息科技有限公司 Information data formatting assembly based on artificial intelligence

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577482A (en) * 2012-08-07 2014-02-12 腾讯科技(深圳)有限公司 Web page collecting method and device as well as browser
CN103577482B (en) * 2012-08-07 2017-12-15 腾讯科技(深圳)有限公司 A kind of webpage collection method, device and browser
CN112860520A (en) * 2021-02-23 2021-05-28 合肥大多数信息科技有限公司 Information data formatting assembly based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN101500145A (en) Digital household public transportation enquiry system based on bi-directional set-top box
CN101872463A (en) Method for issuing advertisement through third-party platform
CN102906747A (en) Method and apparatus for portable index on removable storage medium
CN102521232A (en) Distributed acquisition and processing system and method of internet metadata
CN102148887A (en) Method for displaying region and weather of cellphone incoming call contacter
CN101661666A (en) Taxi call center system and implementation method thereof
CN101887421A (en) Technology for converting unformatted data into formatted data in web website
Tsai et al. Analysis of knowledge management trend by bibliometric approach
CN101888611A (en) Technology for intelligently scaling picture in internet website for mobile terminal to display
CN101645059A (en) Travel establishing method and mobile terminal
CN101261635B (en) Passive type network information automatic highly effective collection system and method
CN103389993A (en) Network information obtaining method and system for mobile equipment
CN101882133A (en) Method for converting WEB page frames into WAP page frames
CN101072252A (en) Method and device for identifying mobile phone number territoriality for mobile communication terminal
CN101887426A (en) Technology for converting headline in web website into headline in wap website
CN101887422A (en) Technique for keeping synchronous update of data of web site and wap site
CN101610284B (en) Service parameter relational matching method and system based on calling data
CN102611778A (en) System and method for managing contact persons based on number location of contact persons of mobile phone
CN101887423A (en) Technology for implanting web site information to wap by taking key word as weight
CN101887420A (en) Technology for implanting web site information to wap by taking time as weight
CN1645857A (en) Network information appointing communicating method
CN101887431A (en) Technology for converting text in web website into text in wap website
CN101887430A (en) Technology for converting audio in web website into audio in wap website
CN102637177A (en) Characteristic method for browsing webpages on mobile phones
CN101887428A (en) Technology for converting video in web website into video in wap website

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20101117