CN102857575A - Download method and system for Internet resources - Google Patents
- Publication number
- CN102857575A
- Authority
- CN
- China
- Prior art keywords
- module
- resource
- browser
- download
- artificial
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to the field of the Internet and provides a method for downloading Internet resources. The method comprises the following steps: a parsing module requests a resource download task and the description information of the resource from a download module; the parsing module schedules a human-browsing simulation module to simulate a person browsing the web page, loads a packet-interception module to obtain resource request information, and sends the resource request information obtained by the packet-interception module to the download module; and the download module downloads the resource according to the resource request information provided by the parsing module. The invention further provides a system for downloading Internet resources. With the technical solution provided by the invention, audio and video files on the Internet are processed automatically in batches by a distributed system, which greatly improves their download efficiency. Compared with conventional download techniques for Internet resources, the method and system greatly increase the proportion of audio and video files whose real download addresses are obtained from the Internet and successfully downloaded.
Description
Technical field
The present invention relates to the field of the Internet, and in particular to a method and system for downloading Internet resources.
Background art
With the rapid development of the Internet, information has entered an era of explosive growth, and search engines have become an indispensable tool; all of the information a search engine holds comes from the Internet. Within this massive body of information, audio and video files must be collected. However, many audio and video files cannot be obtained through a simple hyperlink, because most resource websites have added anti-hotlinking strategies (for example, cookie checks or special settings in the HTTP headers), so fewer and fewer resource files can be downloaded from a hyperlink alone.
To obtain Internet resources, especially audio and video files, more effectively, a new technique is urgently needed to solve this problem.
Summary of the invention
The main purpose of the present invention is to provide a method and system for downloading Internet resources, so as to solve the problems in the prior art that many audio and video files cannot be obtained through a simple hyperlink and that download efficiency is low.
To solve the above problems, the invention provides a method for downloading Internet resources, comprising:
A parsing module requests a resource download task and the description information of the resource from a download module;
The parsing module schedules a human-browsing simulation module to simulate a person browsing the web page, while loading a packet-interception module to obtain resource request information, and sends the resource request information obtained by the packet-interception module to the download module;
The download module downloads the resource according to the resource request information provided by the parsing module.
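The three steps above can be sketched in Python as follows. This is a minimal, single-process illustration under stated assumptions, not the patented implementation: the names `Task`, `RequestInfo`, `DownloadModule`, and `ParsingModule` are hypothetical, browser simulation and packet interception are collapsed into one injected callable, and the HTTP transfer is replaced by an injected `fetch` function so the example runs offline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Task:
    """A download task as handed out by the download module (hypothetical shape)."""
    task_id: int
    page_url: str  # playback or browsing page that a person would open


@dataclass
class RequestInfo:
    """Resource request information captured while the page is browsed."""
    task_id: int
    url: str
    headers: Dict[str, str]


class DownloadModule:
    def __init__(self, tasks: List[Task], fetch: Callable[[RequestInfo], bytes]) -> None:
        self._pending: Deque[Task] = deque(tasks)
        self._fetch = fetch  # assumed transport; a real system would issue the HTTP request
        self.results: Dict[int, bytes] = {}

    def next_task(self) -> Optional[Task]:
        # Step 1: the parsing module asks the download module for a task.
        return self._pending.popleft() if self._pending else None

    def download(self, info: RequestInfo) -> None:
        # Step 3: download with the request information the parsing module supplied.
        self.results[info.task_id] = self._fetch(info)


class ParsingModule:
    def __init__(self, downloader: DownloadModule,
                 browse_and_capture: Callable[[Task], Optional[RequestInfo]]) -> None:
        self.downloader = downloader
        # Stands in for "simulate browsing + load the packet-interception module".
        self.browse_and_capture = browse_and_capture

    def run_once(self) -> bool:
        task = self.downloader.next_task()
        if task is None:
            return False  # no work left
        # Step 2: simulate browsing, capture the real request, forward it.
        info = self.browse_and_capture(task)
        if info is not None:
            self.downloader.download(info)
        return True


# Usage with stand-ins: the "capture" invents a media URL and a Referer header.
def fake_capture(task: Task) -> RequestInfo:
    return RequestInfo(task.task_id, task.page_url + "/song.mp3", {"Referer": task.page_url})


dl = DownloadModule([Task(1, "http://example.com/play/1")], fetch=lambda info: info.url.encode())
parser = ParsingModule(dl, fake_capture)
while parser.run_once():
    pass
print(dl.results)  # {1: b'http://example.com/play/1/song.mp3'}
```

Injecting the capture and fetch steps as callables keeps the control flow of the three steps visible while leaving the browser automation and networking, which the source does not detail, outside the sketch.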
Further, before the parsing module requests the resource download task and the description information of the resource from the download module, the method also comprises:
The download module requests a batch of resource download tasks and the description information of those resources from a database.
Further, after the download module downloads the resource according to the resource request information provided by the parsing module, the method also comprises:
The download module stores the downloaded resource in the database.
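A minimal sketch of the database side (fetching a batch of pending tasks, then storing a downloaded resource), assuming SQLite and a hypothetical two-table schema. The source does not specify the database, the schema, or how tasks are marked as taken; the `status` column and its `pending`/`claimed`/`done` values are illustrative assumptions.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per download task, one row per stored resource.
SCHEMA = """
CREATE TABLE tasks (
    id            INTEGER PRIMARY KEY,
    page_url      TEXT NOT NULL,
    resource_type TEXT NOT NULL,            -- 'audio', 'video' or 'lyrics'
    status        TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE resources (
    task_id INTEGER PRIMARY KEY REFERENCES tasks(id),
    data    BLOB NOT NULL
);
"""


def fetch_batch(conn: sqlite3.Connection, limit: int) -> List[Tuple[int, str, str]]:
    """Return up to `limit` pending tasks and mark them as claimed."""
    rows = conn.execute(
        "SELECT id, page_url, resource_type FROM tasks "
        "WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    conn.executemany("UPDATE tasks SET status = 'claimed' WHERE id = ?",
                     [(row[0],) for row in rows])
    conn.commit()
    return rows


def store_resource(conn: sqlite3.Connection, task_id: int, data: bytes) -> None:
    """Save the downloaded bytes and mark the task as done."""
    conn.execute("INSERT OR REPLACE INTO resources (task_id, data) VALUES (?, ?)",
                 (task_id, data))
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO tasks (page_url, resource_type) VALUES (?, ?)",
                 [("http://example.com/a", "audio"),
                  ("http://example.com/b", "video"),
                  ("http://example.com/c", "lyrics")])
batch = fetch_batch(conn, 2)
print([row[0] for row in batch])  # [1, 2]
store_resource(conn, 1, b"fake-bytes")
```

Marking rows as claimed when a batch is fetched is one way to keep several download workers from taking the same task; the source does not say how this is handled.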
In the above method, the resources include video, audio, and lyrics; the description information of a resource includes the playback page URL of the audio, the playback page URL of the video, the browsing page URL of the lyrics, the resource type, whether a mouse click is needed, and the browser type; the resource type includes audio, video, and lyrics; the browser type includes the IE browser and the Chrome browser; and the resource request information includes the HTTP request header, the URL, and the task ID of the resource.
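The fields listed above map naturally onto a pair of record types. The sketch below is one possible Python representation; the class and field names, the JSON encoding, and the single `page_url` field (covering the audio/video playback page or the lyrics browsing page, depending on the resource type) are assumptions rather than formats given in the source.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict


class ResourceType(str, Enum):
    AUDIO = "audio"
    VIDEO = "video"
    LYRICS = "lyrics"


class BrowserType(str, Enum):
    IE = "ie"
    CHROME = "chrome"


@dataclass
class ResourceDescription:
    """Description information that travels with a download task."""
    task_id: int
    page_url: str               # playback page (audio/video) or browsing page (lyrics)
    resource_type: ResourceType
    needs_mouse_click: bool     # whether the simulated visit must click on the page
    browser_type: BrowserType


@dataclass
class ResourceRequestInfo:
    """What the packet-interception module hands back: headers, URL, task ID."""
    task_id: int
    url: str
    headers: Dict[str, str]

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ResourceRequestInfo":
        d = json.loads(text)
        return cls(task_id=d["task_id"], url=d["url"], headers=d["headers"])


desc = ResourceDescription(3, "http://example.com/play/3", ResourceType("audio"),
                           needs_mouse_click=True, browser_type=BrowserType.IE)
info = ResourceRequestInfo(desc.task_id, "http://cdn.example.com/3.mp3",
                           {"Referer": desc.page_url})
print(ResourceRequestInfo.from_json(info.to_json()) == info)  # True
```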
In the above method, the step in which the parsing module schedules the human-browsing simulation module to simulate a person browsing the web page while loading the packet-interception module to obtain the resource request information specifically comprises:
Initializing the human-browsing simulation module;
Setting up a User Datagram Protocol (UDP) service to communicate with the packet-interception module;
The parsing module determines the simulation mode of the human-browsing simulation module and the loading mode of the packet-interception module according to the browser type in the description information of the resource;
The human-browsing simulation module simulates a person browsing the playback page URL of the audio, the playback page URL of the video, and the browsing page URL of the lyrics;
Meanwhile, the parsing module loads the packet-interception module to obtain the resource request information.
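The source says only that a UDP service links the parsing module and the packet-interception module. A minimal sketch of such a channel on the loopback interface, assuming one JSON-encoded datagram per captured request (the message format and function names are hypothetical), could look like this. UDP gives no delivery or ordering guarantee and caps a datagram at roughly 64 KB, so a production channel would likely need acknowledgements or retries.

```python
import json
import socket
from typing import Any, Dict, Tuple


def open_listener(host: str = "127.0.0.1") -> socket.socket:
    """Parsing-module side: bind a UDP socket on an OS-chosen free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))
    sock.settimeout(2.0)  # do not block forever if the interceptor never reports
    return sock


def send_request_info(addr: Tuple[str, int], task_id: int, url: str,
                      headers: Dict[str, str]) -> None:
    """Interceptor side: report one captured request as a single datagram."""
    payload = json.dumps({"task_id": task_id, "url": url, "headers": headers}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, addr)


def receive_request_info(sock: socket.socket, bufsize: int = 65535) -> Dict[str, Any]:
    """Parsing-module side: wait for one datagram and decode it."""
    data, _sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))


listener = open_listener()
send_request_info(listener.getsockname(), 7, "http://cdn.example.com/7.mp3",
                  {"Referer": "http://example.com/play/7"})
msg = receive_request_info(listener)
listener.close()
print(msg["url"])  # http://cdn.example.com/7.mp3
```

UDP suits this role because both modules run on the same machine and each message is small and self-contained, so no connection setup is needed.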
In the above method, the step in which the parsing module determines the simulation mode of the human-browsing simulation module and the loading mode of the packet-interception module according to the browser type in the description information of the resource specifically comprises:
If the browser type in the description information of the resource is IE, the parsing module sets the simulation mode of the human-browsing simulation module to IE mode, that is, it simulates a person browsing with the IE browser, and uses the Microsoft Detours component to inject the packet-interception module into the human-browsing simulation module;
If the browser type in the description information of the resource is Chrome, the parsing module sets the simulation mode of the human-browsing simulation module to Chrome mode.
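The branch on browser type is a small dispatch step. The sketch below, with hypothetical names, returns the simulation mode together with how the packet-interception module would be attached. The source names Microsoft Detours for the IE case but does not say how interception is loaded in Chrome mode, so that case is marked as unspecified rather than guessed. Accepting the spelling "chrom" as an alias is an assumption based on the source's wording.

```python
from typing import NamedTuple


class SimulationPlan(NamedTuple):
    simulation_mode: str      # which browser the simulation module drives
    interceptor_loading: str  # how the packet-interception module is attached


def plan_for(browser_type: str) -> SimulationPlan:
    bt = browser_type.strip().lower()
    if bt == "ie":
        # IE mode: drive Internet Explorer and inject the interceptor via Detours.
        return SimulationPlan("ie", "detours_injection")
    if bt in ("chrome", "chrom"):
        # Chrome mode: the loading mechanism is not specified in the source.
        return SimulationPlan("chrome", "unspecified")
    raise ValueError(f"unsupported browser type: {browser_type!r}")


print(plan_for("IE"))  # SimulationPlan(simulation_mode='ie', interceptor_loading='detours_injection')
```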
In the above method, the step in which the packet-interception module obtains the resource request information specifically comprises:
Intercepting each HTTP request and recording its HTTP request header, URL, and socketID; if the URL contains .wma or .MP3, taking this HTTP request header and URL as the default resource request information;
Intercepting each HTTP response header; if the value of its Content-Type field contains an audio or video identifier, taking the HTTP request header and URL recorded for this socketID as the required resource request information; otherwise, if default resource request information exists, taking the default as the required resource request information; otherwise, the packet-interception module fails to obtain the resource request information;
The packet-interception module sends the obtained resource request information to the parsing module.
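The selection rule above can be expressed as a pure function over a stream of intercepted events, which makes it testable apart from the hooking machinery. This is a sketch under assumptions: the event classes are hypothetical, the .wma/.mp3 check is case-insensitive, the first matching URL becomes the default, and "audio or video identifier" is read as a Content-Type beginning with `audio/` or `video/`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple, Union

MEDIA_EXTENSIONS = (".wma", ".mp3")  # matched case-insensitively (assumption)
Captured = Tuple[str, Dict[str, str]]  # (URL, HTTP request headers)


@dataclass
class RequestEvent:
    socket_id: int
    url: str
    headers: Dict[str, str]


@dataclass
class ResponseEvent:
    socket_id: int
    headers: Dict[str, str]


def _content_type(headers: Dict[str, str]) -> str:
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.strip().lower()
    return ""


def select_request(events: Iterable[Union[RequestEvent, ResponseEvent]]) -> Optional[Captured]:
    by_socket: Dict[int, Captured] = {}  # latest request seen on each socket
    default: Optional[Captured] = None
    for ev in events:
        if isinstance(ev, RequestEvent):
            by_socket[ev.socket_id] = (ev.url, ev.headers)
            url = ev.url.lower()
            if default is None and any(ext in url for ext in MEDIA_EXTENSIONS):
                default = (ev.url, ev.headers)
        else:
            ctype = _content_type(ev.headers)
            if ctype.startswith(("audio/", "video/")) and ev.socket_id in by_socket:
                return by_socket[ev.socket_id]  # media response: take its request
    return default  # None means interception failed


events = [
    RequestEvent(1, "http://example.com/play/42", {"User-Agent": "UA"}),
    ResponseEvent(1, {"Content-Type": "text/html; charset=utf-8"}),
    RequestEvent(2, "http://cdn.example.com/get?id=42", {"Referer": "http://example.com/play/42"}),
    ResponseEvent(2, {"Content-Type": "audio/mpeg"}),
]
print(select_request(events))
# ('http://cdn.example.com/get?id=42', {'Referer': 'http://example.com/play/42'})
```

The example shows why the Content-Type check matters: the real media URL carries no .mp3 extension, so only the response header reveals it.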
The present invention also provides a system for downloading Internet resources, comprising:
A download module, configured to request a batch of resource download tasks and the description information of those resources from a database, and to download each resource according to the resource request information provided by the parsing module;
A parsing module, configured to request a resource download task and the description information of the resource from the download module, to schedule the human-browsing simulation module to simulate a person browsing the web page while loading the packet-interception module to obtain the resource request information, and to send the resource request information obtained by the packet-interception module to the download module;
A human-browsing simulation module, configured to simulate a person browsing the web page;
A packet-interception module, configured to obtain the resource request information and send it to the parsing module.
Further, the above download system also comprises:
A database, configured to store the resources downloaded by the download module.
With the technical solution of the present invention, audio and video files on the Internet are processed automatically in batches by a distributed system, which greatly improves the download efficiency of audio and video files. Compared with existing download techniques for Internet resources, the solution greatly increases the proportion of audio and video files whose real download addresses are obtained directly from the Internet and successfully downloaded.
Description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and constitute a part of it; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the first embodiment of the invention;
Fig. 2 is a system structure diagram of the second embodiment of the invention.
Embodiment
To make the technical problem solved by the invention, its technical solution, and its beneficial effects clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, which is the flowchart of the first embodiment of the invention, a method for downloading Internet resources is provided, which specifically comprises:
Step S101: the parsing module requests a resource download task and the description information of the resource from the download module.
Specifically, the resources include video, audio, and lyrics; the description information of a resource includes the playback page URL of the audio, the playback page URL of the video, the browsing page URL of the lyrics, the resource type, whether a mouse click is needed, and the browser type; the resource type includes audio, video, and lyrics; the browser type includes the IE browser and the Chrome browser.
All resources in a search engine come from the Internet, so audio and video files must be collected from the massive amount of information on the Internet.
Specifically, before the parsing module requests the resource download task and the description information of the resource from the download module, the download module requests a batch of resource download tasks and the description information of those resources from the database.
Step S102: the parsing module schedules the human-browsing simulation module to simulate a person browsing the web page, while loading the packet-interception module to obtain the resource request information, and sends the resource request information obtained by the packet-interception module to the download module. The resource request information includes the HTTP request header, the URL, and the task ID of the resource.
In one embodiment, this step specifically comprises:
A. Initializing the human-browsing simulation module;
B. Setting up a User Datagram Protocol (UDP) service to communicate with the packet-interception module;
C. The parsing module determines the simulation mode of the human-browsing simulation module and the loading mode of the packet-interception module according to the browser type in the description information of the resource:
If the browser type in the description information of the resource is IE, the parsing module sets the simulation mode of the human-browsing simulation module to IE mode, that is, it simulates a person browsing with the IE browser, and uses the Microsoft Detours component to inject the packet-interception module into the human-browsing simulation module;
If the browser type in the description information of the resource is Chrome, the parsing module sets the simulation mode of the human-browsing simulation module to Chrome mode.
D. The human-browsing simulation module simulates a person browsing the playback page URL of the audio, the playback page URL of the video, and the browsing page URL of the lyrics;
E. Meanwhile, the parsing module loads the packet-interception module to obtain the resource request information, as follows:
1) Intercept each HTTP request and record its HTTP request header, URL, and socketID; if the URL contains .wma or .MP3, take this HTTP request header and URL as the default resource request information;
2) Intercept each HTTP response header; if the value of its Content-Type field contains an audio or video identifier, take the HTTP request header and URL recorded for this socketID as the required resource request information; otherwise, if default resource request information exists, take the default as the required resource request information; otherwise, the packet-interception module fails to obtain the resource request information;
3) The packet-interception module sends the obtained resource request information to the parsing module.
Step S103: the download module downloads the resource according to the resource request information provided by the parsing module.
Specifically, after the download module downloads the resource according to the resource request information provided by the parsing module, it stores the downloaded resource in the database.
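Step S103 amounts to replaying the captured request so that the server sees the same cookies, Referer, and other headers a real browser sent. Below is a minimal sketch using Python's standard `urllib.request`, assuming the captured headers arrive as a plain dictionary. The set of headers dropped before replay is an illustrative choice: for example, `urllib` does not transparently decode gzip, so forwarding a captured `Accept-Encoding` could leave the saved file compressed.

```python
import shutil
import urllib.request
from typing import Dict

# Illustrative choice: headers that should not be replayed verbatim.
SKIP_HEADERS = {"connection", "content-length", "accept-encoding"}


def build_download_request(url: str, captured_headers: Dict[str, str]) -> urllib.request.Request:
    """Rebuild the captured HTTP request (URL + headers) as a plain GET."""
    headers = {name: value for name, value in captured_headers.items()
               if name.lower() not in SKIP_HEADERS}
    return urllib.request.Request(url, headers=headers, method="GET")


def download_to_file(url: str, captured_headers: Dict[str, str], dest_path: str,
                     timeout: float = 30.0) -> None:
    """Stream the resource to disk using the captured request information."""
    req = build_download_request(url, captured_headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


req = build_download_request(
    "http://cdn.example.com/42.mp3",
    {"Cookie": "sid=abc", "Referer": "http://example.com/play/42", "Accept-Encoding": "gzip"},
)
print(req.get_header("Cookie"), req.get_header("Accept-encoding"))  # sid=abc None
```

Note that `urllib.request.Request` stores header names with `str.capitalize()`, which is why the lookup above uses `Accept-encoding`.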
As shown in Fig. 2, the present invention also provides a system for downloading Internet resources, comprising:
Parsing module 202, configured to request a resource download task and the description information of the resource from the download module, to schedule the human-browsing simulation module to simulate a person browsing the web page while loading the packet-interception module to obtain the resource request information, and to send the resource request information obtained by the packet-interception module to the download module;
Human-browsing simulation module 203, configured to simulate a person browsing the web page;
Packet-interception module 204, configured to obtain the resource request information and send it to the parsing module.
Further, the above download system also comprises a database, configured to store the resources downloaded by the download module.
With the above technical solution, audio and video files on the Internet are processed automatically in batches by a distributed system, which greatly improves the download efficiency of audio and video files. Compared with existing download techniques for Internet resources, the solution greatly increases the proportion of audio and video files whose real download addresses are obtained directly from the network and successfully downloaded.
The above description shows and describes a preferred embodiment of the invention. As stated, however, it should be understood that the invention is not limited to the form disclosed herein; this should not be regarded as excluding other embodiments, and the invention can be used in various other combinations, modifications, and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the protection scope of the appended claims.
Claims (9)
1. A method for downloading Internet resources, characterized by comprising:
A parsing module requests a resource download task and the description information of the resource from a download module;
The parsing module schedules a human-browsing simulation module to simulate a person browsing the web page, while loading a packet-interception module to obtain resource request information, and sends the resource request information obtained by the packet-interception module to the download module;
The download module downloads the resource according to the resource request information provided by the parsing module.
2. The download method according to claim 1, characterized in that, before the parsing module requests the resource download task and the description information of the resource from the download module, the method further comprises:
The download module requests a batch of resource download tasks and the description information of those resources from a database.
3. The download method according to claim 1, characterized in that, after the download module downloads the resource according to the resource request information provided by the parsing module, the method further comprises:
The download module stores the downloaded resource in the database.
4. The download method according to any one of claims 1 to 3, characterized in that the resources include video, audio, and lyrics; the description information of a resource includes the playback page URL of the audio, the playback page URL of the video, the browsing page URL of the lyrics, the resource type, whether a mouse click is needed, and the browser type; the resource type includes audio, video, and lyrics; the browser type includes the IE browser and the Chrome browser; and the resource request information includes the HTTP request header, the URL, and the task ID of the resource.
5. The download method according to any one of claims 1 to 3, characterized in that the step in which the parsing module schedules the human-browsing simulation module to simulate a person browsing the web page while loading the packet-interception module to obtain the resource request information specifically comprises:
Initializing the human-browsing simulation module;
Setting up a User Datagram Protocol (UDP) service to communicate with the packet-interception module;
The parsing module determines the simulation mode of the human-browsing simulation module and the loading mode of the packet-interception module according to the browser type in the description information of the resource;
The human-browsing simulation module simulates a person browsing the playback page URL of the audio, the playback page URL of the video, and the browsing page URL of the lyrics, while the parsing module loads the packet-interception module to obtain the resource request information.
6. The download method according to claim 5, characterized in that the step in which the parsing module determines the simulation mode of the human-browsing simulation module and the loading mode of the packet-interception module according to the browser type in the description information of the resource specifically comprises:
If the browser type in the description information of the resource is IE, the parsing module sets the simulation mode of the human-browsing simulation module to IE mode, that is, it simulates a person browsing with the IE browser, and uses the Microsoft Detours component to inject the packet-interception module into the human-browsing simulation module;
If the browser type in the description information of the resource is Chrome, the parsing module sets the simulation mode of the human-browsing simulation module to Chrome mode.
7. The download method according to claim 5, characterized in that the step in which the packet-interception module obtains the resource request information specifically comprises:
Intercepting each HTTP request and recording its HTTP request header, URL, and socketID; if the URL contains .wma or .MP3, taking this HTTP request header and URL as the default resource request information;
Intercepting each HTTP response header; if the value of its Content-Type field contains an audio or video identifier, taking the HTTP request header and URL recorded for this socketID as the required resource request information; otherwise, if default resource request information exists, taking the default as the required resource request information; otherwise, the packet-interception module fails to obtain the resource request information.
8. A system for downloading Internet resources, characterized by comprising:
A download module, configured to request a batch of resource download tasks and the description information of those resources from a database, and to download each resource according to the resource request information provided by the parsing module;
A parsing module, configured to request a resource download task and the description information of the resource from the download module, to schedule the human-browsing simulation module to simulate a person browsing the web page while loading the packet-interception module to obtain the resource request information, and to send the resource request information obtained by the packet-interception module to the download module;
A human-browsing simulation module, configured to simulate a person browsing the web page;
A packet-interception module, configured to obtain the resource request information and send it to the parsing module.
9. The download system according to claim 8, characterized by further comprising:
A database, configured to store the resources downloaded by the download module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210353411.XA CN102857575B (en) | 2012-09-21 | 2012-09-21 | Method and system for downloading Internet resources |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210353411.XA CN102857575B (en) | 2012-09-21 | 2012-09-21 | Method and system for downloading Internet resources |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102857575A true CN102857575A (en) | 2013-01-02 |
CN102857575B CN102857575B (en) | 2016-12-21 |
Family
ID=47403763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210353411.XA Active CN102857575B (en) | Method and system for downloading Internet resources | 2012-09-21 | 2012-09-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102857575B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360349A (en) * | 2011-07-21 | 2012-02-22 | 深圳市万兴软件有限公司 | Method and device for acquiring audio/video link address in webpage |
CN102510536A (en) * | 2011-12-21 | 2012-06-20 | 中国传媒大学 | Method for downloading videos and audios of internet |
- 2012-09-21: CN201210353411.XA, patent CN102857575B, status Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105141638A (en) * | 2015-09-29 | 2015-12-09 | 北京奇艺世纪科技有限公司 | Method and device for downloading video resources |
CN105141638B (en) * | 2015-09-29 | 2018-08-03 | 北京奇艺世纪科技有限公司 | Method and device for downloading video resources |
CN107888940A (en) * | 2016-09-30 | 2018-04-06 | 法乐第(北京)网络科技有限公司 | Video and its related resource method for down loading and system |
CN110392022A (en) * | 2018-04-19 | 2019-10-29 | 阿里巴巴集团控股有限公司 | Network resource access method, computer device, and storage medium |
CN110392022B (en) * | 2018-04-19 | 2022-04-05 | 阿里巴巴集团控股有限公司 | Network resource access method, computer equipment and storage medium |
CN111404898A (en) * | 2020-03-06 | 2020-07-10 | 北京创世云科技有限公司 | Anti-stealing-link method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN102857575B (en) | 2016-12-21 |
Similar Documents
Publication | Title |
---|---|
CN104301436B (en) | Method for pushing, subscribing to, and updating content to be displayed, and corresponding device |
CN101247402B (en) | Multimedia files downloading and broadcasting system and method |
CN102333122B (en) | Downloaded resource provision method, device and system |
CN103384275B (en) | Cross-terminal downloading method, system, cloud server and terminal |
US8180376B1 (en) | Mobile analytics tracking and reporting |
CN105045887B (en) | System and method for mixed-mode cross-domain data interaction |
CN103810176B (en) | Web page information prefetching and access method and device |
CN103338249B (en) | Caching method and device |
CN102393857A (en) | Method and system for local call based on web page |
CN103685590B (en) | Method and system for obtaining an IP address |
CN104572843B (en) | Page loading method and device |
US20150304412A1 (en) | Browser and system for download and download method |
US20130117351A1 (en) | Efficient transfer of web content to different user platforms |
CN108319662A (en) | Page processing method and device, electronic device, and readable storage medium |
CN106790593B (en) | Page processing method and device |
CN102857575A (en) | Download method and system for Internet resources |
CN103077191B (en) | Adaptive Web platform audio playing method and device |
CN105635201A (en) | Application starting method and application starting system based on pushed information |
WO2006038987A3 (en) | A method and apparatus for assigning access control levels in providing access to networked content files |
CN107370628B (en) | Log processing method and system based on embedded points |
CN103593233A (en) | Method and system for pushing software information |
CN102185917A (en) | Method and system for adaptation between server and mobile terminal, and server adaptation device |
CN104394512A (en) | Message push system |
CN105100291A (en) | Resource address generating method, device and system |
CN103607454A (en) | Method for setting private proxy server for Android system browser |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 518057 C Building 5, Nanshan District software industry base, Shenzhen, Guangdong 403-409, China. Patentee after: Shenzhen easou world Polytron Technologies Inc. Address before: 518026 Guangdong city of Shenzhen province Futian District Binhe Road and CaiTian Road Interchange Union Square Tower A, A5501-A. Patentee before: Shenzhen Yisou Science & Technology Development Co., Ltd. |
CP03 | Change of name, title or address |