CN105550179A - Webpage collection method and browser plug-in - Google Patents

Publication number: CN105550179A
Application number: CN201410594451.2A
Authority: CN (China)
Legal status: Granted (as listed by Google Patents; an assumption, not a legal conclusion)
Other versions: CN105550179B
Other languages: Chinese (zh)
Inventor: 梁宇
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Prior art keywords: communication tool, webpage, immediate communication, collection, label
Landscapes: Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention disclose a webpage collection method and a browser plug-in that can send the information of a web page to be collected accurately to a specified instance when the user is allowed to run multiple instances. The method comprises the following steps: receiving a collection instruction performed by the user on a web page; extracting the web page information of the web page according to the collection instruction; receiving login account information input by the user; establishing a communication connection with the instant messaging tool corresponding to the login account information; and sending the web page information to the instant messaging tool.

Description

A webpage collection method and browser plug-in
Technical Field
The present invention relates to the field of computer technology, and in particular to a webpage collection method and a browser plug-in.
Background
A browser displays HTML file content from a web server or a file system so that the user can interact with those files. The favorites folder is a basic browser feature: it saves the links of the web pages the user wants to collect on the local computer terminal, so that the user can open a page directly by clicking its title in the favorites folder and view content of interest at any time.
At present, to address the loss of web page links stored on the local computer terminal, network favorites have been proposed, which save web page links to a network database; examples include Evernote and Youdao Cloud Notes. However, these network collection tools all use a single-instance mode: only one instance is allowed to run, and collected data is sent to that single instance by default.
The single-instance collection solution above is relatively simple. On this basis, the inventor proposes a collection solution that allows the user to run multiple instances and sends the collected data accurately to a specified instance.
Summary of the Invention
Embodiments of the present invention provide a webpage collection method that can send the web page information to be collected accurately to a specified instance when the user is allowed to run multiple instances.
A first aspect of the embodiments of the present invention provides a webpage collection method, comprising:
receiving a collection instruction performed by a user on a web page;
extracting web page information of the web page according to the collection instruction;
receiving login account information input by the user;
establishing a communication connection with the instant messaging tool corresponding to the login account information;
sending the web page information to the instant messaging tool.
A second aspect of the embodiments of the present invention provides a browser plug-in, comprising:
a first receiving unit, configured to receive a collection instruction performed by a user on a web page;
an extraction unit, configured to extract web page information of the web page according to the collection instruction;
a second receiving unit, configured to receive login account information input by the user;
an establishing unit, configured to establish a communication connection with the instant messaging tool corresponding to the login account information;
a sending unit, configured to send the web page information to the instant messaging tool.
In the technical solution provided by the embodiments of the present invention, after receiving the collection instruction performed by the user on a web page, the browser plug-in extracts the web page information of the web page according to the collection instruction. After receiving the login account information input by the user, it establishes a communication connection with the instant messaging tool corresponding to the login account information and sends the web page information to that tool over the connection. In these embodiments the instant messaging tool allows the user to run multiple instances; because the browser plug-in connects to the instant messaging tool that corresponds to the entered login account information, the web page information to be collected reaches the instant messaging tool specified by the user. Compared with the prior art, the embodiments can therefore send the web page information to be collected accurately to a specified instance when the user is allowed to run multiple instances.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an embodiment of the webpage collection method in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of the webpage collection method in the embodiments of the present invention;
Fig. 3A and Fig. 3B are schematic diagrams of an application scenario of the webpage collection method in the embodiments of the present invention;
Fig. 4 is a schematic diagram of an embodiment of the browser plug-in in the embodiments of the present invention;
Fig. 5 is a schematic diagram of another embodiment of the browser plug-in in the embodiments of the present invention;
Fig. 6 is a schematic diagram of another embodiment of the browser plug-in in the embodiments of the present invention.
Detailed Description
Embodiments of the present invention provide a webpage collection method and a browser plug-in that can send the web page information to be collected accurately to a specified instance when the user is allowed to run multiple instances. Each is described in detail below.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An instance here refers to a database program that can support the running of a database, and is a concrete representation of an object.
The webpage collection method provided by the embodiments of the present invention is introduced first. Note that the method can be applied to a variety of browsers, for example browsers based on the IE kernel or on the WebKit kernel; they are not enumerated here.
Referring to Fig. 1, the webpage collection method in an embodiment of the present invention comprises:
101. Receive a collection instruction performed by a user on a web page;
When the user is interested in the web page currently being browsed, the user can perform a collection instruction on it to trigger the browser plug-in to enter the web page collection flow.
102. Extract web page information of the web page according to the collection instruction;
On receiving the collection instruction, the browser plug-in extracts the web page information of the web page according to it. Note that in this embodiment the extraction scope may be part of the web page, such as the main text, or the whole web page; this is not limited here. The user can indicate the collection scope through the collection instruction.
103. Receive login account information input by the user;
In this embodiment, after the browser plug-in is triggered to enter the web page collection flow, it presents an account login interface so that the user can enter the specified collection destination.
104. Establish a communication connection with the instant messaging tool corresponding to the login account information;
In this embodiment, there is a correspondence between login account information and instant messaging tools, so the instant messaging tool corresponding to given login account information can be found from that information. Once the browser plug-in has established the connection, it can send data and instructions to the instant messaging tool according to the corresponding communication protocol.
Note that step 102 may also be performed after steps 103 and 104, or part of step 102 may be performed after steps 103 and 104; this is not limited here. In practice there is no ordering constraint between extracting the web page information and establishing the communication connection.
105. Send the web page information to the instant messaging tool;
After the communication connection has been established, the browser plug-in sends the web page information over it, so that the extracted web page information is delivered to the specified instant messaging tool.
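Steps 101 to 105 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation described in the patent: `ImInstance`, `extract_page_info`, `collect`, the `<main>` marker used to delimit the main text, and the sample account numbers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ImInstance:
    """Hypothetical stand-in for one running instant messaging instance."""
    account: str
    received: list = field(default_factory=list)

    def receive(self, page_info: str) -> None:
        self.received.append(page_info)


def extract_page_info(page_html: str, scope: str) -> str:
    # Step 102: the scope comes from the collection instruction.
    # Assumption: the main text sits between <main> and </main>.
    if scope == "text" and "<main>" in page_html:
        return page_html.split("<main>", 1)[1].split("</main>", 1)[0]
    return page_html


def collect(page_html: str, scope: str, account: str, instances: dict) -> ImInstance:
    page_info = extract_page_info(page_html, scope)   # step 102
    target = instances.get(account)                   # steps 103 and 104
    if target is None:
        raise LookupError(f"no instance for account {account!r}")
    target.receive(page_info)                         # step 105
    return target


# Two instances run at once; only the one matching the account gets the data.
instances = {"10001": ImInstance("10001"), "10002": ImInstance("10002")}
page = "<html><main>Hello</main><aside>ad</aside></html>"
collect(page, "text", "10002", instances)
```

Keying the lookup on the account is what lets the data reach one specific instance instead of a default one.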
Building on the embodiment shown in Fig. 1, the following further describes how the browser plug-in extracts web page information. Referring to Fig. 2, another embodiment of the webpage collection method in the embodiments of the present invention comprises:
201. Receive a collection instruction performed by a user on a web page;
When the user is interested in the web page currently being browsed, the user can perform a collection instruction on it to trigger the browser plug-in to enter the web page collection flow. In this embodiment, the collection instruction may indicate collecting the main text of the web page, the whole web page, or another part of the web page.
202. Extract the text labels of the web page elements;
In this embodiment, the browser plug-in calls a browser interface to traverse the web page elements in the web page and extract their text labels (tags). When the collection instruction indicates collecting the main text, the plug-in extracts the text labels of the web page elements in the main text; when it indicates collecting the whole web page, the plug-in extracts the text labels of all web page elements. In other words, text labels are extracted for the part of the web page within the collection scope indicated by the collection instruction.
203. Determine the weight value of each text label according to its label depth or text count;
To filter out noise in the web page, such as advertisements, the browser plug-in selects among the extracted text labels. In this embodiment, the plug-in uses the label depth or the text count of a text label as the selection basis and determines the label's weight value from it, for example from whether the text count exceeds 20 bytes.
204. Select the text labels whose weight value is greater than a preset value as target text labels;
In this embodiment, after obtaining the weight value of each text label, the browser plug-in sorts the text labels by weight value and selects those whose weight value is greater than the preset value as target text labels.
Note that in this embodiment steps 203 and 204 perform the text label selection flow, which filters out noise in the web page such as advertisements. In practice, the text labels may also be selected on another basis, and neither the specific selection method nor whether the selection flow is performed is limited here.
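Steps 202 to 204 might look like the sketch below, which uses Python's standard `html.parser` in place of a browser interface. The weighting rule is an assumption: the text only says the weight depends on label depth or text count, so this example simply uses the UTF-8 byte length of the text and the 20-byte figure mentioned above. `TextLabelCollector`, `weight`, and `choose_targets` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not be pushed on the depth stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextLabelCollector(HTMLParser):
    """Records tag, depth and text for every element that directly holds text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.labels.append(
                {"tag": self.stack[-1], "depth": len(self.stack), "text": text}
            )


def weight(label):
    # Assumed rule: the weight is the text length in UTF-8 bytes.
    return len(label["text"].encode("utf-8"))


def choose_targets(page_html, threshold=20):
    collector = TextLabelCollector()
    collector.feed(page_html)
    collector.close()
    ranked = sorted(collector.labels, key=weight, reverse=True)  # step 204 sorts
    return [label for label in ranked if weight(label) > threshold]


sample = (
    "<div><p>This paragraph is the main body of the article.</p>"
    "<span>Buy now</span></div>"
)
targets = choose_targets(sample)
```

A depth-based rule could be swapped in by changing `weight` alone, since selection only compares the weight with the preset value.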
205. Record the label position of each target text label;
After the text label selection flow has produced the target text labels, the browser plug-in records the label position of each target text label. The label position indicates where in the web page the web page element corresponding to the text label is located.
206. Obtain the web page elements and style attribute information according to the label positions;
In this embodiment, the browser plug-in can retrieve the corresponding web page elements and style attribute information according to the label positions.
207. Combine the obtained web page elements and style attribute information to generate a picture to be collected that corresponds to the web page.
After obtaining the web page elements and style attribute information, the browser plug-in computes and overlays them to generate the picture to be collected that corresponds to the web page. In this embodiment, the picture to be collected contains essentially all the content of the web page within the collection scope indicated by the collection instruction. It preserves the layout of the original web page intact and, unlike the web page link of the prior art, it contains the content data of the web page: even if the original web page becomes inaccessible, the picture to be collected can still be read and managed normally, and it is not restricted by access time or access location.
Steps 202 to 207 perform the extraction flow for the web page information. Note that in this embodiment the web page information is the picture to be collected, which contains essentially the whole web page content with the original layout preserved. In practice, the browser plug-in may also extract the web page content in other ways; the extraction method is not limited here.
208. Receive login account information input by the user;
In this embodiment, after the browser plug-in is triggered to enter the web page collection flow, it presents an account login interface so that the user can enter the specified collection destination. There is a correspondence between login account information and instant messaging tools, so the instant messaging tool corresponding to the login account information can be found from it. The login account information may be the login account of the instant messaging tool; for example, when the instant messaging tool is QQ, the login account information is a QQ number.
Optionally, before step 208, the browser plug-in determines whether an instant messaging tool is installed on the computer the user is currently using; if so, step 208 is performed, and if not, the flow ends.
Further, the browser plug-in may check the version of the installed instant messaging tool: when the version is lower than a preset version, the flow ends and the user is prompted to upgrade; when it is not lower than the preset version, step 208 is performed.
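The optional installation and version checks before step 208 can be sketched as a small decision function. The dotted version format, the function names `parse_version` and `precheck`, and the returned strings are assumptions made for illustration; the text does not say how versions are represented.

```python
def parse_version(text):
    # Assumption: versions are dotted integers such as "5.1.2".
    return tuple(int(part) for part in text.split("."))


def precheck(installed_version, minimum_version):
    """Decide what happens before step 208.

    installed_version is None when no instant messaging tool is installed.
    """
    if installed_version is None:
        return "stop"                # not installed: the flow ends
    if parse_version(installed_version) < parse_version(minimum_version):
        return "prompt_upgrade"      # too old: end the flow and prompt an upgrade
    return "continue"                # proceed to step 208
```

Comparing integer tuples rather than raw strings avoids ranking "10.0" below "9.5".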
209. Call an ActiveX control to search for an instant messaging tool corresponding to the login account information; if one is found, perform step 211; otherwise, perform step 210;
In this embodiment, to establish a communication connection with the instant messaging tool, the browser plug-in must ensure that the tool is logged in. The plug-in therefore calls the ActiveX control to search for an instant messaging tool corresponding to the login account information, which tells it whether the tool it needs to connect to is logged in.
210. Launch the instant messaging client and, after the user logs in to the instant messaging tool corresponding to the login account information, establish a communication connection with the logged-in tool.
When no instant messaging tool corresponding to the login account information can be found, that tool is not logged in. The browser plug-in therefore launches the instant messaging client and, after the user logs in to the instant messaging tool corresponding to the login account information, establishes a communication connection with the logged-in tool.
211. Establish a communication connection with the instant messaging tool that was found;
When an instant messaging tool corresponding to the login account information is found, that tool is already logged in, so the browser plug-in establishes a communication connection with it directly.
The browser plug-in establishes the communication connection with a logged-in instant messaging tool as follows: the plug-in calls an interface to notify the instant messaging tool to open a port and receives the port number that the tool returns; it then sends data and instructions to that port according to the corresponding communication protocol, which establishes the communication connection between the two.
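The lookup-or-launch logic of steps 209 to 211 and the port handshake can be sketched with loopback TCP sockets. Everything here is an assumption for illustration: the text does not name the protocol, and `ImInstance`, `open_port`, `read_message`, and `connect_to_account` are hypothetical; a plain dictionary stands in for the ActiveX lookup.

```python
import socket


class ImInstance:
    """Hypothetical logged-in instant messaging instance."""

    def __init__(self, account):
        self.account = account
        self.server = None

    def open_port(self):
        # Bind to port 0 so the OS picks a free port, then report the number.
        self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.server.settimeout(5)
        self.server.bind(("127.0.0.1", 0))
        self.server.listen(1)
        return self.server.getsockname()[1]

    def read_message(self):
        conn, _ = self.server.accept()
        chunks = []
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        self.server.close()
        return b"".join(chunks)


def connect_to_account(account, running, launch_client):
    instance = running.get(account)            # step 209: is it logged in?
    if instance is None:
        instance = launch_client(account)      # step 210: launch, user logs in
        running[account] = instance
    port = instance.open_port()                # the tool returns its port number
    return instance, socket.create_connection(("127.0.0.1", port))  # step 211


running = {"10001": ImInstance("10001")}
instance, conn = connect_to_account("10002", running, ImInstance)
conn.sendall(b"page info")
conn.close()
message = instance.read_message()
```

Here `launch_client` is simply the `ImInstance` constructor; in a real plug-in it would start the client and wait for the user to log in.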
212. Send the web page information to the instant messaging tool over the communication connection;
After the communication connection between the browser plug-in and the instant messaging tool corresponding to the login account information has been established, the browser plug-in sends the web page information to that tool over the connection, so that the web page information is collected into the instant messaging tool specified by the user.
Optionally, to reduce the size of the picture to be collected and improve transmission efficiency, the method may further include, before step 212: the browser plug-in cleans or compresses the style attribute information in the picture to be collected. Cleaning may remove useless or default attributes, or merge attributes. For example, when the collected content is the Tencent homepage, the picture may be about 5 MB; after cleaning, it can shrink to 700 to 800 KB.
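The cleaning described above, removing useless or default attributes and merging attributes, can be sketched on a dictionary of CSS properties. The `DEFAULTS` table is a small assumed subset of CSS initial values, and `clean_style` is a hypothetical name; the text does not specify which attributes the plug-in treats as removable.

```python
# Assumed subset of CSS initial values; a real table would be much larger.
DEFAULTS = {
    "position": "static",
    "float": "none",
    "opacity": "1",
    "visibility": "visible",
}

MARGIN_SIDES = ["margin-top", "margin-right", "margin-bottom", "margin-left"]


def clean_style(style):
    # Remove empty values and values equal to the assumed default.
    cleaned = {
        name: value
        for name, value in style.items()
        if value != "" and DEFAULTS.get(name) != value
    }
    # Merge the four margin longhands into one shorthand when all are present.
    if all(side in cleaned for side in MARGIN_SIDES):
        values = [cleaned.pop(side) for side in MARGIN_SIDES]
        cleaned["margin"] = " ".join(values)
    return cleaned


raw_style = {
    "color": "#333",
    "position": "static",
    "opacity": "1",
    "float": "",
    "margin-top": "0px",
    "margin-right": "8px",
    "margin-bottom": "0px",
    "margin-left": "8px",
}
compact_style = clean_style(raw_style)
```

In this example eight properties shrink to two, which is the kind of reduction that makes the transfer in step 212 cheaper.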
In this embodiment, after the instant messaging tool corresponding to the login account information receives the picture to be collected, it can execute its own active-collection logic to load the picture into its active-collection editor, specifically:
213. The instant messaging tool saves the picture to be collected to a local collection directory;
That is, after receiving the picture to be collected that the browser plug-in sent over the communication connection, the instant messaging tool saves it to the local collection directory. In this embodiment, the instant messaging tool may use a different flow for each type of browser. For example, when the browser is based on the IE kernel, the instant messaging tool may first unpack the compressed picture to be collected and, through an internal interface, call the Windows application programming interface (Windows API) to copy the picture from the IE cache to the local collection directory; when the browser is based on the WebKit kernel, after the browser runs JavaScript to download and retrieve the picture to be collected, the instant messaging tool saves the received picture in the local collection directory.
214. The instant messaging tool changes the URL of the picture to be collected to a local path;
After saving the picture to be collected to the local collection directory, the instant messaging tool changes the picture's URL to the local path.
215. The instant messaging tool generates an HTML file from the picture to be collected according to the local path;
This yields an HTML file based on the local path.
216. The instant messaging tool loads the HTML file into its local active-collection editor.
After obtaining the HTML file based on the local path, the instant messaging tool can load it into its active-collection editor, which completes the active-collection logic for the picture to be collected.
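Steps 213 to 215 can be sketched as follows: save the received picture, turn its location into a local `file://` URL, and write an HTML file that refers to it. The file names, the `store_snapshot` function, and the minimal HTML wrapper are assumptions for illustration; loading into the editor (step 216) is left out because the editor's interface is not described.

```python
import html
import tempfile
from pathlib import Path


def store_snapshot(image_bytes, collection_dir, name="snapshot.png"):
    collection_dir = Path(collection_dir)
    collection_dir.mkdir(parents=True, exist_ok=True)

    image_path = collection_dir / name
    image_path.write_bytes(image_bytes)             # step 213: save locally

    local_url = image_path.resolve().as_uri()       # step 214: URL -> local path

    html_path = collection_dir / "snapshot.html"    # step 215: generate HTML
    html_path.write_text(
        "<!DOCTYPE html>\n"
        f'<html><body><img src="{html.escape(local_url, quote=True)}"></body></html>\n',
        encoding="utf-8",
    )
    return html_path


with tempfile.TemporaryDirectory() as tmp:
    page_path = store_snapshot(b"\x89PNG placeholder bytes", tmp)
    page_content = page_path.read_text(encoding="utf-8")
    image_saved = (Path(tmp) / "snapshot.png").exists()
```

Because the HTML refers only to the local copy, it stays readable when the original page is no longer reachable.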
As in the embodiment of Fig. 1, because the browser plug-in establishes the communication connection with the instant messaging tool corresponding to the login account information input by the user, the web page information to be collected reaches the specified instance even when the user runs multiple instances.
Further, in the technical solution provided by this embodiment, the extracted web page information contains essentially the whole web page content with the original layout preserved. Compared with the prior art, it preserves the layout of the original web page intact and also contains the content data of the web page: even if the original web page becomes inaccessible, the web page information can still be read and managed normally, and it is not restricted by access time or access location. In addition, after receiving the picture to be collected, the instant messaging tool executes its active-collection logic and loads the picture into its active-collection editor.
For ease of understanding, the webpage collection method described in the embodiments above is illustrated below with a specific application scenario:
In this scenario, the web page to be collected is the Tencent homepage (as shown in Fig. 3A), and the instant messaging tool is QQ;
When the user performs a collection instruction on the Tencent homepage, the browser plug-in responds to the instruction and enters the web page collection flow.
The web page elements of the Tencent homepage include text, pictures, video, and so on. After the text labels of these web page elements are extracted, the weight values of the text labels can be determined according to a preset rule in order to select the target text labels; for example, the JD.com advertisement in the lower right corner of the page can be filtered out.
After the web page elements and style attribute information are obtained according to the label positions of the target text labels, they are combined to generate the picture to be collected that corresponds to the web page.
The browser plug-in presents an account login interface to the user. After the user enters a QQ number in this interface, if that QQ account is logged in, the browser plug-in establishes a communication connection with it directly; if it is not logged in, the browser plug-in launches the QQ client and, after the user logs in to that QQ account, establishes a communication connection with the logged-in QQ.
After the browser plug-in and QQ have established the communication connection, the browser plug-in sends the web page information to QQ over it. QQ saves the picture to be collected to the local collection directory, executes its own active-collection logic, and loads the picture into its active-collection editor, producing the collection page shown in Fig. 3B.
The webpage collection method in the embodiments of the present invention has been described above; the browser plug-in is described below. Referring to Fig. 4, an embodiment of the browser plug-in in the embodiments of the present invention comprises:
a first receiving unit 401, configured to receive a collection instruction performed by a user on a web page;
an extraction unit 402, configured to extract web page information of the web page according to the collection instruction;
a second receiving unit 403, configured to receive login account information input by the user;
an establishing unit 404, configured to establish a communication connection with the instant messaging tool corresponding to the login account information;
a sending unit 405, configured to send the web page information to the instant messaging tool.
For ease of understanding, the internal workflow of the browser plug-in in this embodiment is described below using a specific application scenario:
The first receiving unit 401 receives the collection instruction performed by the user on a web page; the extraction unit 402 extracts the web page information of the web page according to the collection instruction; the second receiving unit 403 receives the login account information input by the user; the establishing unit 404 establishes a communication connection with the instant messaging tool corresponding to the login account information; and the sending unit 405 sends the web page information to the instant messaging tool.
On basis embodiment illustrated in fig. 4, the embodiment of the present invention further describes the concrete outcome of the browser plug-in that can extract the whole web page contents preserving former webpage layout form, specifically refers to Fig. 5: in the embodiment of the present invention, another embodiment of browser plug-in comprises:
First receiving element 501, for receiving the collection instruction that user performs webpage;
Extraction unit 502, for the info web of webpage according to described collection instruction fetch;
Second receiving element 503, for receiving the login account information of described user input;
Set up unit 504, establish a communications link for the immediate communication tool corresponding with described login account information;
Transmitting element 505, for sending described info web to described immediate communication tool.
Further, in this enforcement, described extraction unit 502 comprises:
Extraction module 5021, for extracting the text label of described webpage elements;
Logging modle 5022, for recording the label position of described text label;
Acquisition module 5023, for obtaining described webpage elements and Style Attributes information according to described label position;
Generation module 5024, generates corresponding with described webpage waiting collect picture for combining the described web page element that gets and described Style Attributes information.
Optionally, in this embodiment, the extraction unit 502 further comprises:
A determination module 5025, configured to determine, before the tag positions of the text tags are recorded, a weight value for each text tag according to the tag depth or the text count of the text tag;
A selection module 5026, configured to select each text tag whose weight value is greater than a preset value as a target text tag;
The recording module 5022 is specifically configured to record the tag positions of the target text tags.
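The selection performed by modules 5025 and 5026 can be sketched as follows. The embodiment states only that the weight depends on the tag depth or the text count; the two formulas below (text length, and the reciprocal of depth plus one) are assumptions chosen for illustration, and `TextTag`, `weight_by_text`, `weight_by_depth`, and `choose_targets` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextTag:
    name: str
    depth: int  # nesting depth of the tag in the document tree
    text: str   # text directly contained in the tag


def weight_by_text(tag: TextTag) -> float:
    """Assumed formula: more text means a higher weight."""
    return float(len(tag.text))


def weight_by_depth(tag: TextTag) -> float:
    """Assumed formula: shallower tags get a higher weight."""
    return 1.0 / (1 + tag.depth)


def choose_targets(tags: List[TextTag],
                   weight: Callable[[TextTag], float],
                   preset: float) -> List[TextTag]:
    """Keep only the tags whose weight is greater than the preset value."""
    return [tag for tag in tags if weight(tag) > preset]
```

With either formula, short and deeply nested fragments fall below the preset value, so only the positions of the remaining target text tags need to be recorded.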
Optionally, in this embodiment, the extraction unit 502 further comprises:
A cleaning module 5027, configured to clean the style attribute information in the to-be-collected picture before the webpage information is sent to the instant messaging tool over the communication connection.
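The embodiment does not specify what cleaning the style attribute information involves. As one possible reading, the sketch below drops empty or malformed declarations and keeps only an allow-list of properties; the allow-list `KEEP` and the function names `parse_style` and `clean_style` are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Assumed allow-list of properties worth keeping in a collected page.
KEEP = ("color", "font-size", "font-weight", "text-align", "margin", "padding")


def parse_style(style: str) -> Dict[str, str]:
    """Split an inline style string into property/value pairs.

    Splitting on ';' is a simplification: values that themselves contain
    a semicolon (for example some data URLs) are not handled.
    """
    result: Dict[str, str] = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            result[name] = value
    return result


def clean_style(style: str, keep: Iterable[str] = KEEP) -> str:
    """Return the style string with only well-formed, allowed properties."""
    allowed = set(keep)
    parsed = parse_style(style)
    return "; ".join(f"{n}: {v}" for n, v in parsed.items() if n in allowed)
```

For example, `clean_style("COLOR: red;; position:fixed; font-size:14px; margin:")` keeps the color and font size and discards the positioning rule and the empty margin.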
Optionally, in this embodiment, the establishing unit 504 comprises:
A search module, configured to call an ActiveX control to search the running processes for an instant messaging tool corresponding to the login account information;
A first establishing module, configured to establish a communication connection with the instant messaging tool when it is found;
A second establishing module, configured to, when no such instant messaging tool is found, launch the client of the instant messaging tool and, after the user logs in to the instant messaging tool corresponding to the login account information, establish a communication connection with the logged-in instant messaging tool.
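The branching between the first and second establishing modules can be sketched as a lookup with a fallback. The ActiveX call is not modeled: the running instances are represented by a plain dictionary, and `Instance`, `establish_connection`, and the `launch_and_login` callback are hypothetical names standing in for launching the client and waiting for the user to log in.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Instance:
    """Hypothetical handle to one running instant messaging instance."""
    account: str
    pid: int


def establish_connection(account: str,
                         running: Dict[str, Instance],
                         launch_and_login: Callable[[str], Instance]) -> Instance:
    """Return the instance for `account`, launching a client if none is running."""
    instance = running.get(account)  # search module: look for a matching instance
    if instance is None:
        # Second establishing module: start the client and wait for login.
        instance = launch_and_login(account)
        running[account] = instance
    # First establishing module: the returned handle stands for the connection.
    return instance
```

Keying the lookup on the account rather than on the program name is what lets the plug-in distinguish several instances of the same instant messaging tool.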
In the technical solution provided by this embodiment of the present invention, after receiving a collection instruction that the user performs on a webpage, the browser plug-in extracts the webpage information of the webpage according to the collection instruction. After receiving the login account information input by the user, the plug-in establishes a communication connection with the instant messaging tool corresponding to the login account information and sends the webpage information to that instant messaging tool over the connection. In this embodiment, the instant messaging tool allows the user to run multiple instances; because the browser plug-in connects to the instance that corresponds to the login account information input by the user, the to-be-collected webpage information is sent accurately to the instant messaging tool that the user specifies. Therefore, compared with the prior art, this embodiment of the present invention can send the to-be-collected webpage information accurately to the specified instance when the user is allowed to run multiple instances.
Further, in the technical solution provided by this embodiment of the present invention, the webpage information extracted by the browser plug-in essentially comprises the entire webpage content with the layout format of the original webpage preserved. Compared with the prior art, it not only keeps the layout format of the original webpage intact, avoiding loss of formatting, but also contains the content data of the webpage. Even if the original webpage becomes inaccessible, the webpage information can still be read and managed normally, and it is not restricted by access time or access location.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above may be understood with reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The browser plug-in in the embodiments of the present invention is described above from the perspective of modular functional entities, and is described below from the perspective of hardware processing. Referring to Fig. 6, another embodiment of the browser plug-in in the embodiments of the present invention comprises:
An input device 601, an output device 602, a processor 603, and a memory 604 (there may be one or more processors 603; Fig. 6 takes one processor 603 as an example). In some embodiments of the present invention, the input device 601, the output device 602, the processor 603, and the memory 604 may be connected by a bus or in another manner; Fig. 6 takes connection by a bus as an example.
By calling the operation instructions stored in the memory 604, the processor 603 is configured to perform the following steps:
Receiving a collection instruction that the user performs on a webpage;
Extracting webpage information of the webpage according to the collection instruction;
Receiving login account information input by the user;
Establishing a communication connection with the instant messaging tool corresponding to the login account information;
Sending the webpage information to the instant messaging tool.
In some embodiments of the present invention, the processor 603 is specifically configured to perform the following steps:
Extracting the text tags of the webpage elements of the webpage;
Recording the tag positions of the text tags;
Obtaining the webpage elements and style attribute information according to the tag positions;
Generating a to-be-collected picture corresponding to the webpage by combining the obtained webpage elements with the style attribute information.
In some embodiments of the present invention, the processor 603 is further configured to perform the following steps:
Before the tag positions of the text tags are recorded, determining a weight value for each text tag according to the tag depth or the text count of the text tag;
Selecting each text tag whose weight value is greater than a preset value as a target text tag;
The processor 603 is specifically configured to perform the following step:
Recording the tag positions of the target text tags.
In some embodiments of the present invention, the processor 603 is further configured to perform the following step:
Before the webpage information is sent to the instant messaging tool, cleaning the style attribute information in the to-be-collected picture.
In some embodiments of the present invention, the processor 603 is specifically configured to perform the following steps:
Calling an ActiveX control to search the running processes for an instant messaging tool corresponding to the login account information;
If one is found, establishing a communication connection with the instant messaging tool that was found;
If none is found, launching the client of the instant messaging tool and, after the user logs in to the instant messaging tool corresponding to the login account information, establishing a communication connection with the logged-in instant messaging tool.
In the technical solution provided by this embodiment of the present invention, after receiving a collection instruction that the user performs on a webpage, the browser plug-in extracts the webpage information of the webpage according to the collection instruction. After receiving the login account information input by the user, the plug-in establishes a communication connection with the instant messaging tool corresponding to the login account information and sends the webpage information to that instant messaging tool over the connection. In this embodiment, the instant messaging tool allows the user to run multiple instances; because the browser plug-in connects to the instance that corresponds to the login account information input by the user, the to-be-collected webpage information is sent accurately to the instant messaging tool that the user specifies. Therefore, compared with the prior art, this embodiment of the present invention can send the to-be-collected webpage information accurately to the specified instance when the user is allowed to run multiple instances.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A webpage collection method, characterized by comprising:
Receiving a collection instruction that a user performs on a webpage;
Extracting webpage information of the webpage according to the collection instruction;
Receiving login account information input by the user;
Establishing a communication connection with the instant messaging tool corresponding to the login account information;
Sending the webpage information to the instant messaging tool.
2. The webpage collection method according to claim 1, characterized in that extracting the webpage information of the webpage according to the collection instruction comprises:
Extracting the text tags of the webpage elements of the webpage;
Recording the tag positions of the text tags;
Obtaining the webpage elements and style attribute information according to the tag positions;
Generating a to-be-collected picture corresponding to the webpage by combining the obtained webpage elements with the style attribute information.
3. The webpage collection method according to claim 2, characterized in that, before the step of recording the tag positions of the text tags, the method comprises:
Determining a weight value for each text tag according to the tag depth or the text count of the text tag;
Selecting each text tag whose weight value is greater than a preset value as a target text tag;
Recording the tag positions of the text tags specifically comprises:
Recording the tag positions of the target text tags.
4. The webpage collection method according to claim 2, characterized in that, before the step of sending the webpage information to the instant messaging tool, the method further comprises:
Cleaning the style attribute information in the to-be-collected picture.
5. The webpage collection method according to claim 1, characterized in that establishing a communication connection with the instant messaging tool corresponding to the login account information comprises:
Calling an ActiveX control to search the running processes for an instant messaging tool corresponding to the login account information;
If one is found, establishing a communication connection with the instant messaging tool that was found;
If none is found, launching the client of the instant messaging tool and, after the user logs in to the instant messaging tool corresponding to the login account information, establishing a communication connection with the logged-in instant messaging tool.
6. The webpage collection method according to any one of claims 2 to 5, characterized in that, after the webpage information is sent to the instant messaging tool, the method further comprises:
The instant messaging tool saving the to-be-collected picture to a local collection directory;
The instant messaging tool adjusting the uniform resource locator (URL) of the to-be-collected picture to a local path;
The instant messaging tool generating an HTML file from the to-be-collected picture according to the local path;
The instant messaging tool actively loading the HTML file into the editor of the local collection.
7. A browser plug-in, characterized by comprising:
A first receiving unit, configured to receive a collection instruction that a user performs on a webpage;
An extraction unit, configured to extract webpage information of the webpage according to the collection instruction;
A second receiving unit, configured to receive login account information input by the user;
An establishing unit, configured to establish a communication connection with the instant messaging tool corresponding to the login account information;
A sending unit, configured to send the webpage information to the instant messaging tool.
8. The browser plug-in according to claim 7, characterized in that the extraction unit comprises:
An extraction module, configured to extract the text tags of the webpage elements of the webpage;
A recording module, configured to record the tag positions of the text tags;
An obtaining module, configured to obtain the webpage elements and style attribute information according to the tag positions;
A generation module, configured to generate a to-be-collected picture corresponding to the webpage by combining the obtained webpage elements with the style attribute information.
9. The browser plug-in according to claim 8, characterized in that the extraction unit further comprises:
A determination module, configured to determine, before the tag positions of the text tags are recorded, a weight value for each text tag according to the tag depth or the text count of the text tag;
A selection module, configured to select each text tag whose weight value is greater than a preset value as a target text tag;
The recording module is specifically configured to record the tag positions of the target text tags.
10. The browser plug-in according to claim 8, characterized in that the extraction unit further comprises:
A cleaning module, configured to clean the style attribute information in the to-be-collected picture before the webpage information is sent to the instant messaging tool.
11. The browser plug-in according to claim 7, characterized in that the establishing unit comprises:
A search module, configured to call an ActiveX control to search the running processes for an instant messaging tool corresponding to the login account information;
A first establishing module, configured to establish a communication connection with the instant messaging tool when it is found;
A second establishing module, configured to, when no such instant messaging tool is found, launch the client of the instant messaging tool and, after the user logs in to the instant messaging tool corresponding to the login account information, establish a communication connection with the logged-in instant messaging tool.
CN201410594451.2A 2014-10-29 2014-10-29 Webpage collection method and browser plug-in Active CN105550179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410594451.2A CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410594451.2A CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Publications (2)

Publication Number Publication Date
CN105550179A true CN105550179A (en) 2016-05-04
CN105550179B CN105550179B (en) 2020-07-24

Family

ID=55829368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410594451.2A Active CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Country Status (1)

Country Link
CN (1) CN105550179B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1719822A (en) * 2004-07-07 2006-01-11 腾讯科技(深圳)有限公司 Method for adding web page storage of instant communication tool
US20070233715A1 (en) * 2006-03-30 2007-10-04 Sony Corporation Resource management system, method and program for selecting candidate tag
CN102270206A (en) * 2010-06-03 2011-12-07 北京迅捷英翔网络科技有限公司 Method and device for capturing valid web page contents
CN102508897A (en) * 2011-11-03 2012-06-20 匡晓明 General information collection method and system
CN103179164A (en) * 2011-12-23 2013-06-26 宇龙计算机通信科技(深圳)有限公司 Method and communication terminal of storing page information
CN102646135A (en) * 2012-03-31 2012-08-22 奇智软件(北京)有限公司 Webpage collecting method, device and system
CN103617224A (en) * 2012-03-31 2014-03-05 北京奇虎科技有限公司 Webpage collecting method, webpage collecting device and webpage collecting system
CN103001856A (en) * 2012-12-05 2013-03-27 华为软件技术有限公司 Information sharing method and system and instant messaging (IM) client and server
CN103559288A (en) * 2013-11-08 2014-02-05 惠州Tcl移动通信有限公司 Method and mobile terminal for intelligent collecting and sharing
CN103678555A (en) * 2013-12-06 2014-03-26 北京奇虎科技有限公司 Webpage collecting method and browser

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李明: "《大学文科计算机基础实验指导》", 31 July 2005 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10599288B2 (en) 2016-05-09 2020-03-24 Beijing Xiaomi Mobile Software Co., Ltd. Method and device for displaying an application interface
WO2017193526A1 (en) * 2016-05-09 2017-11-16 北京小米移动软件有限公司 Application interface display method and apparatus
CN105975156B (en) * 2016-05-09 2019-06-07 北京小米移动软件有限公司 Application interface display methods and device
CN105975156A (en) * 2016-05-09 2016-09-28 北京小米移动软件有限公司 Application interface display method and device
US11416112B2 (en) 2016-05-09 2022-08-16 Beijing Xiaomi Mobile Software Co., Ltd. Method and device for displaying an application interface
CN107229705A (en) * 2017-05-25 2017-10-03 北京小米移动软件有限公司 Information resources lookup method, device and computer-readable recording medium
CN107229705B (en) * 2017-05-25 2024-05-31 北京小米移动软件有限公司 Information resource searching method, device and computer readable storage medium
CN110020335A (en) * 2017-07-28 2019-07-16 北京搜狗科技发展有限公司 The treating method and apparatus of collection
CN110020335B (en) * 2017-07-28 2022-04-26 北京搜狗科技发展有限公司 Favorite processing method and device
CN111104619A (en) * 2018-10-25 2020-05-05 青岛海信移动通信技术股份有限公司 Method for collecting articles and mobile terminal
CN111104619B (en) * 2018-10-25 2023-09-26 青岛海信移动通信技术有限公司 Method for collecting articles and mobile terminal
CN110837397A (en) * 2019-09-27 2020-02-25 云深互联(北京)科技有限公司 Method, device and equipment for configuring browser plug-in
CN114117269A (en) * 2022-01-26 2022-03-01 荣耀终端有限公司 Memorandum information collection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105550179B (en) 2020-07-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant