CN105550179B - Webpage collection method and browser plug-in

Info

Publication number
CN105550179B
Authority
CN
China
Prior art keywords
webpage
information
text
label
instant
Legal status
Active
Application number
CN201410594451.2A
Other languages
Chinese (zh)
Other versions
CN105550179A (en)
Inventor
梁宇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410594451.2A
Publication of CN105550179A
Application granted
Publication of CN105550179B

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention disclose a webpage collection method and a browser plug-in that can accurately send the webpage information to be collected to a specified instance while allowing the user to run multiple instances. The method includes: receiving a collection instruction executed by a user on a webpage; extracting webpage information of the webpage according to the collection instruction; receiving login account information input by the user; establishing a communication connection with the instant messaging tool corresponding to the login account information; and sending the webpage information to the instant messaging tool.

Description

Webpage collection method and browser plug-in
Technical Field
The invention relates to the field of computer technology, and in particular to a webpage collection method and a browser plug-in.
Background
A browser can display the contents of HTML files from a web server or a file system so that users can interact with those files. Web page favorites are a basic browser application: they store, on the local computer, the links of web pages that the user wants to collect, so the user can open a page directly by clicking its title in the favorites and view content of interest at any time.
To avoid losing web page links stored on the local computer, network favorites that store web page links in a network database have been proposed, such as Evernote and cloud note services. However, these network favorite tools all adopt a single-instance mode: only one instance is allowed to run, and all favorite data is sent to that single instance by default.
Because the single-instance collection solution is rather simplistic, the inventor of the present application provides a collection solution that allows a user to run multiple instances and accurately sends the collected data to a specified instance.
Disclosure of Invention
The embodiments of the invention provide a webpage collection method that can accurately send webpage information to be collected to a specified instance while allowing the user to run multiple instances.
A first aspect of an embodiment of the present invention provides a method for collecting a webpage, including:
receiving a collection instruction executed by a user on a webpage;
extracting webpage information of the webpage according to the collection instruction;
receiving login account information input by the user;
establishing a communication connection with the instant messaging tool corresponding to the login account information;
and sending the webpage information to the instant messaging tool.
A second aspect of an embodiment of the present invention provides a browser plug-in, including:
the first receiving unit is used for receiving a collection instruction executed by a user on a webpage;
the extraction unit is used for extracting the webpage information of the webpage according to the collection instruction;
the second receiving unit is used for receiving login account information input by the user;
the establishing unit is used for establishing a communication connection with the instant messaging tool corresponding to the login account information;
and the sending unit is used for sending the webpage information to the instant messaging tool.
According to the technical solution provided by the embodiments of the invention, after receiving a collection instruction executed by a user on a webpage, the browser plug-in extracts the webpage information of the webpage according to the instruction; after receiving the login account information input by the user, it establishes a communication connection with the instant messaging tool corresponding to that account and sends the webpage information to the instant messaging tool through the connection. Compared with the prior art, the embodiments of the invention can accurately send the webpage information to be collected to the specified instance while allowing the user to run multiple instances.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a web page collection method according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a web page collection method according to the embodiment of the present invention;
FIG. 3A and FIG. 3B are schematic diagrams of application scenarios of the web page collection method according to an embodiment of the present invention;
FIG. 4 is a diagram of an embodiment of a browser plug-in according to an embodiment of the present invention;
FIG. 5 is a diagram of another embodiment of a browser plug-in according to an embodiment of the present invention;
FIG. 6 is a diagram of another embodiment of a browser plug-in according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a webpage collection method and a browser plug-in that can accurately send webpage information to be collected to a specified instance while allowing the user to run multiple instances. They are described in detail below.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the embodiments, an instance refers to a running program capable of supporting database operations; it is the concrete representation of an object.
The web page collection method provided by the embodiments of the present invention is described below. It should be noted that the method may be applied to various browsers, for example, browsers based on the IE kernel or the WebKit kernel, which are not listed exhaustively here.
Referring to fig. 1, the method for collecting a webpage in the embodiment of the present invention includes:
101. Receiving a collection instruction executed by a user on a webpage;
When the user is interested in the currently browsed webpage, the user can execute the collection instruction on the webpage to trigger the browser plug-in to enter the webpage collection process.
102. Extracting webpage information of a webpage according to the collection instruction;
When the collection instruction is received, the browser plug-in extracts the webpage information of the webpage according to it. It should be noted that, in this embodiment, the extraction range may be part of the webpage, such as its text, or the whole webpage; the user may indicate the collection range through the collection instruction, which is not limited here.
103. Receiving login account information input by a user;
In this embodiment, after the browser plug-in is triggered to enter the web page collection process, it provides an account login interface through which the user specifies the destination to which the content is collected.
104. Establishing a communication connection with the instant messaging tool corresponding to the login account information;
In this embodiment, the login account information corresponds to an instant messaging tool, so the instant messaging tool corresponding to the login account information can be queried by that information. After the browser plug-in establishes a communication connection with that instant messaging tool, it can send data and instructions to the tool according to the corresponding communication protocol.
It should be noted that step 102 may also be executed, in whole or in part, after steps 103 and 104; this is not specifically limited here. In practice, there is no ordering constraint between extracting the webpage information and establishing the communication connection.
105. Sending webpage information to an instant messaging tool;
After the communication connection is established, the browser plug-in sends the extracted webpage information to the specified instant messaging tool through the connection.
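The flow of steps 101 to 105 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `InstantMessenger` and `CollectionPlugin` classes are hypothetical, the "communication connection" is reduced to a lookup in an in-memory table that maps each login account to its running instance, and extraction is reduced to choosing between the selected text and the whole page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InstantMessenger:
    """Hypothetical stand-in for one running instance of an instant messaging tool."""
    account: str
    inbox: list[str] = field(default_factory=list)

    def receive(self, info: str) -> None:
        self.inbox.append(info)


class CollectionPlugin:
    """Hypothetical browser plug-in walking through steps 101 to 105."""

    def __init__(self, running_instances: dict[str, InstantMessenger]) -> None:
        # Assumed lookup table: login account -> running instance.
        self.running_instances = running_instances

    def collect(self, page_html: str, selection: str | None, account: str) -> InstantMessenger:
        # Steps 101-102: the instruction carries the range; None means the whole page.
        info = page_html if selection is None else selection
        # Steps 103-104: the account entered by the user selects the target instance.
        target = self.running_instances.get(account)
        if target is None:
            raise LookupError(f"no running instance for account {account!r}")
        # Step 105: only the designated instance receives the webpage information.
        target.receive(info)
        return target
```

Because the destination is resolved from the account the user enters rather than from a default, two instances can run side by side and only the designated one receives the data.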
Building on the embodiment shown in FIG. 1, the following further describes how the browser plug-in extracts webpage information. Referring to FIG. 2, another embodiment of the web page collection method in the embodiments of the present invention includes:
201. Receiving a collection instruction executed by a user on a webpage;
When the user is interested in the currently browsed webpage, the user can execute the collection instruction on the webpage to trigger the browser plug-in to enter the webpage collection process. In this embodiment, the collection instruction may indicate that the text in the webpage, the entire webpage, or another collection range is to be collected.
202. Extracting text labels of webpage elements in a webpage;
In this embodiment, the browser plug-in traverses the webpage elements by calling a browser interface and extracts their text labels. When the collection instruction indicates that the text in the webpage is to be collected, the plug-in extracts the text labels of the webpage elements in that text; when it indicates the whole webpage, the plug-in extracts the text labels of the webpage elements in the whole webpage. In other words, the plug-in extracts the text labels of the webpage elements within the collection range indicated by the collection instruction.
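A minimal sketch of step 202, assuming Python's standard `html.parser` module as a stand-in for the browser interface that the plug-in would actually call (a real plug-in would traverse the live DOM). The `TextTagExtractor` class and the record fields `tag`, `depth`, `path`, and `text` are hypothetical; `path` holds the element-only child indices from the root, which can later serve as a label position.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TEXT = {"script", "style"}  # their contents are code, not page text


class TextTagExtractor(HTMLParser):
    """Records every element that directly contains non-blank text."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # open elements as (tag, child index)
        self.child_counts: list[int] = [0]      # element children seen at each level
        self.text_tags: list[dict] = []

    def handle_starttag(self, tag, attrs):
        index = self.child_counts[-1]
        self.child_counts[-1] += 1
        if tag in VOID_TAGS:
            return  # counted as a sibling, but it never encloses text
        self.stack.append((tag, index))
        self.child_counts.append(0)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(t != tag for t, _ in self.stack):
            return  # ignore stray closing tags
        while self.stack:  # also closes children that were left unclosed
            open_tag, _ = self.stack.pop()
            self.child_counts.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1][0] not in SKIP_TEXT:
            self.text_tags.append({
                "tag": self.stack[-1][0],
                "depth": len(self.stack),
                "path": tuple(i for _, i in self.stack),
                "text": text,
            })


def extract_text_tags(html: str) -> list[dict]:
    parser = TextTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_tags
```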
203. Determining the weight value of each text label according to its label depth or text length;
To filter out noise in the webpage, such as advertisements, the browser plug-in screens the extracted text labels. In this embodiment, the plug-in uses the label depth or the text length of a text label as the selection basis and determines the label's weight value accordingly, for example, according to whether the text length exceeds 20 bytes.
204. Selecting the text label with the weight value larger than the preset value as a target text label;
In this embodiment, after obtaining the weight values of the text labels, the browser plug-in sorts them and selects the text labels whose weight values are greater than the preset value as target text labels.
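Steps 203 and 204 can be illustrated with a hypothetical weighting function. The description only says that the weight depends on label depth or text length (for example, whether the text exceeds 20 bytes); the formula below, the depth bonus, and the preset value of 1.0 are illustrative assumptions. Text length is measured in UTF-8 bytes so that the 20-byte threshold behaves consistently for Chinese text, and the input records use the `text` and `depth` fields assumed in the extraction sketch.

```python
from __future__ import annotations

MIN_TEXT_BYTES = 20   # example threshold taken from the description
PRESET_WEIGHT = 1.0   # hypothetical preset value


def weight(text_tag: dict) -> float:
    """Hypothetical weight: long text dominates, nesting depth adds at most 0.5."""
    text_bytes = len(text_tag["text"].encode("utf-8"))
    length_score = 1.0 if text_bytes > MIN_TEXT_BYTES else 0.0
    depth_score = min(text_tag["depth"], 10) / 20
    return length_score + depth_score


def select_targets(text_tags: list[dict], preset: float = PRESET_WEIGHT) -> list[dict]:
    """Sorts labels by weight and keeps those whose weight exceeds the preset."""
    ranked = sorted(text_tags, key=weight, reverse=True)
    return [t for t in ranked if weight(t) > preset]
```

With these assumptions a short label such as an "Ad" link scores only its depth bonus and falls below the preset, while a paragraph of body text passes.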
It should be noted that, in this embodiment, steps 203 and 204 perform a text label selection process to filter out noise in the web page, such as advertisements. In practice, other selection criteria may be adopted; neither the specific selection method nor whether the selection process is executed at all is limited here.
205. Recording the label position of a target text label;
After the selection process yields the target text labels, the browser plug-in records the label position of each target text label, which indicates where the corresponding webpage element is located in the webpage.
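One possible way to record the label position of step 205, assuming the position is stored as a CSS selector built from the element-only child index of each ancestor. The `css_locator` and `record_positions` functions and the `(tag, child_index)` path format are hypothetical; `:nth-child()` is a standard CSS pseudo-class that counts element siblings starting from 1, so a zero-based index `i` becomes `:nth-child(i + 1)`.

```python
from __future__ import annotations


def css_locator(tag_path: list[tuple[str, int]]) -> str:
    """Builds a CSS selector from (tag, zero-based element index) pairs, root first."""
    if not tag_path:
        raise ValueError("tag_path must contain at least the root element")
    parts = [tag_path[0][0]]  # the root element needs no index
    parts += [f"{tag}:nth-child({index + 1})" for tag, index in tag_path[1:]]
    return " > ".join(parts)


def record_positions(targets: list[dict]) -> dict[str, str]:
    """Maps each target's locator to its text (hypothetical record format)."""
    return {css_locator(t["tag_path"]): t["text"] for t in targets}
```

A browser-side script could then pass such a selector to `document.querySelector` to find the element again in step 206.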
206. Acquiring webpage elements and style attribute information in a webpage according to the label position;
In this embodiment, the browser plug-in may extract the corresponding webpage elements and style attribute information according to the label positions.
207. Generating a picture to be collected corresponding to the webpage by combining the acquired webpage elements and style attribute information.
After acquiring the webpage elements and the style attribute information, the browser plug-in computes and superimposes them to generate the picture to be collected corresponding to the webpage. In this embodiment, the picture to be collected basically includes all the contents of the webpage within the collection range indicated by the collection instruction, so it preserves the layout of the original webpage and avoids losing its formatting. Moreover, unlike the web page links of the prior art, the picture contains the content data of the webpage, so it can still be read and managed normally even if the original webpage becomes inaccessible, without restrictions on access time or place.
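The description does not say how the "picture" is encoded. The sketch below assumes a markup snapshot in which each acquired element carries its style attributes inline, which is consistent with the later step of cleaning style attributes in the picture; an actual implementation might rasterize instead. The `render_snapshot` function and the element record format (`tag`, `text`, `style`) are hypothetical.

```python
from __future__ import annotations

from html import escape


def render_snapshot(elements: list[dict]) -> str:
    """Hypothetical: inlines each element's style attributes into standalone markup."""
    parts = []
    for el in elements:
        style = "; ".join(f"{name}: {value}" for name, value in el["style"].items())
        parts.append(
            f'<{el["tag"]} style="{escape(style)}">{escape(el["text"])}</{el["tag"]}>'
        )
    return '<div class="snapshot">' + "".join(parts) + "</div>"
```

Inlining the styles is what lets the snapshot keep its layout without the original page's stylesheets, at the cost of size, which the optional cleaning step later addresses.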
It should be noted that, in this embodiment, the webpage information is the picture to be collected, which basically includes all the webpage contents in the original layout format. In practice, the browser plug-in may also extract webpage contents in other ways; the specific manner of extracting webpage information is not limited here.
208. Receiving login account information input by a user;
In this embodiment, after the browser plug-in is triggered to enter the web page collection process, it provides an account login interface through which the user specifies the destination to which the content is collected. There is a correspondence between the login account information and the instant messaging tool, so the tool corresponding to the login account information can be queried by that information. The login account information may be a login account of the instant messaging tool; for example, when the instant messaging tool is QQ, the login account information is a QQ number.
In this embodiment, optionally, before step 208, the browser plug-in determines whether the instant messaging tool is installed on the computer currently used by the user; if so, step 208 is executed, and if not, the process ends.
Further, the browser plug-in may check the version of the installed instant messaging tool: when the version is lower than a preset version, the process ends and the user is prompted to upgrade; otherwise, step 208 is executed.
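The optional installation and version checks before step 208 amount to a small decision function. This sketch is hypothetical: how the plug-in detects the installation and reads the version is not specified, so the installed version is passed in (or `None` if the tool is not installed), and versions are assumed to be dot-separated integers compared component by component, with missing components treated as zero.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def is_lower(installed: str, preset: str) -> bool:
    """Compares dot-separated versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(preset)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def pre_login_check(installed: str | None, preset: str) -> str:
    if installed is None:
        return "end"             # not installed: end the process
    if is_lower(installed, preset):
        return "prompt_upgrade"  # too old: end the process and prompt an upgrade
    return "step_208"            # proceed to receive login account information
```

Comparing integer tuples rather than raw strings avoids treating "5.10" as older than "5.9".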
209. Calling an ActiveX control to search the process table for an instant messaging tool corresponding to the login account information; if one is found, step 211 is executed, and otherwise step 210 is executed;
In this embodiment, to establish a communication connection with the instant messaging tool, the browser plug-in must ensure that the tool is in a logged-in state. It therefore invokes an ActiveX control to search the process table for an instant messaging tool corresponding to the login account information, thereby determining whether the tool to be connected is logged in.
210. Pulling up the client of the instant messaging tool and, after the user logs in to the instant messaging tool corresponding to the login account information, establishing a communication connection with the logged-in instant messaging tool.
If no instant messaging tool corresponding to the login account information is found, that tool is not logged in. The browser plug-in therefore pulls up the client of the instant messaging tool and, after the user logs in with the corresponding account, establishes a communication connection with the logged-in tool.
211. Establishing a communication connection with the found instant messaging tool;
If the instant messaging tool corresponding to the login account information is found, that tool is in a logged-in state, so the browser plug-in directly establishes a communication connection with it.
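The branch between steps 210 and 211 can be sketched as below. The ActiveX process-table query cannot be reproduced outside Windows and Internet Explorer, so the process table is modeled as a list of hypothetical `ProcessEntry` records, the process name `QQ.exe` is an assumption, and `launch_client` is a caller-supplied callback standing in for pulling up the client and waiting for the user to log in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessEntry:
    """Hypothetical process-table row: process name plus the account it is logged in with."""
    name: str
    account: Optional[str]


def find_logged_in(table: list[ProcessEntry], process_name: str, account: str) -> Optional[ProcessEntry]:
    # Step 209: look for a running client that is logged in with the given account.
    for entry in table:
        if entry.name == process_name and entry.account == account:
            return entry
    return None


def connect_for_account(
    table: list[ProcessEntry],
    account: str,
    launch_client: Callable[[str], ProcessEntry],
    process_name: str = "QQ.exe",  # assumed process name
) -> tuple[str, ProcessEntry]:
    entry = find_logged_in(table, process_name, account)
    if entry is not None:
        return "step_211", entry  # already logged in: connect directly
    # Step 210: pull up the client; the callback returns once the user has logged in.
    return "step_210", launch_client(account)
```

Matching on both process name and account is what keeps the data from reaching a different instance that happens to be running.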
On this basis, the following describes how the browser plug-in establishes a communication connection with a logged-in instant messaging tool: the browser plug-in notifies the instant messaging tool through a calling interface to open a port, receives the port number returned by the tool, and sends data and instructions to that port according to the corresponding communication protocol, thereby establishing a communication connection between the browser plug-in and the port.
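The port handshake can be simulated on the loopback interface. In this sketch the hypothetical `FakeMessenger` class plays the logged-in instant messaging tool: its `open_port` method (a stand-in for the calling interface) opens a listening socket on an OS-assigned port and returns the port number, and the plug-in side connects to that port and sends an instruction followed by data. The newline-delimited JSON message format is an assumption; the description does not specify the protocol.

```python
from __future__ import annotations

import json
import socket
import threading


class FakeMessenger:
    """Hypothetical stand-in for the logged-in instant messaging tool."""

    def __init__(self) -> None:
        self.received: list[dict] = []
        self._server: socket.socket | None = None
        self._thread: threading.Thread | None = None

    def open_port(self) -> int:
        # The tool opens a local listening port and returns its number.
        self._server = socket.create_server(("127.0.0.1", 0))
        self._thread = threading.Thread(target=self._serve_once, daemon=True)
        self._thread.start()
        return self._server.getsockname()[1]

    def _serve_once(self) -> None:
        conn, _ = self._server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as reader:
            for line in reader:  # one JSON message per line until the plug-in disconnects
                self.received.append(json.loads(line))
        self._server.close()

    def wait(self) -> None:
        self._thread.join(timeout=5)


def send_to_messenger(port: int, messages: list[dict]) -> None:
    # Plug-in side: connect to the returned port and send newline-delimited JSON.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        for msg in messages:
            sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))
```

Having the tool choose the port and report it back avoids hard-coding a port that might already be taken by another instance.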
212. Sending webpage information to an instant messaging tool through communication connection;
After the communication connection between the browser plug-in and the instant messaging tool corresponding to the login account information is established, the browser plug-in sends the webpage information through the connection, thereby collecting it into the instant messaging tool designated by the user.
In this embodiment, optionally, to reduce the size of the picture to be collected and thus improve data transmission efficiency, the method may further include, before step 212: the browser plug-in cleans or compresses the style attribute information in the picture to be collected, where cleaning may remove useless or default attributes or merge attributes. For example, when the picture to be collected is the Tencent home page, its size may be 5 MB, and after cleaning it may be reduced to 700-800 KB.
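The optional cleaning and compression might look like the following sketch. The set of "default attributes" is a hypothetical subset of CSS initial values, merging the four margin properties into the `margin` shorthand stands in for "combining attributes", and `zlib` is only one possible choice of compression; the 5 MB and 700-800 KB figures above are not reproduced by this code.

```python
from __future__ import annotations

import zlib

# Hypothetical subset of CSS initial values treated as removable defaults.
CSS_DEFAULTS = {
    "font-style": "normal",
    "font-weight": "400",
    "text-decoration": "none",
    "opacity": "1",
    "visibility": "visible",
}
MARGIN_SIDES = ["margin-top", "margin-right", "margin-bottom", "margin-left"]


def clean_style(style: dict[str, str]) -> dict[str, str]:
    """Drops empty and default-valued properties and merges margins into shorthand."""
    cleaned = {k: v for k, v in style.items() if v and CSS_DEFAULTS.get(k) != v}
    if all(side in cleaned for side in MARGIN_SIDES):
        # Shorthand order is top, right, bottom, left.
        cleaned["margin"] = " ".join(cleaned.pop(side) for side in MARGIN_SIDES)
    return cleaned


def compress(snapshot: str) -> bytes:
    return zlib.compress(snapshot.encode("utf-8"), level=9)
```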
In this embodiment, after receiving the picture to be collected, the instant messaging tool corresponding to the login account information may execute its active collection logic and load the picture into its active collection editor. This specifically includes:
213. The instant messaging tool saves the picture to be collected to a local collection directory;
After receiving the picture to be collected from the browser plug-in through the communication connection, the instant messaging tool saves it to the local collection directory. In this embodiment, the instant messaging tool may use different processes for different types of browsers. For example, when the browser is based on the IE kernel, the instant messaging tool may first decompress the compressed picture to be collected, call the Windows operating system application programming interface (Windows API) through an internal interface, and copy the picture cached by IE to the local collection directory; when the browser is based on the WebKit kernel, the browser executes JavaScript to download and extract the picture to be collected, and the instant messaging tool then saves the received picture to the local collection directory.
214. The instant messaging tool changes the URL of the picture to be collected to a local path;
After saving the picture to be collected to the local collection directory, the instant messaging tool changes the URL of the picture to a local path.
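Steps 213 and 214 (saving pictures locally and replacing their URLs with local paths) can be sketched as follows. The hash-based file naming in `local_path_for`, the regular expression that only matches double-quoted `src` attributes with http(s) URLs, and the use of `file://` URIs are all assumptions for illustration; downloading the bytes themselves is omitted, and `collection_dir` must be an absolute path because `Path.as_uri` requires one.

```python
from __future__ import annotations

import hashlib
import posixpath
import re
from pathlib import Path
from urllib.parse import urlsplit

# Matches double-quoted http(s) src attributes only (a simplifying assumption).
SRC_PATTERN = re.compile(r'src="(https?://[^"]+)"')


def local_path_for(url: str, collection_dir: Path) -> Path:
    """Hypothetical naming scheme: SHA-1 of the URL plus the original file extension."""
    suffix = posixpath.splitext(urlsplit(url).path)[1] or ".bin"
    return collection_dir / (hashlib.sha1(url.encode("utf-8")).hexdigest() + suffix)


def rewrite_urls(html: str, collection_dir: Path) -> tuple[str, dict[str, Path]]:
    """Replaces remote picture URLs with file:// URIs under an absolute collection_dir."""
    mapping: dict[str, Path] = {}

    def to_local(match: re.Match) -> str:
        url = match.group(1)
        local = local_path_for(url, collection_dir)
        mapping[url] = local
        return f'src="{local.as_uri()}"'

    return SRC_PATTERN.sub(to_local, html), mapping
```

The returned mapping tells the caller which remote file to save under which local name, so the rewritten markup and the saved files stay consistent.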
215. The instant messaging tool generates an HTML file from the picture to be collected according to the local path;
That is, an HTML file based on the local path is obtained.
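Step 215 can be sketched as wrapping the locally saved picture in a minimal HTML page that references it by its local path. The function names and page structure are hypothetical; the description does not say what the generated HTML file contains beyond being based on the local path.

```python
from __future__ import annotations

from html import escape
from pathlib import Path


def build_local_html(picture: Path, title: str) -> str:
    """Hypothetical: a minimal page that shows a locally saved picture by file URI."""
    uri = picture.resolve().as_uri()
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f'<body><img src="{escape(uri)}" alt="{escape(title)}"></body></html>\n'
    )


def save_local_html(picture: Path, out_file: Path, title: str) -> Path:
    out_file.write_text(build_local_html(picture, title), encoding="utf-8")
    return out_file
```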
216. The instant messaging tool loads the HTML file into its local active collection editor.
After obtaining the HTML file based on the local path, the instant messaging tool may load it into its active collection editor, completing the active collection logic for the picture to be collected.
Further, in the technical solution provided by the embodiments of the present invention, the extracted webpage information basically includes all the webpage contents in the original layout format. Compared with the prior art, it preserves the layout of the original webpage and avoids losing the formatting; it also contains the content data of the webpage, so it can still be read and managed normally even if the original webpage is inaccessible, without restrictions on access time or place. In addition, after receiving the picture to be collected, the instant messaging tool executes the active collection logic to load the picture into its active collection editor.
For ease of understanding, the web page collection method of the above embodiment is described below in a specific application scenario:
In this embodiment, the web page to be collected is the Tencent home page (as shown in FIG. 3A), and the instant messaging tool is QQ.
When the user executes a collection instruction on the Tencent home page, the browser plug-in responds to it and enters the webpage collection process.
On the Tencent home page, the webpage elements include text, pictures, videos, and the like. After the text labels of the webpage elements are extracted, their weight values can be determined according to preset rules to select the target text labels; for example, the JD.com advertisement in the lower right corner of the webpage can be filtered out.
After the webpage elements and style attribute information are acquired according to the label positions of the target text labels, the picture to be collected corresponding to the webpage can be generated by combining them.
The browser plug-in provides an account login interface. After the user enters a QQ number in it, if that QQ account is already logged in, the browser plug-in directly establishes a communication connection with QQ; otherwise, the plug-in pulls up the QQ client and, after the user logs in, establishes a communication connection with the logged-in QQ.
After establishing the communication connection with QQ, the browser plug-in sends the webpage information to QQ through it. QQ saves the picture to be collected to the local collection directory, executes its active collection logic, and loads the picture into its active collection editor, yielding the collection page shown in FIG. 3B.
The web page collection method in the embodiments of the present invention has been described above; the browser plug-in in the embodiments of the present invention is described below. Referring to FIG. 4, an embodiment of the browser plug-in in the embodiments of the present invention includes:
a first receiving unit 401, configured to receive a collection instruction executed by a user on a webpage;
an extracting unit 402, configured to extract webpage information of the webpage according to the collection instruction;
a second receiving unit 403, configured to receive login account information input by the user;
an establishing unit 404, configured to establish a communication connection with an instant messaging tool corresponding to the login account information;
a sending unit 405, configured to send the web page information to the instant messaging tool.
For ease of understanding, the internal operation flow of the browser plug-in in this embodiment is described below using a specific application scenario as an example:
The first receiving unit 401 receives a collection instruction executed by a user on a webpage; the extracting unit 402 extracts the web page information of the web page according to the collection instruction; the second receiving unit 403 receives login account information input by the user; the establishing unit 404 establishes a communication connection with the instant messaging tool corresponding to the login account information; and the sending unit 405 sends the web page information to the instant messaging tool.
Building on the embodiment shown in FIG. 4, the embodiments of the present invention further describe the specific structure of a browser plug-in that can extract all the webpage contents in the original layout format. Referring to FIG. 5, another embodiment of the browser plug-in in the embodiments of the present invention includes:
a first receiving unit 501, configured to receive a collection instruction executed by a user on a webpage;
an extracting unit 502, configured to extract webpage information of the webpage according to the collection instruction;
a second receiving unit 503, configured to receive login account information input by the user;
an establishing unit 504, configured to establish a communication connection with an instant messaging tool corresponding to the login account information;
a sending unit 505, configured to send the web page information to the instant messaging tool.
Further, in this implementation, the extraction unit 502 includes:
the extraction module 5021 is used for extracting text labels of webpage elements in the webpage;
a recording module 5022, configured to record a tag position of the text tag;
an obtaining module 5023, configured to obtain the webpage elements and the style attribute information in the webpage according to the tag position;
the generating module 5024 is configured to generate a to-be-stored picture corresponding to the webpage by combining the acquired webpage elements and the style attribute information.
Optionally, in this embodiment, the extracting unit 502 further includes:
a determining module 5025, configured to determine, before the tag position of the text tag is recorded, a weight value of the text tag according to its tag depth or text length;
a selecting module 5026, configured to select a text label with a weight value greater than a preset value as a target text label;
the recording module 5022 is specifically configured to record the tag position of the target text tag.
Optionally, in this embodiment, the extracting unit 502 further includes:
a cleaning module 5027, configured to clean the style attribute information in the picture to be collected before the web page information is sent to the instant messaging tool through the communication connection.
Optionally, in this embodiment, the establishing unit 504 includes:
a searching module, configured to call an ActiveX control to search a process table for an instant messaging tool corresponding to the login account information;
a first establishing module, configured to establish a communication connection with the found instant messaging tool; and
a second establishing module, configured to launch (pull up) the client of the instant messaging tool and, after the user logs in to the instant messaging tool with the login account information, establish a communication connection with the logged-in instant messaging tool.
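ActiveX is specific to Internet Explorer on Windows, so this Python sketch models only the find-or-launch decision made by the establishing unit. The process table is represented as a list of dicts. `launch_and_login` is a hypothetical callback that starts the client and returns the new process entry once the user has logged in. The client name `im.exe` in the usage below is illustrative only.

```python
from typing import Callable, Iterable, Optional


def find_instance(process_table: Iterable[dict], client_name: str, account: str) -> Optional[dict]:
    """Return the process entry running client_name logged in as account, if any."""
    for proc in process_table:
        if proc.get("name") == client_name and proc.get("account") == account:
            return proc
    return None


def connect(process_table, client_name: str, account: str,
            launch_and_login: Callable[[str], dict]) -> dict:
    """Reuse a running instance for the account; otherwise launch the client and wait for login."""
    proc = find_instance(process_table, client_name, account)
    if proc is None:
        # Not running: pull up the client; the callback returns after the user logs in.
        proc = launch_and_login(account)
    return {"pid": proc["pid"], "account": account}  # stand-in for a real connection handle
```

For example, with instances for `alice` and `bob` already running, `connect(table, "im.exe", "bob", launcher)` reuses bob's process. A request for an account with no running instance invokes the launcher instead.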
According to the technical solution provided by the embodiments of the present invention, after a collection instruction executed by a user on a webpage is received, the browser plug-in extracts the webpage information of the webpage according to the collection instruction. After login account information input by the user is received, the plug-in establishes a communication connection with the instant messaging tool corresponding to the login account information and sends the webpage information to the instant messaging tool through that connection. Compared with the prior art, the embodiments of the present invention can therefore send the webpage information to be collected accurately to the specified instance, even when the user is allowed to run a plurality of instances.
Further, in the technical solution provided by the embodiments of the present invention, the webpage information extracted by the browser plug-in contains essentially all of the webpage content while preserving the original webpage layout. Compared with the prior art, the webpage information therefore preserves the layout of the original webpage without loss and also includes the content data of the webpage. Even if the original webpage becomes inaccessible, the collected webpage information can still be read and managed normally, without restrictions on access time or access location.
It is clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above; details are not repeated here.
The browser plug-in in the embodiments of the present invention has been described above from the perspective of modular functional entities; it is described below from the perspective of hardware processing. Referring to FIG. 6, another embodiment of the browser plug-in in the embodiments of the present invention includes:
an input device 601, an output device 602, a processor 603, and a memory 604 (there may be one or more processors 603; one processor 603 is taken as an example in FIG. 6). In some embodiments of the present invention, the input device 601, the output device 602, the processor 603, and the memory 604 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
By invoking the operation instructions stored in the memory 604, the processor 603 is configured to perform the following steps:
receiving a collection instruction executed by a user on a webpage;
extracting webpage information of the webpage according to the collection instruction;
receiving login account information input by the user;
establishing a communication connection with the instant messaging tool corresponding to the login account information;
and sending the webpage information to the instant messaging tool.
In some embodiments of the present invention, the processor 603 is specifically configured to perform the following steps:
extracting text tags of webpage elements in the webpage;
recording the tag position of each text tag;
acquiring the webpage elements and style attribute information in the webpage according to the tag position;
and combining the acquired webpage elements and the style attribute information to generate a picture to be collected corresponding to the webpage.
In some embodiments of the invention, the processor 603 is further configured to perform the following steps:
before the tag position of a text tag is recorded, determining a weight value of the text tag according to the tag depth or the amount of text of the text tag;
selecting a text tag whose weight value is greater than a preset value as a target text tag;
the processor 603 is specifically configured to perform the following step:
and recording the tag position of the target text tag.
In some embodiments of the invention, the processor 603 is further configured to perform the following steps:
and cleaning the style attribute information in the picture to be collected before sending the webpage information to the instant messaging tool.
In some embodiments of the present invention, the processor 603 is specifically configured to perform the following steps:
calling an ActiveX control to search a process table for an instant messaging tool corresponding to the login account information;
if such a tool is found, establishing a communication connection with the found instant messaging tool;
if not, launching (pulling up) the client of the instant messaging tool and, after the user logs in to the instant messaging tool with the login account information, establishing a communication connection with the logged-in instant messaging tool.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a logical functional division, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A webpage collection method, comprising the following steps:
receiving a collection instruction executed by a user on a webpage;
extracting webpage information of the webpage according to the collection instruction;
receiving login account information input by the user;
establishing a communication connection with an instant messaging tool corresponding to the login account information;
sending the webpage information to a designated instance among a plurality of instances of the instant messaging tool that the user is allowed to run, so as to collect the webpage information into the instant messaging tool, wherein the webpage information comprises a picture to be collected, and the picture to be collected comprises content data of the webpage;
wherein, the extracting the webpage information of the webpage according to the collection instruction comprises:
extracting text tags of webpage elements in the webpage;
recording the tag position of each text tag, wherein the tag position indicates the position, in the webpage, of the webpage element corresponding to the text tag;
acquiring the webpage elements and style attribute information in the webpage according to the tag position; and
computing and superimposing the acquired webpage elements and style attribute information to generate the picture to be collected corresponding to the webpage.
2. The webpage collection method of claim 1, further comprising, before the step of recording the tag position of the text tag:
determining a weight value of the text tag according to the tag depth or the amount of text of the text tag;
selecting a text tag whose weight value is greater than a preset value as a target text tag;
wherein the recording of the tag position of the text tag specifically comprises:
recording the tag position of the target text tag.
3. The webpage collection method of claim 1, further comprising, before the step of sending the webpage information to the instant messaging tool:
cleaning the style attribute information in the picture to be collected.
4. The webpage collection method of claim 1, wherein the establishing a communication connection with the instant messaging tool corresponding to the login account information comprises:
calling an ActiveX control to search a process table for an instant messaging tool corresponding to the login account information;
if such a tool is found, establishing a communication connection with the found instant messaging tool;
if not, launching (pulling up) the client of the instant messaging tool and, after the user logs in to the instant messaging tool with the login account information, establishing a communication connection with the logged-in instant messaging tool.
5. The webpage collection method according to any one of claims 1 to 4, further comprising, after the sending the webpage information to the instant messaging tool:
the instant messaging tool saving the picture to be collected to a local collection directory;
the instant messaging tool changing the uniform resource locator (URL) of the picture to be collected to a local path;
the instant messaging tool generating an HTML file from the picture to be collected according to the local path; and
the instant messaging tool loading the HTML file into the locally active favorites editor.
6. A browser plug-in, comprising:
a first receiving unit, configured to receive a collection instruction executed by a user on a webpage;
an extraction unit, configured to extract webpage information of the webpage according to the collection instruction;
a second receiving unit, configured to receive login account information input by the user;
an establishing unit, configured to establish a communication connection with an instant messaging tool corresponding to the login account information;
a sending unit, configured to send the webpage information to a designated instance among a plurality of instances of the instant messaging tool that the user is allowed to run, so as to collect the webpage information into the instant messaging tool, wherein the webpage information comprises a picture to be collected, and the picture to be collected comprises content data of the webpage;
wherein the extraction unit includes:
an extraction module, configured to extract text tags of webpage elements in the webpage;
a recording module, configured to record the tag position of each text tag, wherein the tag position indicates the position, in the webpage, of the webpage element corresponding to the text tag;
an acquisition module, configured to acquire the webpage elements and style attribute information in the webpage according to the tag position; and
a generating module, configured to compute and superimpose the acquired webpage elements and style attribute information to generate the picture to be collected corresponding to the webpage.
7. The browser plug-in of claim 6, wherein the extraction unit further comprises:
a determining module, configured to determine, before the tag position of a text tag is recorded, a weight value of the text tag according to the tag depth or the amount of text of the text tag;
a selecting module, configured to select a text tag whose weight value is greater than a preset value as a target text tag;
wherein the recording module is specifically configured to record the tag position of the target text tag.
8. The browser plug-in of claim 6, wherein the extraction unit further comprises:
a cleaning module, configured to clean the style attribute information in the picture to be collected before the webpage information is sent to the instant messaging tool.
9. The browser plug-in of claim 6, wherein the establishing unit comprises:
a searching module, configured to call an ActiveX control to search a process table for an instant messaging tool corresponding to the login account information;
a first establishing module, configured to establish a communication connection with the found instant messaging tool; and
a second establishing module, configured to launch (pull up) the client of the instant messaging tool and, after the user logs in to the instant messaging tool with the login account information, establish a communication connection with the logged-in instant messaging tool.
10. A browser plug-in, comprising: a processor and a memory;
the memory is used for providing instructions and data to the processor;
the processor is configured to perform the method of any one of claims 1-5.
11. A computer-readable storage medium comprising instructions for performing the method of any of claims 1-5.
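The post-processing in claim 5 can be sketched as follows. The picture to be collected is saved into a local collection directory. Its web URL is replaced by a local file URI, and an HTML file referencing that local path is generated. This is a minimal Python illustration, not the claimed implementation. The function names, the file naming scheme and the HTML template are all assumptions. Loading the file into the favorites editor is left out.

```python
import html
from pathlib import Path


def save_to_collection(picture_bytes: bytes, name: str, collection_dir: Path) -> Path:
    """Save the picture to be collected into the local collection directory."""
    collection_dir.mkdir(parents=True, exist_ok=True)
    path = collection_dir / name
    path.write_bytes(picture_bytes)
    return path


def build_html(local_path: Path, title: str) -> str:
    """Generate an HTML document that references the picture by a local file URI."""
    src = local_path.resolve().as_uri()  # file:///... replaces the original http(s) URL
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><img src=\"{html.escape(src, quote=True)}\" alt=\"{html.escape(title)}\"></body></html>\n"
    )


def collect(picture_bytes: bytes, name: str, title: str, collection_dir) -> Path:
    """Hypothetical end-to-end step: save, rewrite to a local path, write the HTML file."""
    local_path = save_to_collection(picture_bytes, name, Path(collection_dir))
    html_path = local_path.with_suffix(".html")
    html_path.write_text(build_html(local_path, title), encoding="utf-8")
    return html_path  # the file a favorites editor would then load
```

Because the HTML points at a local copy rather than the original URL, it stays readable when the source page is offline. That is the benefit the description attributes to the collected webpage information.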

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410594451.2A CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410594451.2A CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Publications (2)

Publication Number Publication Date
CN105550179A CN105550179A (en) 2016-05-04
CN105550179B true CN105550179B (en) 2020-07-24

Family

ID=55829368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410594451.2A Active CN105550179B (en) 2014-10-29 2014-10-29 Webpage collection method and browser plug-in

Country Status (1)

Country Link
CN (1) CN105550179B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110231901B (en) * 2016-05-09 2021-01-29 北京小米移动软件有限公司 Application interface display method and device
CN110020335B (en) * 2017-07-28 2022-04-26 北京搜狗科技发展有限公司 Favorite processing method and device
CN111104619B (en) * 2018-10-25 2023-09-26 青岛海信移动通信技术有限公司 Method for collecting articles and mobile terminal
CN110837397A (en) * 2019-09-27 2020-02-25 云深互联(北京)科技有限公司 Method, device and equipment for configuring browser plug-in
CN114117269B (en) * 2022-01-26 2022-09-20 荣耀终端有限公司 Memo information collection method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102646135A (en) * 2012-03-31 2012-08-22 奇智软件(北京)有限公司 Webpage collecting method, device and system
CN103001856A (en) * 2012-12-05 2013-03-27 华为软件技术有限公司 Information sharing method and system and instant messaging (IM) client and server
CN103179164A (en) * 2011-12-23 2013-06-26 宇龙计算机通信科技(深圳)有限公司 Method and communication terminal of storing page information
CN103559288A (en) * 2013-11-08 2014-02-05 惠州Tcl移动通信有限公司 Method and mobile terminal for intelligent collecting and sharing
CN103678555A (en) * 2013-12-06 2014-03-26 北京奇虎科技有限公司 Webpage collecting method and browser

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100426809C (en) * 2004-07-07 2008-10-15 腾讯科技(深圳)有限公司 Method for adding web page storage of instant communication tool
JP2007272390A (en) * 2006-03-30 2007-10-18 Sony Corp Resource management device, tag candidate selection method and tag candidate selection program
CN102270206A (en) * 2010-06-03 2011-12-07 北京迅捷英翔网络科技有限公司 Method and device for capturing valid web page contents
CN102508897B (en) * 2011-11-03 2013-08-21 匡晓明 General information collection method and system
CN103617224B (en) * 2012-03-31 2018-01-19 北京奇虎科技有限公司 A kind of webpage collection method, apparatus and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant