CN106557495B - Crawler function expansion method and device - Google Patents

Info

Publication number
CN106557495B
Authority
CN
China
Prior art keywords
crawler
function
plug
function extension
extension plug
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510625057.5A
Other languages
Chinese (zh)
Other versions
CN106557495A (en)
Inventor
崔志伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510625057.5A priority Critical patent/CN106557495B/en
Publication of CN106557495A publication Critical patent/CN106557495A/en
Application granted granted Critical
Publication of CN106557495B publication Critical patent/CN106557495B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a crawler function expansion method and device. The method comprises the following steps: loading a function extension plug-in under a specified directory when a crawler starts a crawling task; identifying the enabling condition of the loaded function extension plug-in; and, when the enabling condition is met, calling the function extension plug-in to execute its function. The method and device solve the technical problem that extending the functions of a crawler is a complex process.

Description

Crawler function expansion method and device
Technical Field
The application relates to the field of crawlers, in particular to a crawler function expansion method and device.
Background
While a web crawler is crawling, some web pages often require special processing. Such processing typically applies only to certain types of crawling tasks and does not affect the execution of other tasks. As requirements grow, corresponding customized functions keep being added. In the traditional approach to function extension, every new function requires the crawler to be redesigned, the new function to be integrated into the whole crawler, the new crawler to be retested, and all existing crawlers to be replaced at deployment, which makes extending the functions of a crawler a complex process.
No effective solution to the above problem has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a crawler function expansion method and device, which aim to at least solve the technical problem that extending the functions of a crawler is a complex process.
According to an aspect of an embodiment of the present application, there is provided a method for expanding crawler functions, including: loading a function extension plug-in under a specified directory when a crawler starts a crawling task; identifying the enabling condition of the loaded function extension plug-in; and, when the enabling condition is met, calling the function extension plug-in to execute its function.
According to another aspect of the embodiments of the present application, there is also provided an apparatus for expanding crawler functions, including: a loading unit, configured to load a function extension plug-in under a specified directory when a crawler starts a crawling task; an identification unit, configured to identify the enabling condition of the loaded function extension plug-in; and a calling unit, configured to call the function extension plug-in to execute its function when the enabling condition is met.
In the embodiments of the application, a function extension plug-in under a specified directory is loaded when a crawler starts a crawling task; the enabling condition of the loaded function extension plug-in is identified; and, when the enabling condition is met, the function extension plug-in is called to execute its function. The function to be added is stored in the specified directory in the form of a function extension plug-in, the crawler loads the plug-ins under that directory when it starts a crawling task, and each plug-in is called once its enabling condition is met, so that crawler functions are extended and invoked automatically. Crawler functions are extended by changing the plug-ins under the specified directory, and changing those plug-ins does not change the logic of the crawler itself. An existing crawler therefore only needs an added interface that checks the preset directory. This avoids redesigning the crawler, together with the increased development cost, the retesting of all programs, and the redeployment that a redesign entails. The process of extending crawler functions is thereby simplified, which solves the technical problem that this process is complex.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of expanding crawler functionality according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an expansion device for a crawler function according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present application, a method embodiment of a method for expanding crawler functions is provided. It is noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
FIG. 1 is a flowchart of a method for expanding crawler functions according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S102: loading the function extension plug-ins under the specified directory when the crawler starts a crawling task. When the crawler starts a crawling task, it first checks the specified directory for function extension plug-ins to be loaded, so as to extend the crawler's functions. Each function extension plug-in implements one extended function, and the specified directory may contain one or more function extension plug-ins. For example, one plug-in crawls a specified section of a specified web page, another parses the crawled content, another implements a jump function, and so on. The crawler loads the one or more function extension plug-ins under the specified directory.
Step S104: identifying the enabling condition of the loaded function extension plug-in. Different function extension plug-ins implement different functions, and different functions run at different times. For example, the function of crawling a specified section runs before a web page is crawled, and the function of parsing a crawled web page runs after the page is crawled. Each function extension plug-in carries its own enabling condition, so that it runs only under that condition.
Step S106: when the enabling condition is met, calling the function extension plug-in to execute its function.
According to this embodiment, the functions to be added are stored in the specified directory in the form of function extension plug-ins, the crawler loads the plug-ins under that directory when it starts a crawling task, and each plug-in is called once its enabling condition is met, so that crawler functions are extended and invoked automatically. Crawler functions are extended by changing the plug-ins under the specified directory, and changing those plug-ins does not change the logic of the crawler itself. An existing crawler therefore only needs an added interface that checks the preset directory. This avoids redesigning the crawler, together with the increased development cost, the retesting of all programs, and the redeployment that a redesign entails. The process of extending crawler functions is thereby simplified, which solves the technical problem that this process is complex.
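By way of illustration only, steps S102 to S106 might be sketched in Python as follows. The patent does not specify a language or a plug-in format; here each plug-in is assumed to be a `.py` file in the specified directory, and the names `ENABLE_WHEN`, `run`, `load_plugins`, and `run_stage` are hypothetical.

```python
import importlib.util
from pathlib import Path


def load_plugins(plugin_dir):
    """Step S102: import every *.py file found under the specified directory."""
    plugins = []
    for path in sorted(Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plug-in file's top-level code
        plugins.append(module)
    return plugins


def run_stage(plugins, stage, context):
    """Steps S104 and S106: check each enabling condition, then call the plug-in."""
    for plugin in plugins:
        # ENABLE_WHEN and run() are assumed names, not taken from the patent.
        if getattr(plugin, "ENABLE_WHEN", None) == stage:
            plugin.run(context)
    return context
```

Under these assumptions, a plug-in file containing `ENABLE_WHEN = "after_crawl"` and a `run(context)` function would be skipped by `run_stage(plugins, "before_crawl", ctx)` and executed by `run_stage(plugins, "after_crawl", ctx)`, without any change to the crawler code itself.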
Optionally, loading a function extension plug-in under a specified directory when the crawler starts the crawling task includes: when the crawler starts a crawling task, searching the specified directory for function extension plug-ins that satisfy a preset interface rule and have a preset name; and loading the found function extension plug-ins into the crawler. The preset interface rule may be a communication rule between the crawler and the function extension plug-ins. When several function extension plug-ins are stored in the specified directory, the crawler searches for those that can communicate with it and meet the name requirement, and loads them. The path of the specified directory and the names of the function extension plug-ins to be loaded may be stored in the crawler in advance, so that the crawler can readily find the corresponding plug-ins in the specified directory.
A specified directory may store the function extension plug-ins of one crawler cluster, or those of several crawler clusters. When a specified directory stores the plug-ins of a single crawler cluster, that cluster may adopt a uniform interface rule, and the plug-ins stored in the directory then all follow the same interface rule. When the specified directory stores the plug-ins of several crawler clusters and each cluster adopts a different interface rule, plug-ins in the same directory may follow the same or different interface rules. When several crawler clusters need function extension, they may also search for plug-ins in different specified directories. Whether several crawler clusters or crawlers share one specified directory or each uses its own, every crawler can extend its functions using the steps above.
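As a non-limiting sketch, the search by preset name and preset interface rule might look as follows in Python. The interface rule is modeled here, as an assumption, by requiring each plug-in module to expose a callable named `run`; `REQUIRED_CALLABLES` and `find_plugins` are hypothetical names.

```python
import importlib.util
from pathlib import Path

# Assumed interface rule: a plug-in must expose these callables.
REQUIRED_CALLABLES = ("run",)


def find_plugins(plugin_dir, wanted_names):
    """Return the modules that have a preset name and satisfy the interface rule."""
    found = []
    for name in wanted_names:
        path = Path(plugin_dir) / f"{name}.py"
        if not path.is_file():
            continue  # no plug-in with this preset name in the directory
        spec = importlib.util.spec_from_file_location(name, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Keep the plug-in only if it satisfies the assumed interface rule.
        if all(callable(getattr(module, attr, None)) for attr in REQUIRED_CALLABLES):
            found.append(module)
    return found
```

In this sketch, files whose names are not in `wanted_names` are never imported, and files with a matching name but without a `run` callable are imported and then discarded.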
Optionally, loading the found function extension plug-in into the crawler includes: extracting the position information carried by the found function extension plug-in; and loading the plug-in at the position in the crawler that corresponds to the position information. Identifying the enabling condition of the loaded function extension plug-in includes: determining whether the crawler's execution has reached the position where the found function extension plug-in is located. Calling the function extension plug-in to execute its function when the enabling condition is met includes: when it is determined that the crawler's execution has reached that position, executing the function extension plug-in at that position.
The enabling condition may be the position information of the function extension plug-in; that is, the condition is met when the crawler's execution reaches the position where the found plug-in is located. When the crawler loads a function extension plug-in, it loads the plug-in at the corresponding position in the crawler according to the position information the plug-in carries. When the crawler, while executing a task, reaches that position, it executes the plug-in, thereby extending the crawler's functions.
For example, before web page crawling is executed, plug-in 1 adds a crawling condition, namely that only the sports section of website A is crawled; after web page crawling is executed, plug-in 2 performs a jump of the crawled website. The crawler may run from beginning to end and execute the plug-ins one by one; that is, when execution reaches a given position, the function extension plug-ins at that position are run.
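The position-based enabling condition described above can be illustrated with a minimal Python sketch. The position names `before_crawl` and `after_crawl`, the classes `Crawler`, `SportsOnly`, and `CollectLinks`, and the shape of the `task` dictionary are all assumptions made for illustration; the "jump" of plug-in 2 is modeled simply as collecting the links to follow next.

```python
from collections import defaultdict


class Crawler:
    def __init__(self):
        self.slots = defaultdict(list)  # position name -> plug-ins loaded there

    def load(self, plugin):
        # Extract the position the plug-in carries and load it at that position.
        self.slots[plugin.position].append(plugin)

    def _run_slot(self, position, task):
        for plugin in self.slots[position]:
            plugin.run(task)

    def crawl(self, task, fetch):
        self._run_slot("before_crawl", task)  # execution reaches position 1
        task["pages"] = [fetch(url) for url in task["urls"]]
        self._run_slot("after_crawl", task)   # execution reaches position 2
        return task


class SportsOnly:
    """Plug-in 1 (assumed form): restrict crawling to the sports section."""
    position = "before_crawl"

    def run(self, task):
        task["urls"] = [u for u in task["urls"] if "/sports/" in u]


class CollectLinks:
    """Plug-in 2 (assumed form): gather the links to jump to after crawling."""
    position = "after_crawl"

    def run(self, task):
        task["next_urls"] = [link for page in task["pages"] for link in page["links"]]
```

The crawler's own `crawl` method never names a specific plug-in; it only runs whatever has been loaded at each position.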
Optionally, several function extension plug-ins are found, and calling the function extension plug-ins to execute their functions includes: identifying priority information of the function extension plug-ins; and calling the corresponding plug-ins in order from high priority to low priority according to that information. Each function extension plug-in carries a priority identifier, which may be represented by a number. The crawler obtains the priority identifier of each plug-in, and several plug-ins at the same position are executed in order from high priority to low priority. If a function extension plug-in has no priority identifier, it is executed last.
Priority identifiers make it possible to control the order in which function extension plug-ins run, so that crawler functions can be customized. Functions are extended by changing the plug-ins under the specified directory, and the extended functions can be adjusted by setting plug-in priorities, which allows crawler functions to be customized and adjusted and makes function extension more flexible.
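The ordering rule described above can be sketched as a sort key in Python. The text does not say whether a larger or a smaller number denotes higher priority; this sketch assumes that a larger number means higher priority, and the attribute name `priority` and function name `order_by_priority` are hypothetical.

```python
def order_by_priority(plugins):
    """Order plug-ins from high to low priority; plug-ins without one go last."""
    def key(plugin):
        priority = getattr(plugin, "priority", None)
        # (0, -p) places prioritized plug-ins first, largest number first;
        # (1, 0) places plug-ins without a priority identifier at the end.
        return (1, 0) if priority is None else (0, -priority)

    # sorted() is stable, so plug-ins with equal priority keep their load order.
    return sorted(plugins, key=key)
```

If the opposite convention were intended (smaller number first), only the sign in the key would change.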
Optionally, loading the found function extension plug-ins into the crawler includes: comparing whether the found function extension plug-ins are consistent with the function extension plug-ins in the crawler; and, when they are inconsistent, updating the function extension plug-ins in the crawler, wherein: when the comparison shows that a first function extension plug-in among the found plug-ins does not exist in the crawler, the first function extension plug-in is loaded into the crawler; and when the comparison shows that a second function extension plug-in in the crawler does not exist in the specified directory, the second function extension plug-in is deleted.
When the crawler starts a crawling task, it loads the function extension plug-ins under the specified directory. If function extension plug-ins are already loaded in the crawler, the crawler compares them with the plug-ins under the specified directory and, when they differ, adjusts the loaded plug-ins according to the difference. Optionally, when a first function extension plug-in among the found plug-ins is not loaded in the crawler, it is loaded into the crawler; and when a second function extension plug-in loaded in the crawler is not among the found plug-ins, it is deleted from the crawler. The function extension plug-ins in the crawler are thereby updated automatically. To extend the crawler's functions, only the plug-ins under the specified directory need to be modified. One or more plug-ins can be modified at once and loaded at once, extending one or more functions without modifying the crawler, which solves the technical problem that extending a crawler is complex.
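The comparison and update described above amount to a set difference in each direction. A minimal Python sketch follows, assuming plug-ins are identified by name; `sync_plugins` and its `load` callback are hypothetical, and the sketch compares names only, so a plug-in whose content changed under the same name is not detected.

```python
def sync_plugins(loaded, directory_names, load):
    """Bring the crawler's loaded plug-ins in line with the specified directory.

    loaded: dict mapping plug-in name to the already loaded plug-in.
    directory_names: names of the plug-ins currently found in the directory.
    load: callable that loads one plug-in by name (supplied by the caller).
    """
    wanted = set(directory_names)
    current = set(loaded)
    for name in sorted(wanted - current):
        loaded[name] = load(name)  # "first" plug-in: found but not yet loaded
    for name in sorted(current - wanted):
        del loaded[name]           # "second" plug-in: loaded but no longer found
    return loaded
```

Plug-ins present on both sides are left untouched, so `load` is called only for newly added names.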
The above-described embodiment has the following advantages:
1. The development cost is reduced: each function extension only requires developing the corresponding module, and the crawler itself does not need to be modified.
2. The maintenance cost is reduced: each module can be loaded, called, and run independently, without affecting the overall flow of the crawler.
3. The deployment cost is reduced: each deployment only requires copying the newly added function module to the specified location.
According to an embodiment of the present application, an embodiment of an expansion device for a crawler function is provided. As shown in fig. 2, the crawler function extension apparatus includes: a loading unit 10, an identification unit 20 and a calling unit 30.
The loading unit 10 is configured to load the function extension plug-ins under the specified directory when the crawler starts a crawling task. When the crawler starts a crawling task, it first checks the specified directory for function extension plug-ins to be loaded, so as to extend the crawler's functions. Each function extension plug-in implements one extended function, and the specified directory may contain one or more function extension plug-ins. For example, one plug-in crawls a specified section of a specified web page, another parses the crawled content, another implements a jump function, and so on. The crawler loads the one or more function extension plug-ins under the specified directory.
The identification unit 20 is configured to identify the enabling condition of the loaded function extension plug-in. Different function extension plug-ins implement different functions, and different functions run at different times. For example, the function of crawling a specified section runs before a web page is crawled, and the function of parsing a crawled web page runs after the page is crawled. Each function extension plug-in carries its own enabling condition, so that it runs only under that condition.
The calling unit 30 is configured to call the function extension plug-in to execute its function when the enabling condition is met.
According to this embodiment, the functions to be added are stored in the specified directory in the form of function extension plug-ins, the crawler loads the plug-ins under that directory when it starts a crawling task, and each plug-in is called once its enabling condition is met, so that crawler functions are extended and invoked automatically. Crawler functions are extended by changing the plug-ins under the specified directory, and changing those plug-ins does not change the logic of the crawler itself. An existing crawler therefore only needs an added interface that checks the preset directory. This avoids redesigning the crawler, together with the increased development cost, the retesting of all programs, and the redeployment that a redesign entails. The process of extending crawler functions is thereby simplified, which solves the technical problem that this process is complex.
Optionally, the loading unit includes: a lookup module, configured to search the specified directory, when the crawler starts a crawling task, for function extension plug-ins that satisfy a preset interface rule and have a preset name; and a loading module, configured to load the found function extension plug-ins into the crawler.
The preset interface rule may be a communication rule between the crawler and the function extension plug-ins. When several function extension plug-ins are stored in the specified directory, the crawler searches for those that can communicate with it and meet the name requirement, and loads them. The path of the specified directory and the names of the function extension plug-ins to be loaded may be stored in the crawler in advance, so that the crawler can readily find the corresponding plug-ins in the specified directory.
A specified directory may store the function extension plug-ins of one crawler cluster, or those of several crawler clusters. When a specified directory stores the plug-ins of a single crawler cluster, that cluster may adopt a uniform interface rule, and the plug-ins stored in the directory then all follow the same interface rule. When the specified directory stores the plug-ins of several crawler clusters and each cluster adopts a different interface rule, plug-ins in the same directory may follow the same or different interface rules. When several crawler clusters need function extension, they may also search for plug-ins in different specified directories. Whether several crawler clusters or crawlers share one specified directory or each uses its own, every crawler can extend its functions using the steps above.
Optionally, the lookup module includes: an extraction submodule, configured to extract the position information carried by the found function extension plug-in; and a loading submodule, configured to load the found plug-in at the position in the crawler that corresponds to the position information. The identification unit is further configured to determine whether the crawler's execution has reached the position where the found function extension plug-in is located. The calling unit is further configured to execute the function extension plug-in at that position when it is determined that the crawler's execution has reached it.
The enabling condition may be the position information of the function extension plug-in; that is, the condition is met when the crawler's execution reaches the position where the found plug-in is located. When the crawler loads a function extension plug-in, it loads the plug-in at the corresponding position in the crawler according to the position information the plug-in carries. When the crawler, while executing a task, reaches that position, it executes the plug-in, thereby extending the crawler's functions.
For example, before web page crawling is executed, plug-in 1 adds a crawling condition, namely that only the sports section of website A is crawled; after web page crawling is executed, plug-in 2 performs a jump of the crawled website. The crawler may run from beginning to end and execute the plug-ins one by one; that is, when execution reaches a given position, the function extension plug-ins at that position are run.
Optionally, several function extension plug-ins are found, and the calling unit includes: an identification module, configured to identify priority information of the function extension plug-ins; and a calling module, configured to call the corresponding plug-ins in order from high priority to low priority according to that information.
Each function extension plug-in carries a priority identifier, which may be represented by a number. The crawler obtains the priority identifier of each plug-in, and several plug-ins at the same position are executed in order from high priority to low priority. If a function extension plug-in has no priority identifier, it is executed last.
Priority identifiers make it possible to control the order in which function extension plug-ins run, so that crawler functions can be customized. Functions are extended by changing the plug-ins under the specified directory, and the extended functions can be adjusted by setting plug-in priorities, which allows crawler functions to be customized and adjusted and makes function extension more flexible.
Optionally, the loading module includes: a comparison submodule, configured to compare whether the found function extension plug-ins are consistent with the function extension plug-ins in the crawler; and an updating submodule, configured to update the function extension plug-ins in the crawler when the comparison submodule finds them inconsistent, wherein: when a first function extension plug-in among the found plug-ins does not exist in the crawler, the updating submodule loads the first function extension plug-in into the crawler; and when a second function extension plug-in in the crawler is not among the found plug-ins, the updating submodule deletes the second function extension plug-in.
When the crawler starts a crawling task, it loads the function extension plug-ins under the specified directory. If function extension plug-ins are already loaded in the crawler, the crawler compares them with the plug-ins under the specified directory and, when they differ, adjusts the loaded plug-ins according to the difference. Optionally, when a function extension plug-in in the specified directory is not loaded in the crawler, it is loaded into the crawler; and when a function extension plug-in loaded in the crawler is not in the specified directory, it is deleted from the crawler. The function extension plug-ins in the crawler are thereby updated automatically. To extend the crawler's functions, only the plug-ins under the specified directory need to be modified. One or more plug-ins can be modified at once and loaded at once, extending one or more functions without modifying the crawler, which solves the technical problem that extending a crawler is complex.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit may be a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the technical solution of the present application may be implemented in the form of a software product stored in a storage medium, which includes instructions that enable a computer device (e.g., a personal computer, a server, or a network device) to perform all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (6)

1. A method for expanding crawler functions, comprising:
loading a function extension plug-in under a specified directory when a crawler starts a crawling task;
identifying the enabling condition of the loaded function extension plug-in;
when the enabling condition is met, calling the function extension plug-in to execute the function of the function extension plug-in;
wherein loading the function extension plug-in under the specified directory when the crawler starts the crawling task comprises: when the crawler starts the crawling task, searching the specified directory for function extension plug-ins that satisfy a preset interface rule and a preset name; and loading the found function extension plug-ins to the crawler; the preset interface rule being a communication rule between the crawler and the function extension plug-ins;
wherein loading the found function extension plug-ins to the crawler comprises:
comparing whether the found function extension plug-ins are consistent with the function extension plug-ins in the crawler;
when they are inconsistent, updating the function extension plug-ins in the crawler, wherein:
when a first function extension plug-in among the found function extension plug-ins does not exist in the crawler, loading the first function extension plug-in to the crawler;
and when the comparison shows that a second function extension plug-in in the crawler does not exist among the found function extension plug-ins, deleting the second function extension plug-in.
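For illustration only, the search step of claim 1 might look like the following sketch. The claim does not say what the preset name or the preset interface rule are, so both are assumptions here: the name rule is the file pattern `plugin_*.py`, and the interface rule is that a module exposes two callables, `should_run(context)` for the enabling condition and `run(context)` for the function itself.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType

# Assumed conventions, not taken from the patent text.
NAME_PATTERN = "plugin_*.py"                 # stands in for the "preset name"
REQUIRED_CALLABLES = ("should_run", "run")   # stands in for the "preset interface rule"


def find_plugins(plugin_dir: Path) -> dict[str, ModuleType]:
    """Import each file matching the name pattern; keep those that satisfy the interface."""
    found: dict[str, ModuleType] = {}
    for path in sorted(plugin_dir.glob(NAME_PATTERN)):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Interface check: every required attribute must exist and be callable.
        if all(callable(getattr(module, name, None)) for name in REQUIRED_CALLABLES):
            found[path.stem] = module
    return found


def run_enabled(plugins: dict[str, ModuleType], context: dict) -> list[str]:
    """Call every plug-in whose enabling condition holds; return the names that ran."""
    ran = []
    for name, module in plugins.items():
        if module.should_run(context):
            module.run(context)
            ran.append(name)
    return ran
```

Files that do not match the name pattern are never imported, and files that match but lack either callable are imported and then discarded.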
2. The method of claim 1, wherein:
loading the found function extension plug-ins to the crawler comprises: extracting position information carried by the found function extension plug-in; and loading the found function extension plug-in to the position in the crawler that corresponds to the position information;
identifying the enabling condition of the loaded function extension plug-in comprises: determining whether the crawler has executed to the position where the found function extension plug-in is located;
when the enabling condition is met, calling the function extension plug-in to execute the function of the function extension plug-in comprises: when it is determined that the crawler has executed to the position where the found function extension plug-in is located, executing the function extension plug-in at that position.
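One way to picture the position information of claim 2 is as a named hook point in the crawler's control flow. The sketch below is an assumption-laden illustration: the position names `before_fetch` and `after_fetch`, the `Plugin` and `Crawler` classes, and the fake fetch are all hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plugin:
    position: str                   # position information carried by the plug-in
    run: Callable[[dict], dict]     # the plug-in's function: takes and returns a page


class Crawler:
    """Toy crawler with two hook positions (names are illustrative)."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Plugin]] = defaultdict(list)

    def load(self, plugin: Plugin) -> None:
        # Extract the position and register the plug-in at that position.
        self._hooks[plugin.position].append(plugin)

    def _reach(self, position: str, page: dict) -> dict:
        # Enabling condition: execution has reached the plug-in's position.
        for plugin in self._hooks[position]:
            page = plugin.run(page)
        return page

    def crawl(self, url: str) -> dict:
        page = {"url": url, "body": ""}
        page = self._reach("before_fetch", page)
        page["body"] = f"<html>{url}</html>"   # stand-in for a real HTTP fetch
        page = self._reach("after_fetch", page)
        return page
```

A plug-in loaded at `after_fetch` sees the fetched body, while one loaded at `before_fetch` runs before any content exists, so the position alone decides when each plug-in is enabled.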
3. The method according to claim 2, wherein the found function extension plug-ins are a plurality of function extension plug-ins, and calling the function extension plug-ins to execute the functions of the function extension plug-ins comprises:
identifying priority information of the plurality of function extension plug-ins;
and calling the corresponding function extension plug-ins according to the priority information, in order from high priority to low priority.
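The priority ordering of claim 3 can be illustrated with a sort before the calls. The claim does not define how priority is encoded, so this sketch assumes an integer field where a larger number means a higher priority; the `Plugin` class and `call_by_priority` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plugin:
    name: str
    priority: int                    # assumed encoding: larger value = higher priority
    run: Callable[[list], None]


def call_by_priority(plugins: list[Plugin], log: list) -> None:
    """Call the plug-ins from highest to lowest priority."""
    # sorted() is stable, so plug-ins with equal priority keep their load order.
    for plugin in sorted(plugins, key=lambda p: p.priority, reverse=True):
        plugin.run(log)
```

Sorting at call time rather than at load time keeps the order correct even if plug-ins are added or deleted between crawling tasks.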
4. An apparatus for expanding crawler functions, comprising:
a loading unit, configured to load a function extension plug-in under a specified directory when a crawler starts a crawling task;
an identification unit, configured to identify the enabling condition of the loaded function extension plug-in;
a calling unit, configured to call the function extension plug-in to execute the function of the function extension plug-in when the enabling condition is met;
wherein the loading unit comprises: a searching module, configured to search the specified directory for function extension plug-ins that satisfy a preset interface rule and a preset name when the crawler starts the crawling task; and a loading module, configured to load the found function extension plug-ins to the crawler; the preset interface rule being a communication rule between the crawler and the function extension plug-ins;
wherein the loading module comprises:
a comparison submodule, configured to compare whether the found function extension plug-ins are consistent with the function extension plug-ins in the crawler;
an updating submodule, configured to update the function extension plug-ins in the crawler when the comparison submodule determines that they are inconsistent, wherein:
when a first function extension plug-in among the found function extension plug-ins does not exist in the crawler, the updating submodule loads the first function extension plug-in to the crawler;
and when the comparison shows that a second function extension plug-in in the crawler does not exist among the found function extension plug-ins, the updating submodule deletes the second function extension plug-in.
5. The apparatus of claim 4, wherein:
the searching module comprises: an extraction submodule, configured to extract position information carried by the found function extension plug-in; and a loading submodule, configured to load the found function extension plug-in, according to the position information, to the position in the crawler that corresponds to the position information;
the identification unit is further configured to determine whether the crawler has executed to the position where the found function extension plug-in is located;
the calling unit is further configured to execute the function extension plug-in at that position when it is determined that the crawler has executed to the position where the found function extension plug-in is located.
6. The apparatus according to claim 5, wherein the found function extension plug-ins are a plurality of function extension plug-ins, and the calling unit comprises:
an identification module, configured to identify priority information of the plurality of function extension plug-ins;
and a calling module, configured to call the corresponding function extension plug-ins according to the priority information, in order from high priority to low priority.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510625057.5A CN106557495B (en) 2015-09-25 2015-09-25 Crawler function expansion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510625057.5A CN106557495B (en) 2015-09-25 2015-09-25 Crawler function expansion method and device

Publications (2)

Publication Number Publication Date
CN106557495A CN106557495A (en) 2017-04-05
CN106557495B true CN106557495B (en) 2020-05-22

Family

ID=58415298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510625057.5A Active CN106557495B (en) 2015-09-25 2015-09-25 Crawler function expansion method and device

Country Status (1)

Country Link
CN (1) CN106557495B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391852B (en) * 2021-06-07 2024-06-04 广州通达汽车电气股份有限公司 Platform software expansion method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930059A (en) * 2012-11-26 2013-02-13 电子科技大学 Method for designing focused crawler
CN104750804A (en) * 2015-03-24 2015-07-01 南京途牛科技有限公司 Plug-in type configurable vertical network spider implementation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078875A1 (en) * 2010-09-27 2012-03-29 Michael Price Web browser contacts plug-in

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nutch插件机制分析;RZ.M;《https://blog.csdn.net/ruizema/article/details/6679220》;20110811;第1-16页 *
基于可视化检索的广告信息增强系统的设计与实现;刘晓慧;《中国优秀硕士学位论文全文数据库 信息科技辑》;20140415(第04期);第69-74页 *

Also Published As

Publication number Publication date
CN106557495A (en) 2017-04-05

Similar Documents

Publication Publication Date Title
KR101736650B1 (en) Method and embedded device for loading driver
CN103475687B (en) Distributed method and system for download site data
US10489591B2 (en) Detection system and method thereof
CN105760184B (en) A kind of method and apparatus of charging assembly
CN103559065B (en) Method and system for OTA (Over-the-Air Technology) upgrade
CN107133165B (en) Browser compatibility detection method and device
CN107092652B (en) Navigation method and device for target page
CN107480117B (en) Recovery method and device for automatic page table single data
CN109885744A (en) Web data crawling method, device, system, computer equipment and storage medium
JP2014130547A (en) File management program, file management device and file management method
CN105786805A (en) Intelligent mobile terminal, document manager and file display method of same
CN108984184A (en) A kind of software installation method, device and electronic equipment, storage medium
CN105573788B (en) The method and apparatus of patch processing and the method and apparatus for generating patch
CN106933591A (en) The method and device that code merges
CN110941779A (en) Page loading method and device, storage medium and electronic equipment
CN110737458A (en) code updating method and related device
CN106557495B (en) Crawler function expansion method and device
CN105573756A (en) Script language extension method and event bus framework
CN105893089B (en) A kind of packaging method of Linux command row
CN105243141A (en) APP resource management method and mobile terminal
CN105095416B (en) A kind of method and apparatus realizing content in the search and promoting
CN105574097B (en) The loading method and device of video download class search results pages
CN114936269A (en) Document searching platform, searching method, device, electronic equipment and storage medium
CN107679168A (en) A kind of targeted website content acquisition method based on java platforms
US10977282B2 (en) Generating device, generating method, and non-transitory computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant