CN109597927B - Method and system for extracting page information of bid-inviting and bidding related webpages

Publication number
CN109597927B
Authority
CN
China
Legal status
Active
Application number
CN201811481859.3A
Other languages
Chinese (zh)
Other versions
CN109597927A (en)
Inventor
李正军
Current Assignee
Guiyang Gaoxin Ston Information Co ltd
Original Assignee
Guiyang Gaoxin Ston Information Co ltd
Application filed by Guiyang Gaoxin Ston Information Co ltd
Priority to CN201811481859.3A
Publication of CN109597927A
Application granted
Publication of CN109597927B

Abstract

The invention relates to the field of network information acquisition, and in particular to a method and a system for extracting page information from bid-inviting and bidding related webpages. The method comprises the following steps: S1: automatically locating the position-node webpage where the relevant information resides, according to keywords related to bid inviting and bid winning; S2: searching, from the located position-node webpage, for the parent-node webpage shared by the keywords; S3: determining whether the parent-node webpage has already been captured and, if it has not, determining how its content is arranged and crawling the information according to that arrangement; S4: storing and displaying the crawled bid-inviting information and bid-winning information. The scheme crawls bid-inviting and bid-winning information automatically, so that bidders can obtain it in time.

Description

Method and system for extracting page information of bid-inviting and bidding related webpages
Technical Field
The invention relates to the field of network information acquisition, and in particular to a method and a system for extracting page information from bid-inviting and bidding related webpages.
Background
Bid inviting and bidding (tendering) is a transaction mode used, under market-economy conditions, for buying and selling large quantities of goods, contracting engineering construction projects, and procuring services. In such a transaction, the buyer of the project (whether goods, works, or services) usually acts as the tenderer. By issuing a tender announcement or sending tender invitations to a number of specific suppliers and contractors, the tenderer sets out the nature, quantity, quality, and technical requirements of the procurement, the delivery date, completion time, or service period, and other tendering conditions such as the qualification requirements for suppliers and contractors, and states that the supplier or contractor that best satisfies the procurement requirements will be selected to sign a contract. Suppliers and contractors then submit quotations for the goods, works, or services together with their other responses to the tender requirements, and take part in the bidding competition. After examining and comparing each bidder's quoted price and other conditions, the tenderer selects the winning bidder and signs a purchase contract with it.
The development of informatization has changed the field of bidding: bidders, who originally obtained project tender information mainly from periodicals and magazines, now obtain the tender information that suits them from Internet websites. One approach is for a bidder to log in to the bidding websites of various regions and search and examine the required information manually, one item at a time. A more efficient approach is to log in to a few large bidding-information websites and search for the required information by full-text retrieval. Either way, the process is time-consuming and labor-intensive, and information is frequently obtained too late. Moreover, when bid-inviting or bid-winning information is released, bidders must open the publishing website to view it; if many bidders do so at once, the tendering enterprise's website may crash, and the bid-winning information cannot be obtained in time.
Disclosure of Invention
The invention aims to provide a method for extracting page information from bid-inviting and bidding related webpages, so as to solve the problem that bidders currently cannot find, or cannot obtain in time, bid-inviting and bid-winning information.
The basic scheme provided by the invention is as follows. The method for extracting page information from bid-inviting and bidding related webpages comprises the following steps:
S1: automatically locating the position-node webpage where the relevant information resides, according to keywords related to bid inviting and bid winning;
S2: searching, from the located position-node webpage, for the parent-node webpage shared by the keywords;
S3: determining whether the parent-node webpage has already been captured; if it has not, determining how the content of the parent-node webpage is arranged and then crawling the information according to that arrangement;
S4: storing and displaying the crawled bid-inviting information and bid-winning information.
The invention has the following advantages:
1. When a bidder follows the bid-inviting and bid-winning information of many enterprise websites, the scheme obtains all of that information for the bidder, who no longer needs to visit each enterprise's publishing website to check it.
2. If a website publishing an enterprise's bid-inviting information has many followers, the webpage may crash and fail to load after the enterprise publishes the information. With this scheme, a bidder does not need to visit the corresponding publishing website; the bid-inviting or bid-winning information is acquired automatically by the method, which is more convenient and faster than manual acquisition.
Further, in steps S1-S4, the crawling of bid-inviting and bid-winning information is performed by information-capturing sub-servers that the main server allocates according to a distribution criterion.
Because many enterprise websites publish the information to be crawled, crawling bid-inviting and bid-winning information involves many websites and requires many information-capturing sub-servers. The main server therefore allocates the sub-servers to the crawling work according to the distribution criterion, which prevents information from being missed or crawled more than once.
Further, to generate the distribution criterion, the information update time of each enterprise website is first obtained and the websites are sorted by update time; they are also ranked by their average number of visitors per day. If several enterprise websites update their information at the same time, the information-capturing instruction is executed first for the website with the larger average number of daily visitors; if the update times differ, the instructions are executed in order of the websites' update times. The information-capturing sub-servers are then scheduled in order of the execution times of the information-capturing instructions.
The information update times of different enterprise websites may differ, so the update time is used as one factor of the distribution criterion, which makes it easy to obtain each website's updated information promptly. The average number of daily visitors also differs between websites, and updated information on a heavily visited website is harder to obtain. The average number of daily visitors is therefore used as a second factor: when several enterprise websites update their information simultaneously, the updated information of the website with the most daily visitors is acquired first, so that it is not obtained too late.
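The distribution criterion above (order by update time, break ties by average daily visitors) can be sketched as follows. This is a minimal illustration assuming each site's daily update time and average daily visitors are already known; the `Site` type, the `build_schedule` function, and the round-robin hand-off of instructions to sub-servers are hypothetical choices, not details taken from the patent.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Site:
    name: str
    update_time: time          # time of day the site publishes updates
    avg_daily_visitors: float  # average number of visitors per day


def build_schedule(sites: list[Site], n_sub_servers: int) -> list[tuple[str, int]]:
    """Return (site name, sub-server index) pairs in execution order.

    Sites are ordered by update time; when update times are equal, the site
    with more average daily visitors goes first. Capture instructions are then
    handed to sub-servers round-robin (an assumption for illustration).
    """
    ordered = sorted(sites, key=lambda s: (s.update_time, -s.avg_daily_visitors))
    return [(s.name, i % n_sub_servers) for i, s in enumerate(ordered)]
```

For example, sites X (09:00, 100 visitors), Y (08:00, 50) and Z (09:00, 500) are scheduled Y, Z, X: Y updates earliest, and Z beats X on the tie because it has more visitors.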
Further, in step S3, after the bid-inviting and bid-winning information has been crawled, randomly sampled items of it are checked for correctness.
Randomly sampling the crawled information and checking it gives a preliminary measure of the crawl's error rate, which system testers can use to optimize the crawling method.
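The random spot check can be sketched as below, assuming the crawled items are held in a list and that a correctness check (in the embodiment, a document-staff user's judgement) can be represented as a callable. The function name `sample_error_rate` and its parameters are hypothetical.

```python
import random


def sample_error_rate(records, is_correct, sample_size, seed=None):
    """Randomly sample crawled records and estimate the crawl's error rate.

    `is_correct` stands in for the manual check; it returns True when a
    sampled record was extracted correctly. Returns (error_rate, wrong_items).
    """
    rng = random.Random(seed)
    k = min(sample_size, len(records))
    if k == 0:
        return 0.0, []
    sample = rng.sample(records, k)  # sampling without replacement
    errors = [r for r in sample if not is_correct(r)]
    return len(errors) / k, errors
```

The returned list of wrong items is what a system tester would inspect when tuning the crawler.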
In addition, a system for extracting page information from bid-inviting and bidding related webpages, which applies the above method, is provided. The system comprises a user terminal, a main server, and a plurality of information-capturing sub-servers;
the user terminal is used by a user to register, log in, and follow and subscribe to bid-inviting and bid-winning information;
the main server is used to generate the distribution criterion for the information-capturing sub-servers and to allocate, according to that criterion, the sub-servers that capture the bid-inviting and bid-winning information from the corresponding enterprise websites.
With this system, bid-inviting and bid-winning information is captured automatically. Bidders who need to check the bid-inviting and bid-winning information of different enterprises can view it in one place through the system instead of searching each enterprise website, which makes the system convenient to use.
Further, the main server includes a user classification and restriction module, which divides registered users by permission into ordinary users, document-staff users, and system-tester users. Ordinary users can read the accessed information after purchasing a membership; document-staff users can both read and write it; and system-tester users can read and write information and also perform software testing.
Dividing the permissions of registered users makes the system easier to manage.
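The three permission levels can be expressed as a simple role-to-permission table. This is a sketch under the assumption that permissions reduce to the actions "read", "write" and "test"; the `Role` enum, `PERMISSIONS` table and `can` function are illustrative names only.

```python
from enum import Enum


class Role(Enum):
    ORDINARY = "ordinary"
    DOCUMENT_STAFF = "document_staff"
    SYSTEM_TESTER = "system_tester"


PERMISSIONS = {
    Role.ORDINARY: {"read"},
    Role.DOCUMENT_STAFF: {"read", "write"},
    Role.SYSTEM_TESTER: {"read", "write", "test"},
}


def can(role: Role, action: str, is_member: bool = False) -> bool:
    """Check whether a user with `role` may perform `action`."""
    # Ordinary users may read only after purchasing a membership.
    if role is Role.ORDINARY and not is_member:
        return False
    return action in PERMISSIONS[role]
```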
Drawings
FIG. 1 is a logic block diagram of a bid-related web page information extraction system according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below by way of specific embodiments.
Example one
As shown in FIG. 1, the system for extracting information from bid-related webpages includes an information management subsystem and a distribution model generation subsystem. The information management subsystem comprises a user terminal, a main server, and a plurality of information-capturing sub-servers; the user terminal and the information-capturing sub-servers communicate wirelessly with the main server through a wireless communication module.
1. User terminal
The login and registration module registers or logs in users according to their registration or login information. Users comprise ordinary users and management users; management users comprise system-tester users and document-staff users.
The account settings module lets users fill in and set their personal information.
The membership purchase module lets ordinary users purchase a membership.
The settings module lets users change their password and submit feedback.
The bid-inviting information search module lets users search for and view bid-inviting information.
The bidding information viewing module lets users view, follow, and subscribe to different types of bidding information.
The information verification module lets document-staff users check the bid-inviting and bid-winning information sent by the main server.
2. Master server
The database stores the parent-node webpages captured by the information-capturing sub-servers.
The user classification and restriction module divides registered users by permission: ordinary users can read the accessed information after purchasing a membership, document-staff users can read and write it, and system-tester users can read, write, and test.
In this embodiment, the distribution model generation subsystem is located in the main server and includes a sub-server distribution module. This module generates the distribution criterion for the information-capturing sub-servers and, according to that criterion, assigns different sub-servers to execute information-capturing instructions. To generate the criterion, the information update time of each enterprise website is first obtained and the websites are sorted by update time; they are also ranked by average number of visitors per day. If several enterprise websites update their information at the same time, the information-capturing instruction is executed first for the website with the larger average number of daily visitors; if the update times differ, the instructions are executed in order of update time. The information-capturing sub-servers are then scheduled in order of the instruction execution times.
The extraction verification module receives the bid-winning or bid-inviting information from the keyword value judgment module, randomly extracts a preset number of items from it, and sends them to the user terminal of a document-staff user for checking.
3. Information capturing sub-server
The information grabbing module grabs bid-inviting and bid-winning information from websites as allocated by the main server. When grabbing a website, it loads the HTML page containing the bid-inviting and bid-winning information, automatically locates the position-node webpage where the relevant information resides according to keywords such as 'bid-inviting' and 'bid-winning', and then searches, from the located position node, for the nearest parent-node webpage shared by the keywords. If no such nearest shared parent-node webpage is found and the initially located node webpage is the home page, the search stops and the initially located node webpage is used as the parent-node webpage.
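Steps S1 and S2 can be read as "find the DOM elements that mention a keyword, then take their nearest common ancestor". The sketch below follows that reading under stated assumptions: the page is parsed as well-formed markup with the standard-library `xml.etree.ElementTree` (real HTML would normally need a tolerant parser), the tree root stands in for the "home page", and the English keyword strings and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword list; the text names 'bid-inviting' and 'bid-winning'.
KEYWORDS = ("bid-inviting", "bid-winning")


def keyword_nodes(root, keywords=KEYWORDS):
    """S1: elements whose own text contains any of the keywords."""
    return [el for el in root.iter()
            if el.text and any(k in el.text for k in keywords)]


def nearest_common_parent(root, nodes):
    """S2: deepest element that is an ancestor of every keyword node."""
    if not nodes:
        return None
    if nodes[0] is root:
        # The initially located node is the top of the tree (the "home page"):
        # stop searching and use it as the parent node.
        return root
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def ancestors(el):
        chain = []
        while el in parent_of:
            el = parent_of[el]
            chain.append(el)
        return chain[::-1]  # root first

    common = root
    for level in zip(*(ancestors(n) for n in nodes)):
        if all(e is level[0] for e in level):
            common = level[0]
        else:
            break
    return common
```

For a page whose two notices sit in sibling `<li>` elements, the shared parent returned is the enclosing `<ul>`.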
The keyword value judgment module determines whether the parent-node webpage captured by the information grabbing module is already stored in the database. If it is, the bid-inviting or bid-winning information corresponding to that parent-node webpage is acquired. If it is not, the module determines whether the content in the parent-node webpage is arranged horizontally or vertically; if horizontally, the corresponding bid-inviting or bid-winning information is acquired horizontally, and if vertically, it is acquired vertically. After acquiring the bid-inviting or bid-winning information, the keyword value judgment module sends it to the main server.
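The patent does not say how the horizontal/vertical arrangement is detected. The sketch below uses one hypothetical heuristic: if most children of the parent node contain several sub-elements side by side (like table rows with cells), the block is treated as horizontal and read row by row; otherwise each child is read as one line, top to bottom. A plain dictionary stands in for the database, and all names are illustrative.

```python
import xml.etree.ElementTree as ET


def arrangement(parent):
    """Hypothetical heuristic: 'horizontal' if at least half of the children
    hold more than one sub-element (e.g. <tr><td/><td/></tr>), else 'vertical'."""
    children = list(parent)
    multi = sum(1 for c in children if len(c) > 1)
    return "horizontal" if children and multi * 2 >= len(children) else "vertical"


def extract(parent):
    """Read the records under `parent` according to its arrangement."""
    if arrangement(parent) == "horizontal":
        # One record per row; fields read left to right.
        return [["".join(cell.itertext()).strip() for cell in row] for row in parent]
    # One record per child; each child read top to bottom.
    return [["".join(child.itertext()).strip()] for child in parent]


def capture(parent, key, store):
    """S3: reuse stored results if this parent node was already captured."""
    if key in store:
        return store[key]
    records = extract(parent)
    store[key] = records
    return records
```

The `key` argument (for example, a URL plus an element path) is an assumption about how a captured parent node would be identified in the database.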
For the above system, the scheme also discloses a method for extracting page information from bid-inviting and bidding related webpages, implemented as follows:
S1: automatically locating the position-node webpage where the relevant information resides, according to keywords related to bid inviting and bid winning;
S2: searching, from the located position-node webpage, for the parent-node webpage shared by the keywords;
S3: determining whether the parent-node webpage has already been captured; if it has not, determining how the content of the parent-node webpage is arranged and then crawling the information according to that arrangement;
S4: storing and displaying the crawled bid-inviting information and bid-winning information.
In steps S1-S4, the crawling of bid-inviting and bid-winning information is performed by information-capturing sub-servers that the main server allocates according to the distribution criterion. To generate the criterion, the information update time of each enterprise website is first obtained and the websites are sorted by update time; they are also ranked by average number of visitors per day. If several enterprise websites update their information at the same time, the information-capturing instruction is executed first for the website with the larger average number of daily visitors; if the update times differ, the instructions are executed in order of update time. The information-capturing sub-servers are then scheduled in order of the instruction execution times. In step S3, after the bid-inviting and bid-winning information has been crawled, randomly sampled items of it are checked for correctness.
Example two
The second embodiment differs from the first in that the distribution model generation subsystem comprises a user terminal, a management terminal, and a main server. Both terminals communicate with the main server over a network through an existing Wi-Fi module, and either may be an existing mobile phone or computer. The distribution model generation subsystem and the information management subsystem use the same user terminal device and the same main server.
1. User terminal
The user terminal includes:
The attention requirement filling module lets the user enter the information set he or she wants to follow and sends it to the main server. The information set includes the names of the enterprises the user wants to follow and subscribe to, and keywords for the information content of interest.
2. Main server
The main server includes:
The database stores all data generated and received by the main server and establishes a user information storage module for each user.
The enterprise website visitor count acquisition module obtains from each enterprise website its total number of visitors over the last year, calculates the average number of visitors per day from that total, and ranks the enterprises by this daily average to generate an enterprise website daily average visitor information sheet. The sheet contains each enterprise's name and its daily average number of visitors, with websites that have more daily visitors listed first and those with fewer listed later.
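The daily average visitor information sheet amounts to dividing each site's yearly total by the number of days and sorting in descending order. A minimal sketch, assuming a 365-day year and a dictionary of yearly totals; `daily_average_sheet` is a hypothetical name.

```python
def daily_average_sheet(yearly_totals: dict[str, int], days: int = 365) -> list[tuple[str, float]]:
    """Return (enterprise name, average visitors per day), busiest first."""
    sheet = [(name, total / days) for name, total in yearly_totals.items()]
    sheet.sort(key=lambda item: item[1], reverse=True)
    return sheet
```

For yearly totals of 365,000 (A), 730,000 (B) and 36,500 (C), the sheet lists B (2,000 per day), then A (1,000), then C (100).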
The enterprise website visitor number recording module acquires and records the number of visitors to each enterprise website in each hour of each day, and plots each website's hourly visitor count for each day as a line graph over time. It then analyzes how each website's daily visitor count changes from its peak period to its trough period, and determines whether this pattern is consistent across different dates for the same website. If it is, daily visit time record information is generated for that website; if it is not, the peak-to-trough pattern is analyzed week by week, by day of the week, and weekly visit time record information is generated. The daily visit time record information contains the peak and trough periods of the website's visitor count for the day. The weekly visit time record information contains the change in visitor count on each day of the week, the peak and trough periods on each day, the pattern of change in visitor count from Monday to Sunday, and the pattern of change in the peak and trough periods.
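A simplified version of this analysis is sketched below. It assumes 24 hourly counts per day and treats the pattern as "consistent" only when every recorded day has the same peak hour and trough hour; both assumptions are simplifications chosen for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def peak_trough(hourly: list[int]) -> tuple[int, int]:
    """Return (peak_hour, trough_hour) for one day's 24 hourly counts."""
    peak = max(range(24), key=lambda h: hourly[h])
    trough = min(range(24), key=lambda h: hourly[h])
    return peak, trough


def visit_time_record(days: dict[date, list[int]]):
    """If every day has the same peak/trough hours, return a daily record;
    otherwise group the per-day results by weekday (0 = Monday)."""
    per_day = {d: peak_trough(h) for d, h in days.items()}
    patterns = set(per_day.values())
    if len(patterns) == 1:
        return "daily", patterns.pop()
    weekly = defaultdict(list)
    for d, pt in sorted(per_day.items()):
        weekly[d.weekday()].append(pt)
    return "weekly", dict(weekly)
```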
The enterprise website information publishing time acquisition module acquires the daily information update time of each enterprise website. It searches the webpage information published by the corresponding website using keywords such as 'bid invitation' and 'bid winning', crawls the publishing time of each notice using keywords such as 'publishing time', and generates enterprise information update time information, which in this scheme comprises the website's daily information update time and information publishing time. The module also marks a time axis, in units of days, with the daily information update times of the different enterprise websites, marking together the websites that update at the same time point; it marks the bid-winning announcement times on a calendar, and then generates a time information record table from the marked daily update times and bid-winning announcement times.
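Reading the publishing time next to a label, and grouping sites that update at the same time point, might look like the sketch below. The label text, the accepted date formats (`YYYY-MM-DD` with an optional `HH:MM`), and the function names are all assumptions; real pages vary widely.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical label pattern; accepts an ASCII or full-width colon.
LABEL = re.compile(r"publishing time\s*[:：]\s*(\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?)", re.I)


def publish_time(page_text: str):
    """Return the datetime after the 'publishing time' label, or None."""
    m = LABEL.search(page_text)
    if not m:
        return None
    value = m.group(1)
    fmt = "%Y-%m-%d %H:%M" if " " in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt)


def sites_by_update_time(update_times: dict[str, str]) -> dict[str, list[str]]:
    """Group sites that update at the same time point, earliest first."""
    table = defaultdict(list)
    for site, t in update_times.items():
        table[t].append(site)
    return dict(sorted(table.items()))
```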
The user information viewing rule acquisition module obtains a record table of each user's information-viewing habits. To build it, the module first obtains the times at which each user logs in through the login and registration module to view information and the times at which the user views each piece of content every day, and then generates a user information viewing rule table for each user. Each user's table contains: the daily login-time pattern (also called the user's habitual login times, comprising the habitual time of the first, second, and third login to the system each day), the content viewed, the time at which each enterprise's content is viewed, and the order in which enterprise content is viewed.
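The habitual first, second and third daily login times could be derived from a login history as sketched here. Taking the median of each day's k-th login (in minutes after midnight) is an assumption; the patent does not specify how the "rule" is computed, and `habitual_logins` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def habitual_logins(logins: list[datetime], max_k: int = 3) -> dict[int, float]:
    """Median minute-of-day of the 1st..max_k-th login per day."""
    per_day = defaultdict(list)
    for ts in logins:
        per_day[ts.date()].append(ts.hour * 60 + ts.minute)
    result = {}
    for k in range(max_k):
        kth = [sorted(mins)[k] for mins in per_day.values() if len(mins) > k]
        if kth:
            result[k + 1] = median(kth)
    return result
```

With first logins at 08:00, 08:30 and 09:00 on three days, the habitual first login is 510 minutes, i.e. 08:30.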
The distribution model generation module generates a distribution model from the enterprise website daily average visitor information sheet, the daily visit time record information, the weekly visit time record information, the time information record table, and the user information viewing rule table, and allocates the corresponding information-capturing sub-servers according to the model to execute the information crawling instructions.
When the distribution model is generated, all enterprise websites followed by a user are divided into three types according to the enterprise information update time information. Type one: the website's daily update time is before the user's first daily login, and its crawling window runs from the daily update time to the user's first daily login. Type two: the daily update time is between the user's first and second daily logins, and the crawling window runs from the daily update time to the user's second daily login. Type three: the daily update time is between the user's second and third daily logins, and the crawling window runs from the daily update time to the user's third daily login.
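The three-way classification can be sketched with times expressed as minutes after midnight. Two choices here are assumptions: an update exactly at a login time is counted in the later type, and an update after the third login returns `None`, since the text does not cover that case. The function name is hypothetical.

```python
def classify(update_min: int, login_mins: tuple[int, int, int]):
    """Return (type, (window_start, window_end)) for a site's daily update time.

    login_mins holds the user's habitual 1st, 2nd and 3rd login times, ascending.
    """
    first, second, third = login_mins
    if update_min < first:
        return 1, (update_min, first)
    if update_min < second:
        return 2, (update_min, second)
    if update_min < third:
        return 3, (update_min, third)
    return None  # not covered by the three types described in the text
```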
For enterprise websites of the same type, the user's daily login times are compared, in the order in which the user views enterprise content, with the daily or weekly visit time record information of the enterprise websites the user follows and frequently browses (that is, the enterprises named in the user's information set and the websites corresponding to the content recorded in the user's information viewing rule table). This identifies the visitor-count trough period between each website's daily information update time (including the update time corresponding to the bid-winning announcement) and the user's daily login time; this trough is called the optimal information crawling time. The corresponding information-capturing sub-servers are scheduled within the optimal information crawling time to crawl the bid-inviting and bid-winning information.
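At hourly granularity, the optimal information crawling time is the hour with the fewest visitors inside the crawling window. A minimal sketch, assuming 24 hourly visitor counts for the relevant day and a window `[update_hour, login_hour)`; the function name is hypothetical.

```python
def optimal_crawl_hour(hourly_visitors: list[int], update_hour: int, login_hour: int):
    """Hour in [update_hour, login_hour) with the fewest visitors, or None."""
    window = range(update_hour, login_hour)
    if not window:
        return None
    return min(window, key=lambda h: hourly_visitors[h])
```

If a site updates at 01:00 and the user habitually logs in at 08:00, the crawl is placed in the least-visited hour between them.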
If several enterprise websites followed by the user have the same optimal information crawling time and the same daily average number of visitors according to the daily average visitor information sheet, idle information-capturing sub-servers are assigned to crawl them in the order in which the user views enterprise content. If the websites have the same optimal information crawling time but different daily average numbers of visitors, crawling alternates between two orders: the ranking in the daily average visitor information sheet (the enterprise with the most visitors first) and the user's viewing order (the enterprise the user views first comes first). The website ranked first in the user's viewing order is crawled first, then the website ranked first in the visitor information sheet, then the website ranked second in the user's viewing order, and so on. Once a website has been scheduled, its positions in both orders are void and it is not considered again.
For example, suppose the optimal information crawling times of four enterprise websites A, B, C, and D followed by the same user fall in the same time period, their order in the daily average visitor information sheet is A-B-C-D, and their order in the user's information viewing rule table is C-A-D-B. Crawling is then scheduled first for enterprise C's website, then enterprise A's, then enterprise D's, and finally enterprise B's.
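The alternating rule and the A-B-C-D / C-A-D-B example can be reproduced with a short merge of the two orders that skips already-scheduled sites. This is an illustrative sketch; the function names `interleave` and `schedule` and the dictionary of daily visitor counts are assumptions.

```python
def interleave(view_order, visitor_rank):
    """Alternate between the user's viewing order and the visitor ranking,
    starting with the viewing order; skip any site already scheduled."""
    queues = [list(view_order), list(visitor_rank)]
    total = len(set(view_order) | set(visitor_rank))
    scheduled, turn = [], 0
    while len(scheduled) < total:
        queue = queues[turn]
        while queue and queue[0] in scheduled:
            queue.pop(0)  # position voided: site already scheduled
        if queue:
            scheduled.append(queue.pop(0))
        turn ^= 1
    return scheduled


def schedule(view_order, daily_visitors):
    """Equal visitor counts: follow the viewing order; otherwise interleave."""
    if len({daily_visitors[s] for s in view_order}) == 1:
        return list(view_order)
    rank = sorted(view_order, key=lambda s: -daily_visitors[s])
    return interleave(view_order, rank)
```

With viewing order C-A-D-B and visitor ranking A-B-C-D, this yields C, A, D, B, matching the example in the text.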
In addition, the user's actual login time on a given day may differ from the time recorded in the user viewing information rule table. For example, a user who usually logs in to the system for the first time late in the day may suddenly need the bid-inviting or bid-winning information disclosed by an enterprise and log in much earlier than usual. Under the rules of the first and second steps, even if the enterprise website has already updated its information before this early login, the visitor low-valley period determined in the second step may not lie between the website's update time and the login, so the bid-inviting or bid-winning information published on the websites that the user follows and frequently browses has not yet been crawled. Therefore, once the user's first login of the day occurs earlier than the usual first-login time, the bid-inviting or bid-winning information that the followed enterprise websites have published since their update times is crawled at that point. When this information is obtained, the crawling of the enterprise websites is ordered according to the order in which the user views enterprise content in the information viewing record table; that is, the enterprise website whose content the user views first is also assigned to an information capturing sub-server first.
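A minimal sketch of this early-login fallback follows. It assumes one reading of the paragraph: a website qualifies for an immediate crawl when it has already updated today but its information has not yet been captured. The `FollowedSite` type, its fields and the `urgent_crawl_order` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class FollowedSite:
    name: str
    update_time: time      # usual daily information update time of the website
    crawled_today: bool    # True once today's bid information has been captured


def urgent_crawl_order(sites: List[FollowedSite], viewing_order: List[str],
                       usual_first_login: time, actual_login: time) -> List[str]:
    """Websites to crawl immediately when the user logs in earlier than usual."""
    if actual_login >= usual_first_login:
        return []  # normal case: the planned low-traffic crawl slots apply
    # Sites that have already updated today but whose information is not yet captured.
    pending = [s for s in sites if s.update_time <= actual_login and not s.crawled_today]
    rank = {name: i for i, name in enumerate(viewing_order)}
    # The website the user views first is handed to a sub-server first.
    pending.sort(key=lambda s: rank.get(s.name, len(rank)))
    return [s.name for s in pending]
```

Sites the user never views are placed after all viewed sites; that choice is an assumption, since the text does not cover it.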
3. Information capturing sub-server
The information capturing sub-server comprises:
an information crawling module, which receives the information crawling instruction sent by the main server and then crawls bid-inviting information or bid-winning information from the corresponding enterprise website.
In addition, for the distribution model generation subsystem, this embodiment further provides a method for generating the distribution model, described here by way of example. Assume that the information set entered by a user indicates that the user wants to follow the bid information of Company A and Company B. The enterprise website of Company A updates its information at nine o'clock every morning; its number of visitors is highest in the hour from nine to ten a.m. and gradually decreases afterward. The enterprise website of Company B updates its information at eight o'clock every morning; it has few visitors between eight and nine a.m., many visitors between nine and eleven a.m., and a steady number of visitors for the rest of the day. The daily average number of visits to Company A's website is greater than that of Company B's. The user habitually checks at eleven a.m. whether the bidding information has been updated and does not check again afterward, and each time views Company A's bidding information first and then Company B's.
The specific implementation steps are as follows:
S1: The user fills in an information set, listing Company A and Company B as the companies to follow.
S2: According to the information set, the main server acquires the enterprise website daily average visitor information sheet, the daily visit time record information and the enterprise information update time information for the websites of Company A and Company B. In the acquired daily average visitor information sheet, Company A is ranked ahead of Company B. The daily visit time record information acquired from Company A's website records that the number of visitors is highest in the hour from nine to ten a.m. every day and gradually decreases afterward; the record acquired from Company B's website records that the number of visitors is small from eight to nine a.m., large from nine to eleven a.m., and steady for the rest of the day.
S3: Every day, when the enterprise websites of Company A and Company B update their information, the main server crawls the related bid-inviting content. Therefore, whenever the user views the information, the corresponding crawled information is available as long as the enterprise website has updated and the crawl has succeeded.
S4: The main server acquires the times at which the user logs in to the information system and views information each day, and generates a user viewing information rule table, which records that the user's first daily login is at ten o'clock in the morning.
S5: The main server generates the distribution model according to the enterprise website daily average visitor information sheet, the daily visit time record information, the time information record table and the user viewing information rule table. When the distribution model is generated, it is first determined which of the three types each enterprise website belongs to; both websites are of the first type, because each updates its information before the user's first daily login. Next, the optimal information crawling time of each enterprise is determined: the optimal time for Company A is between ten and eleven a.m. and that for Company B is between eight and nine a.m. Because the two optimal information crawling times differ, information capturing sub-servers are assigned to crawl each website within its own optimal information crawling time range.
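The two judgments in S5 (first the website type, then the optimal crawling time) can be sketched as follows, assuming that "optimal" means the one-hour slot with the fewest visitors inside the type's crawl window. Whole-hour granularity, the function names `classify_site` and `optimal_crawl_slot`, the hourly visitor counts and the second and third login hours are all hypothetical. The first login is set to 11:00, following the scenario paragraph in which the user checks at eleven a.m.

```python
from typing import Dict, List, Optional, Tuple


def classify_site(update_hour: int, login_hours: List[int]) -> Optional[Tuple[int, int, int]]:
    """Return (site_type, window_start, window_end) under the three-type rule.

    login_hours holds the user's usual first, second and third daily login
    hours in ascending order. A type-k website updates before the k-th login
    (and at or after the (k-1)-th login); its crawl window runs from its
    update hour to that k-th login.
    """
    for k, login_hour in enumerate(login_hours, start=1):
        if update_hour < login_hour:
            return k, update_hour, login_hour
    return None  # updates after the third login are not covered by the rule


def optimal_crawl_slot(update_hour: int, login_hours: List[int],
                       hourly_visitors: Dict[int, int]) -> Optional[Tuple[int, int, int]]:
    """Pick the one-hour slot with the fewest visitors inside the crawl window."""
    classified = classify_site(update_hour, login_hours)
    if classified is None:
        return None
    site_type, start, end = classified
    # min() returns the first minimum, so ties go to the earliest hour.
    best = min(range(start, end), key=lambda h: hourly_visitors[h])
    return site_type, best, best + 1


logins = [11, 14, 20]                       # assumed first, second, third logins
company_a = {9: 500, 10: 300}               # peak 9-10, declining afterward
company_b = {8: 50, 9: 400, 10: 400}        # quiet 8-9, busy 9-11
print(optimal_crawl_slot(9, logins, company_a))  # (1, 10, 11)
print(optimal_crawl_slot(8, logins, company_b))  # (1, 8, 9)
```

With these assumed inputs, both websites are classified as the first type and the sketch reproduces the optimal windows given in S5.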
EXAMPLE III
The third embodiment differs from the second embodiment in that the main server of the second embodiment further includes:
an information management module, which marks, from each user's information set, the names of the enterprises the user follows and the enterprise names of the sources of the crawled information the user accesses (i.e., the enterprise names recorded on the enterprise websites); then counts how many of all registered users follow, subscribe to or view the crawled information from each enterprise website; and generates a user attention information record table;
a crawling information adjusting module, which obtains each user's first daily login time recorded in all users' viewing information rule tables and sorts these times to generate a user login time schedule. Using the user login time schedule and the user attention information record table, the module determines, for each enterprise website, which follower's first daily login comes after, and closest to, the website's daily update time; that user is called the close user of the enterprise. The enterprise website is then crawled by executing the distribution model according to the information recorded in the close user's viewing information rule table, and once the website's bid-inviting or bid-winning information has been obtained, the same information is not crawled from that website again. In other words, the bid-inviting or bid-winning information published by the enterprise is crawled between the website's update time and the close user's first daily login.
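A minimal sketch of the two modules above: building the user attention information record table and finding an enterprise's close user. The function names, the dictionary layouts and the representation of times as minutes after midnight are assumptions made for illustration; a login exactly at the update time is treated as not being after it.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_attention_table(follows: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """User attention record table: enterprise -> users who follow it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for user, enterprises in follows.items():
        for enterprise in enterprises:
            table[enterprise].append(user)
    return dict(table)


def find_close_user(enterprise: str, update_minute: int,
                    attention_table: Dict[str, List[str]],
                    first_login: Dict[str, int]) -> Optional[str]:
    """Follower whose first daily login comes after, and closest to, the update time.

    Times are minutes after midnight (a simplifying assumption).
    """
    followers = set(attention_table.get(enterprise, []))
    # User login time schedule: all users sorted by first daily login.
    login_schedule = sorted((minute, user) for user, minute in first_login.items())
    for minute, user in login_schedule:
        if user in followers and minute > update_minute:
            return user
    return None  # no follower logs in after today's update
```

The length of each entry in the attention table gives the number of users following that enterprise. Each website is then crawled once, before its close user logs in, and the result serves every follower.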
Compared with the second embodiment, the third embodiment avoids repeatedly crawling the same enterprise website's bid-inviting and bid-winning information for each user when several users follow the information published by that website.
The foregoing are merely embodiments of the present invention and are not intended to limit the invention to the particular forms described in the specification; the scope of the invention is defined by the appended claims. It should be noted that those skilled in the art may make several variations and modifications without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and will not affect the effect of implementing the present invention or the utility of the patent. The scope of protection of this application shall be determined by the contents of the claims, and the description of the embodiments and the like in the specification may be used to interpret the contents of the claims.

Claims (6)

1. A bid-related webpage information extraction system, characterized in that the system comprises a user terminal, a main server and a plurality of information capturing sub-servers;
the user terminal is used by a user to register, log in, and follow and subscribe to bid-inviting information and bid-winning information;
the enterprise website information publishing time acquisition module is used for acquiring the daily information update time of each enterprise website: it searches the webpage information published by the corresponding enterprise website using bid-inviting keywords and bid-winning keywords, crawls the publishing time of each bid according to publishing-time keywords, and then generates enterprise information update time information;
the main server is used for generating a distribution criterion for the information capturing sub-servers and then allocating information capturing sub-servers according to the distribution criterion to capture bid-inviting information and bid-winning information from the corresponding enterprise websites;
the user information reference rule acquisition module is used for acquiring a user viewing information rule record table; to obtain it, the time at which each user logs in to view information every day and the time at which the user views each piece of content are first obtained from the login registration module, and a user viewing information rule record table is then generated for each user, comprising: daily login time rules, the viewed content, the viewing time corresponding to each enterprise's content, and the order in which enterprise content is viewed, wherein the daily login time rules comprise the time rules of the user's first, second and third daily logins to the system;
when the distribution model is generated, all enterprise websites followed by the user are divided into three types according to the enterprise information update time information: in the first type, the daily information update time of the enterprise website precedes the user's first daily login, and the information crawling time of the website runs from its daily update time to the user's first daily login; in the second type, the daily update time lies between the user's first and second daily logins, and the information crawling time runs from the daily update time to the user's second daily login; in the third type, the daily update time lies between the user's second and third daily logins, and the information crawling time runs from the daily update time to the user's third daily login;
the distribution model is applied to the enterprises the user wants to follow, which comprise at least a Company A and a Company B, and comprises the following contents:
S1: the user fills in an information set, which comprises the names of the enterprises the user wants to follow and subscribe to and keywords for the information content of interest;
S2: according to the information set, the main server acquires the enterprise website daily average visitor information sheets, the daily visit time record information and the enterprise information update time information for the websites of Company A and Company B; in the acquired daily average visitor information sheet, Company A is ranked ahead of Company B; the daily visit time record information obtained from Company A's website records that the number of visitors in the hour from nine to ten a.m. is greater than in other time periods every day and gradually decreases afterward, and the daily visit time record information obtained from Company B's website records that the number of visitors from eight to nine a.m. is smaller than in other time periods, the number from nine to eleven a.m. is greater than in other time periods, and the number for the rest of the day is steady;
S3: every day, when the enterprise websites of Company A and Company B update their information, the main server crawls the related bid-inviting content;
S4: the main server acquires the times at which the user logs in to the information system and views information every day, generates a user viewing information rule table, and records in that table that the user's first daily login is at ten o'clock in the morning;
S5: the main server generates the distribution model according to the enterprise website daily average visitor information sheet, the daily visit time record information, the time information record table and the user viewing information rule table; when the distribution model is generated, it is first determined which of the three types each enterprise website belongs to; next, the optimal information crawling times of the two enterprises are determined, and if the result shows that the optimal time for Company A is between ten and eleven a.m. and that for Company B is between eight and nine a.m., so that the two optimal information crawling times differ, information capturing sub-servers are assigned to crawl information within the optimal information crawling time range of each enterprise respectively.
2. The bid-related webpage information extraction system according to claim 1, wherein the main server comprises a user classification and restriction module, which assigns permissions to registered users by dividing them into ordinary users, file personnel users and system tester users: ordinary users can read the accessed information after purchasing a membership, file personnel users can read and write the accessed information, and system tester users can read, write and test the software.
3. A method for extracting page information of bid-inviting and bid-winning related webpages, using the bid-related webpage information extraction system according to any one of claims 1-2, characterized in that the method comprises the following steps:
S1: automatically acquiring, according to keywords related to bid inviting and bid winning, the position node webpages where the related information is located;
S2: searching, according to the acquired position node webpages, for the parent node webpage shared by the keywords;
S3: judging whether the acquired parent node webpage has already been crawled; if it has not, determining the content arrangement mode of the parent node webpage and then crawling information according to that arrangement mode;
S4: storing and displaying the crawled bid-inviting information and bid-winning information.
4. The bid-related webpage information extraction method according to claim 3, wherein in steps S1-S4 the capture of the bid-inviting information and the bid-winning information is performed by the information capturing sub-servers that the main server allocates according to the distribution criterion.
5. The method for extracting page information of a bid-inviting related webpage according to claim 4, wherein, when the distribution criterion is generated, the network information update times of the enterprise websites are first obtained and sorted, and the websites are also sorted by their daily average number of visitors; if several enterprise websites have the same update time, the information capturing instruction is executed first on the website with the larger daily average number of visitors; if the update times differ, the information capturing instructions are executed in order of the websites' update times; and the information capturing sub-servers are arranged in sequence according to the execution times of the information capturing instructions.
6. The bid-related webpage information extraction method according to claim 3, wherein in step S3, after the bid-inviting information and the bid-winning information are crawled, randomly sampled items of the crawled information are verified for correctness.
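Steps S1-S3 of claim 3 can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree`. Several details are assumptions rather than the patent's method: the English keywords stand in for whatever bid-inviting and bid-winning terms a deployment would use, the sample page is hypothetical, "already crawled" is tracked by the page URL plus the serialized parent node, and the arrangement mode is detected by a simple heuristic (repeated same-tag children form a list). ElementTree only accepts well-formed markup, so real pages would first need a lenient HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple


def keyword_nodes(root: ET.Element, keywords: List[str]) -> List[ET.Element]:
    """S1: elements whose own text contains a bid-inviting or bid-winning keyword."""
    return [el for el in root.iter() if el.text and any(k in el.text for k in keywords)]


def shared_parent(root: ET.Element, nodes: List[ET.Element]) -> ET.Element:
    """S2: deepest element that is an ancestor of (or equal to) every keyword node."""
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def ancestors(el: ET.Element) -> List[ET.Element]:
        chain = [el]
        while el in parent_of:
            el = parent_of[el]
            chain.append(el)
        return chain  # from the node itself up to the root

    common = set(ancestors(nodes[0]))
    for node in nodes[1:]:
        common &= set(ancestors(node))
    # The first common element met when walking up from a node is the deepest one.
    return next(el for el in ancestors(nodes[0]) if el in common)


def crawl_by_layout(parent: ET.Element) -> Tuple[str, List[str]]:
    """S3: repeated same-tag children are treated as a list, anything else as one block."""
    children = list(parent)
    if len(children) > 1 and len({c.tag for c in children}) == 1:
        return "list", ["".join(c.itertext()).strip() for c in children]
    return "block", ["".join(parent.itertext()).strip()]


def extract(url: str, page: str, keywords: List[str],
            seen: Set[Tuple[str, bytes]]) -> Optional[Tuple[str, List[str]]]:
    root = ET.fromstring(page)
    nodes = keyword_nodes(root, keywords)
    if not nodes:
        return None
    parent = shared_parent(root, nodes)
    key = (url, ET.tostring(parent))
    if key in seen:  # S3: this parent node has already been crawled
        return None
    seen.add(key)
    return crawl_by_layout(parent)


PAGE = """<html><body>
<div id="nav"><a>Home</a><a>About</a></div>
<ul id="notices">
<li>Bid invitation: road repair project, 2018-12-01</li>
<li>Bid-winning announcement: school canteen, 2018-12-02</li>
<li>Bid invitation: office supplies, 2018-12-03</li>
</ul>
</body></html>"""
```

On the sample page, the three `<li>` items contain keywords, their shared parent is the `<ul>`, and `extract` returns the three notices in list mode. A second call with the same `seen` set returns `None` because that parent node has already been crawled. Step S4 (storage and display) is left out.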
CN201811481859.3A 2018-12-05 2018-12-05 Method and system for extracting page information of bid-inviting and bidding related webpage Active CN109597927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811481859.3A CN109597927B (en) 2018-12-05 2018-12-05 Method and system for extracting page information of bid-inviting and bidding related webpage

Publications (2)

Publication Number Publication Date
CN109597927A CN109597927A (en) 2019-04-09
CN109597927B true CN109597927B (en) 2022-11-18

Family

ID=65961182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811481859.3A Active CN109597927B (en) 2018-12-05 2018-12-05 Method and system for extracting page information of bid-inviting and bidding related webpage

Country Status (1)

Country Link
CN (1) CN109597927B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant