CN111506786B - Method, device, electronic equipment and computer readable storage medium for web page update - Google Patents

Info

Publication number
CN111506786B
Authority
CN
China
Prior art keywords
webpage
identifier
feature extraction
web page
cloud
Legal status
Active
Application number
CN202010152322.3A
Other languages
Chinese (zh)
Other versions
CN111506786A (en)
Inventor
刘俊启
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010152322.3A
Publication of CN111506786A
Application granted
Publication of CN111506786B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
                • G06F16/951 Indexing; Web crawling techniques
                • G06F16/953 Querying, e.g. by the use of web search engines
                  • G06F16/9535 Search customisation based on user profiles and personalisation
                • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a method, a device, electronic equipment and a computer readable storage medium for web page updating, and relates to the technical field of search engines. When the web page update is performed at the server side, the implementation scheme is as follows: after a web page is crawled, a cloud feature identifier of the web page is generated and associated with the web page; and after feedback information sent by a client is received, the web page corresponding to the feedback information is re-crawled to replace the original web page, and the cloud feature identifier associated with the web page is updated. When the web page update is performed at the client, the implementation scheme is as follows: after a web page is opened, a local feature identifier of the web page is generated; the cloud feature identifier of the web page is acquired; and when the local feature identifier does not match the cloud feature identifier, feedback information is generated and sent to the server side. The application can improve the timeliness of web page updates and effectively save computing resources at the server side.

Description

Method, device, electronic equipment and computer readable storage medium for web page update
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, an electronic device, and a computer readable storage medium for web page update in the field of search engine technologies.
Background
With the rapid spread of smart terminals, the mobile internet has become a major way for users to obtain information, and mobile search has accordingly replaced PC search as the main way users access search engines. As more and more users search the internet, their expectations for timely information also keep rising. However, in current search engines, web pages are generally updated only by the search engine's server side. As the number of web pages grows, so do the computing resources the server side needs to update them, and when those resources are limited, the timeliness of web page updates drops sharply.
Disclosure of Invention
The application provides a method for web page updating, executed at a server side and comprising: after crawling a web page, generating a cloud feature identifier of the web page and associating the cloud feature identifier with the web page; and after receiving feedback information sent by a client, re-crawling the web page corresponding to the feedback information to replace the original web page, and updating the cloud feature identifier associated with the web page. Because the client drives the server side to update web pages, the timeliness of web page updates can be improved and the computing resources of the server side can be effectively saved.
According to a preferred embodiment of the present application, generating the cloud feature identifier of the web page comprises: determining a feature extraction rule corresponding to the web page; and extracting features of the web page according to the feature extraction rule, and generating the cloud feature identifier of the web page from the extracted features. This improves the accuracy of feature extraction.
According to a preferred embodiment of the application, the method further comprises: receiving a rule query request sent by a client; and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
According to a preferred embodiment of the application, the method further comprises: receiving an identifier query request sent by a client; and determining the cloud feature identifier corresponding to the identifier query request, and sending the determined cloud feature identifier to the client.
According to a preferred embodiment of the present application, updating the cloud feature identifier associated with the web page comprises: generating an updated feature identifier from the re-crawled web page; and updating the cloud feature identifier associated with the web page to the updated feature identifier. This improves the accuracy of the cloud feature identifier update.
To solve the above technical problem, the application further provides a method for web page updating, executed at a client and comprising: after a web page is opened, generating a local feature identifier of the web page; acquiring the cloud feature identifier of the web page; and when the local feature identifier does not match the cloud feature identifier, generating feedback information and sending the feedback information to a server side. By having the client perform real-time computation that drives the server side to update web pages, the method improves the timeliness of web page updates and effectively saves computing resources at the server side.
According to a preferred embodiment of the present application, generating the local feature identifier of the web page comprises: determining a feature extraction rule corresponding to the web page; and extracting features of the web page according to the feature extraction rule, and generating the local feature identifier of the web page from the extracted features. This improves the accuracy of feature extraction.
According to a preferred embodiment of the present application, the determining the feature extraction rule corresponding to the web page includes: acquiring attribute information of the webpage to generate a rule query request; and after the rule query request is sent to the server, receiving the feature extraction rule returned from the server.
According to a preferred embodiment of the present application, acquiring the cloud feature identifier of the web page comprises: acquiring the uniform resource locator of the web page to generate an identifier query request; and after the identifier query request is sent to the server side, receiving the cloud feature identifier returned by the server side.
According to a preferred embodiment of the present application, determining that the local feature identifier does not match the cloud feature identifier comprises: calculating the matching degree between the local feature identifier and the cloud feature identifier; and determining whether the matching degree exceeds a preset threshold; if so, the two are determined to match, and otherwise they are determined not to match. This step improves the accuracy of the comparison between the identifiers.
The application provides an apparatus for web page updating, located at a server side and comprising: a processing unit configured to generate a cloud feature identifier of a web page after crawling the web page, and to associate the cloud feature identifier with the web page; and an updating unit configured to, after receiving feedback information sent by a client, re-crawl the web page corresponding to the feedback information to replace the original web page, and update the cloud feature identifier associated with the web page.
According to a preferred embodiment of the present application, when generating the cloud feature identifier of the web page, the processing unit specifically performs: determining a feature extraction rule corresponding to the web page; and extracting features of the web page according to the feature extraction rule, and generating the cloud feature identifier of the web page from the extracted features.
According to a preferred embodiment of the application, the processing unit is further adapted to perform: receiving a rule query request sent by a client; and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
According to a preferred embodiment of the application, the processing unit is further configured to perform: receiving an identifier query request sent by a client; and determining the cloud feature identifier corresponding to the identifier query request, and sending the determined cloud feature identifier to the client.
According to a preferred embodiment of the present application, when updating the cloud feature identifier associated with the web page, the updating unit specifically performs: generating an updated feature identifier from the re-crawled web page; and updating the cloud feature identifier associated with the web page to the updated feature identifier.
The application provides an apparatus for web page updating, located at a client and comprising: a generating unit configured to generate a local feature identifier of a web page after the web page is opened; an acquisition unit configured to acquire the cloud feature identifier of the web page; and a matching unit configured to generate feedback information and send it to the server side when the local feature identifier does not match the cloud feature identifier.
According to a preferred embodiment of the present application, when generating the local feature identifier of the web page, the generating unit specifically performs: determining a feature extraction rule corresponding to the web page; and extracting features of the web page according to the feature extraction rule, and generating the local feature identifier of the web page from the extracted features.
According to a preferred embodiment of the present application, the generating unit, when determining the feature extraction rule corresponding to the web page, specifically performs: acquiring attribute information of the webpage to generate a rule query request; and after the rule query request is sent to the server, receiving the feature extraction rule returned from the server.
According to a preferred embodiment of the present application, when acquiring the cloud feature identifier of the web page, the acquisition unit specifically performs: acquiring the uniform resource locator of the web page to generate an identifier query request; and after the identifier query request is sent to the server side, receiving the cloud feature identifier returned by the server side.
According to a preferred embodiment of the present application, when the matching unit determines that the local feature identifier does not match the cloud feature identifier, the matching unit specifically performs: calculating the matching degree between the local feature identifier and the cloud feature identifier; and determining whether the matching degree exceeds a preset threshold, if so, determining that the two are matched, and otherwise, determining that the two are not matched.
One embodiment of the above application has the following advantages or benefits: it improves the timeliness of web page updates and effectively saves computing resources at the server side. Through interaction between the server side and the client, the client performs real-time computation that drives the server side to update web pages. This solves the prior-art problem of low update timeliness caused by the server side's limited computing resources, and thereby achieves the technical effects of timelier web page updates and reduced server-side computation.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flowchart of a method for web page update performed in a server side according to a first embodiment of the present application;
FIG. 2 is a flow chart of a method for web page update performed in a client according to a second embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for web page update in a server according to a third embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for web page update in a client according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a method of web page updating according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flowchart of a method for web page updating according to a first embodiment of the present application. As shown in FIG. 1, the method is executed at a server side and includes the following steps:
In S101, after capturing a web page, a cloud feature identifier of the web page is generated, and the cloud feature identifier is associated with the web page.
In this step, the server side first crawls the web page, then generates a cloud feature identifier of the crawled web page, and finally associates the generated cloud feature identifier with the crawled web page. The server side in the present application is the server side of a search engine; that is, after crawling a web page, the search engine's server side analyzes the page to generate its feature identifier.
It can be understood that the server side in this step may use a web crawler to crawl web pages on the network and store the crawled pages so that they can be displayed to users who search.
Specifically, the cloud feature identifier of the web page may be generated as follows: determining a feature extraction rule corresponding to the crawled web page; and extracting features of the web page according to the determined feature extraction rule, and generating the cloud feature identifier of the web page from the extracted features. The method of generating the feature identifier from the web page features is not limited; for example, the web page features may be subjected to operations such as abstraction, assembly, encryption and decryption, and compression, and the processing result used as the feature identifier.
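The identifier-generation step can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the helper names `normalize_feature` and `make_feature_identifier` are hypothetical, the "abstraction" step is assumed to be whitespace and case normalization, and SHA-256 is an assumed stand-in for the unspecified compression or encryption step.

```python
import hashlib


def normalize_feature(feature: str) -> str:
    # Assumed "abstraction" step: collapse whitespace and lowercase, so that
    # purely cosmetic differences do not change the identifier.
    return " ".join(feature.split()).lower()


def make_feature_identifier(features: list[str]) -> str:
    """Hypothetical helper: turn extracted page features into one identifier."""
    # "Assembly" step: join the normalized features in a fixed order using a
    # separator character that is unlikely to appear in page text.
    assembled = "\x1f".join(normalize_feature(f) for f in features)
    # Digest step (assumed choice: SHA-256) compresses the features into a
    # fixed-length string that is cheap to store and compare.
    return hashlib.sha256(assembled.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    page_features = ["Breaking News", "Markets rally  today", "/world", "/sports"]
    print(make_feature_identifier(page_features))
```

Because the digest is computed over normalized features, extra spaces or changes in letter case leave the identifier unchanged, while any change in the extracted content produces a different identifier.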
The feature extraction rule determined in this step may be a unified rule, that is, different web pages correspond to the same feature extraction rule. For example, the feature extraction rule in this step may be to extract the title, text and sub-links of the web page as the web page features for different web pages.
However, as web page types become increasingly diverse, applying the same feature extraction rule to different types of web pages reduces the accuracy of the extracted features. Therefore, the feature extraction rule corresponding to a web page may instead be determined as follows: acquiring attribute information of the web page, where the attribute information may include the web page name, the web page type, and the like; and determining the feature extraction rule corresponding to the acquired attribute information. That is, this step can set different feature extraction rules for different web pages, thereby improving the accuracy of web page feature extraction.
For example, for information (news) web pages, the corresponding feature extraction rule may be to extract all characters of the body text, or only a portion of them, as the web page features; for portal web pages, the rule may be to extract sub-links as the web page features; and for shopping web pages, the rule may be to extract pictures as the web page features.
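A per-type rule table like the one in the example above might look like the following sketch. The rule names, the `RULES` table, the 200-character limit for information pages, and the use of Python's standard `html.parser` are all assumptions made for illustration; the unified rule (title, text, and sub-links) serves as the fallback for unknown page types.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the raw material that extraction rules can choose from."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])   # sub-links
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])   # pictures

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


# Hypothetical rule table keyed by page type (contents are assumptions).
RULES = {
    "information": {"fields": ["text"], "max_chars": 200},
    "portal": {"fields": ["links"]},
    "shopping": {"fields": ["images"]},
}
UNIFIED_RULE = {"fields": ["title", "text", "links"]}


def extract_features(html: str, page_type: str) -> list[str]:
    """Apply the rule for page_type (or the unified rule) to an HTML page."""
    rule = RULES.get(page_type, UNIFIED_RULE)
    parser = PageParser()
    parser.feed(html)
    parser.close()
    features = []
    for field_name in rule["fields"]:
        if field_name == "title":
            features.append(parser.title.strip())
        elif field_name == "text":
            text = " ".join(parser.text_parts)
            limit = rule.get("max_chars")
            features.append(text[:limit] if limit else text)
        elif field_name == "links":
            features.extend(parser.links)
        elif field_name == "images":
            features.extend(parser.images)
    return features
```

The same parsed page yields different feature lists depending on its type, which is the point of per-type rules: a portal page is compared by its link structure, a shopping page by its pictures.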
It will be appreciated that, in addition to the type of features to be extracted from a web page, the feature extraction rules of the present application may also include constraints on feature extraction for web pages opened by the client. The client performs feature extraction on a web page only when the corresponding constraint is satisfied. Examples of such constraints, which limit how often extraction is performed, include: some web pages do not need feature extraction at all, some web pages need feature extraction in real time, and some web pages do not need feature extraction every day.
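One way to model such constraints on the client is sketched below. The three modes (`never`, `realtime`, `interval`) and the `min_interval_days` field are hypothetical names chosen to mirror the three example constraints; the source does not specify how the constraints are encoded.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExtractionConstraint:
    """Hypothetical constraint attached to a feature extraction rule."""
    mode: str                   # "never", "realtime" or "interval" (assumed names)
    min_interval_days: int = 1  # only used by the "interval" mode


@dataclass
class ExtractionScheduler:
    """Client-side record of when each URL was last processed."""
    last_extracted: dict = field(default_factory=dict)

    def allow_extraction(self, url: str, constraint: ExtractionConstraint,
                         today: date) -> bool:
        """Return True if extraction may run now; records the run if so."""
        if constraint.mode == "never":
            return False  # page never needs feature extraction
        if constraint.mode == "realtime":
            return True   # extract every time the page is opened
        if constraint.mode == "interval":
            last = self.last_extracted.get(url)
            if last is not None and (today - last).days < constraint.min_interval_days:
                return False  # extracted too recently
            self.last_extracted[url] = today
            return True
        raise ValueError(f"unknown constraint mode: {constraint.mode!r}")
```

For example, a page with `ExtractionConstraint("interval", min_interval_days=7)` is processed at most once a week on a given client, which captures the "not every day" case.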
In addition, to improve the accuracy of web page updating and to ensure that the server side and the client extract features from the same web page with the same feature extraction rule, this step may further include: receiving a rule query request sent by a client, where the rule query request may contain attribute information of the web page; and determining the feature extraction rule corresponding to the received rule query request, and sending it to the client, which uses it to extract features and generate the local feature identifier of the web page.
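A server-side handler for rule query requests could look like this. The JSON message shape (`attributes.page_type`, `rule`) and the registry contents are assumptions; the essential point, taken from the text, is that the server returns the same rule it uses itself, so both sides extract comparable features.

```python
import json

# Hypothetical server-side rule registry keyed by page type.
RULE_REGISTRY = {
    "information": {"fields": ["text"], "max_chars": 200},
    "portal": {"fields": ["links"]},
    "shopping": {"fields": ["images"]},
}
UNIFIED_RULE = {"fields": ["title", "text", "links"]}


def handle_rule_query(request_body: str) -> str:
    """Answer a rule query request with the rule the server itself applies."""
    request = json.loads(request_body)
    attributes = request.get("attributes", {})
    # Look the rule up by the page attributes sent by the client; fall back
    # to the unified rule when the page type is unknown or missing.
    rule = RULE_REGISTRY.get(attributes.get("page_type"), UNIFIED_RULE)
    return json.dumps({"rule": rule})
```

Keeping a single registry that both the crawler and this handler read from is one simple way to guarantee the two identifiers are produced by identical rules.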
Specifically, when associating the cloud feature identifier with the web page, the cloud feature identifier may be associated with the uniform resource locator (URL) of the web page, and the URL of the web page and its associated cloud feature identifier may be stored.
It can be understood that, after associating the cloud feature identifier with the web page, the server side in this step may send the cloud feature identifier of the web page to the client together with the web page content.
If the server side sends the web page content to the client without the cloud feature identifier, this step may further include the following so that the client can still compare feature identifiers: receiving an identifier query request sent by the client, where the identifier query request may contain the URL of the web page; and determining the cloud feature identifier corresponding to the received identifier query request, and sending it to the client for comparison with the local feature identifier.
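The association, bundled delivery, and identifier query described in the three paragraphs above can be combined in one in-memory sketch. `CloudIndex`, `serve_page`, and `handle_identifier_query` are hypothetical names; a production search engine would back this with a persistent URL-keyed store rather than a Python dict.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredPage:
    content: str
    cloud_identifier: str


@dataclass
class CloudIndex:
    """Hypothetical server-side store: URL -> page content + cloud identifier."""
    pages: dict = field(default_factory=dict)

    def associate(self, url: str, content: str, cloud_identifier: str) -> None:
        # Key the crawled page and its identifier by the page URL.
        self.pages[url] = StoredPage(content, cloud_identifier)

    def serve_page(self, url: str, include_identifier: bool = True) -> dict:
        # Optionally bundle the cloud identifier with the page content.
        page = self.pages[url]
        response = {"url": url, "content": page.content}
        if include_identifier:
            response["cloud_identifier"] = page.cloud_identifier
        return response

    def handle_identifier_query(self, url: str) -> Optional[str]:
        # Answer an identifier query request; None if the URL is unknown.
        page = self.pages.get(url)
        return page.cloud_identifier if page else None
```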
In S102, after the feedback information sent by the client is received, the web page corresponding to the feedback information is re-crawled to replace the original web page, and the cloud feature identifier associated with the web page is updated.
In this step, after receiving feedback information sent by the client, the server side first re-crawls the web page corresponding to the received feedback information to replace the original web page, and then updates the cloud feature identifier associated with the web page, so that the updated page is used the next time a user opens the same web page. That is, in this step the client drives the server side to re-crawl the web page: the client sends feedback information as soon as it determines that an opened web page has changed, which improves the timeliness of web page updates at the server side.
The feedback information received in this step contains the URL of the web page, which the server side uses to re-crawl the same web page and replace the original one; the feedback information may also contain the local feature identifier generated by the client, which can be used to update the cloud feature identifier.
Because the local feature identifier generated by the client corresponds to the web page whose content has changed, this step may update the cloud feature identifier associated with the web page directly to the local feature identifier sent by the client, that is, replace the original cloud feature identifier with the local feature identifier.
If the received feedback information contains only the URL of the web page, or to further improve the accuracy of the cloud feature identifier update, the cloud feature identifier associated with the web page may instead be updated as follows: generating an updated feature identifier from the re-crawled web page; and updating the cloud feature identifier associated with the web page to the updated feature identifier, that is, replacing the original cloud feature identifier with the updated feature identifier. This avoids an inaccurate cloud feature identifier when the web page changes rapidly (for example, if the page changes again between the client's visit and the re-crawl), and thus improves the accuracy of web page updating.
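The server-side feedback handling can be sketched as below. The `fetch` callable stands in for the crawler and `identifier_of` (a SHA-256 of the raw content) stands in for the rule-based identifier; both are assumptions. The `recompute` flag selects between the two update strategies described above: regenerating the identifier from the re-crawled page, or trusting the client's local identifier.

```python
import hashlib
from typing import Callable, Optional


def identifier_of(content: str) -> str:
    # Stand-in identifier (assumption): SHA-256 of the raw page content.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def handle_feedback(store: dict, feedback: dict,
                    fetch: Callable[[str], str], recompute: bool = True) -> None:
    """Re-crawl the reported page and refresh its cloud identifier.

    store maps URL -> {"content": ..., "cloud_identifier": ...}.
    feedback must contain "url" and may contain "local_identifier".
    """
    url = feedback["url"]
    content = fetch(url)  # re-crawl; the new content replaces the old copy
    local_id: Optional[str] = feedback.get("local_identifier")
    if recompute or local_id is None:
        # Regenerate from what was actually re-crawled (the more robust choice).
        new_id = identifier_of(content)
    else:
        # Trust the identifier the client computed on the changed page.
        new_id = local_id
    store[url] = {"content": content, "cloud_identifier": new_id}
```

Recomputing costs one extra identifier calculation per feedback message but guarantees that the stored content and identifier describe the same version of the page.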
FIG. 2 is a flowchart of a method for web page updating according to a second embodiment of the present application. As shown in FIG. 2, the method is executed at a client and includes the following steps:
In S201, after a web page is opened, a local feature identifier of the web page is generated.
In this step, the user searches through the client and clicks a web page in the search results to open it, and the client then generates the local feature identifier of the opened web page.
Specifically, the local feature identifier of the web page may be generated in this step as follows: determining a feature extraction rule corresponding to the opened web page; and extracting features of the web page according to the determined feature extraction rule, and generating the local feature identifier of the web page from the extracted features.
It can be understood that the client may pre-store the feature extraction rule corresponding to the web page locally, so that the rule can be obtained locally without interacting with the server side.
If the client has no pre-stored feature extraction rule, the feature extraction rule corresponding to the opened web page may be determined in this step as follows: acquiring attribute information of the web page to generate a rule query request, where the attribute information may include the web page type, the web page name, and the like; and after sending the generated rule query request to the server side, receiving the feature extraction rule returned by the server side.
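The client-side lookup, with a locally pre-stored rule taking precedence over a server query, might be sketched as follows. `RuleResolver` and the request shape are hypothetical; `query_server` is any callable that performs the rule query round trip.

```python
from typing import Callable, Optional


class RuleResolver:
    """Hypothetical client-side helper: local rule cache with server fallback."""

    def __init__(self, query_server: Callable[[dict], dict],
                 preloaded: Optional[dict] = None):
        self._query_server = query_server
        # Rules pre-stored on the client, keyed by page type.
        self._cache = dict(preloaded or {})

    def rule_for(self, page_type: str, page_name: str = "") -> dict:
        if page_type in self._cache:
            return self._cache[page_type]  # no round trip to the server side
        # Build a rule query request from the page's attribute information.
        request = {"attributes": {"page_type": page_type, "page_name": page_name}}
        rule = self._query_server(request)["rule"]
        self._cache[page_type] = rule  # remember it for later pages
        return rule
```

Caching the returned rule means a second page of the same type needs no server round trip.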
In S202, a cloud feature identifier of the web page is obtained.
In this step, after the local feature identifier of the web page is generated in step S201, the cloud feature identifier of the web page is obtained, so as to determine whether the web page has changed by comparing whether the feature identifiers match.
It can be understood that the server side may send the cloud feature identifier of the web page to the client at the same time as it provides the web page content.
If the server side does not send the cloud feature identifier along with the web page content, the cloud feature identifier of the web page may be obtained as follows: acquiring the URL of the web page to generate an identifier query request; and sending the generated identifier query request to the server side and receiving the cloud feature identifier returned by the server side.
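The two ways of obtaining the cloud feature identifier reduce to a simple fallback, sketched here under the assumption that page responses and identifier query replies are dictionaries with a `cloud_identifier` key (a hypothetical format).

```python
from typing import Callable, Optional


def obtain_cloud_identifier(page_response: dict,
                            query_identifier: Callable[[dict], dict]) -> Optional[str]:
    """Return the cloud identifier for a page the client just opened."""
    # Case 1: the server side bundled the identifier with the page content.
    bundled = page_response.get("cloud_identifier")
    if bundled is not None:
        return bundled
    # Case 2: send an identifier query request built from the page URL.
    request = {"url": page_response["url"]}
    return query_identifier(request).get("cloud_identifier")
```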
In S203, when it is determined that the local feature identifier does not match the cloud feature identifier, feedback information is generated and sent to a server.
In this step, after the cloud feature identifier of the web page is obtained in step S202, the client checks whether the local feature identifier matches the cloud feature identifier. If they do not match, the web page has changed, so feedback information is generated and sent to the server side to drive it to re-crawl the web page. That is, the client determines whether the web page has changed by comparing the two feature identifiers, and drives the server side to update the web page when it has.
When generating the feedback information, this step may use only the URL of the web page, or may additionally include the local feature identifier of the web page.
Specifically, whether the local feature identifier matches the cloud feature identifier may be determined as follows: calculating the matching degree between the local feature identifier and the cloud feature identifier; and determining whether the calculated matching degree exceeds a preset threshold; if so, the two are determined to match, and otherwise they are determined not to match.
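The text does not say how the matching degree is computed. The sketch below assumes a SimHash-style 64-bit identifier, for which the fraction of identical bits gives a graded matching degree; an exact hash such as SHA-256 would only ever yield "match" or "no match". SimHash is a swapped-in technique, and the 0.9 threshold and the function names are illustrative assumptions. The feedback message carries the URL and, optionally, the local feature identifier, matching the two options described above.

```python
import hashlib
from typing import Optional

BITS = 64


def simhash(features: list[str]) -> int:
    """Unweighted 64-bit SimHash of a feature list (assumed identifier format)."""
    weights = [0] * BITS
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when a strict majority of features set it.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def matching_degree(local_id: int, cloud_id: int) -> float:
    """Fraction of identical bits: 1.0 means the identifiers are equal."""
    differing = bin(local_id ^ cloud_id).count("1")
    return 1.0 - differing / BITS


def feedback_if_changed(url: str, local_id: int, cloud_id: int,
                        threshold: float = 0.9,
                        include_local_id: bool = True) -> Optional[dict]:
    """Return feedback for the server side, or None when the identifiers match."""
    if matching_degree(local_id, cloud_id) > threshold:
        return None  # degree exceeds the threshold: page treated as unchanged
    feedback = {"url": url}
    if include_local_id:
        feedback["local_identifier"] = local_id
    return feedback
```

With a graded degree, small incidental differences between the client's and the server's view of a page (for example a rotating advertisement link) can fall above the threshold and not trigger a re-crawl.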
Through the above steps, the interaction between the server side and the client lets the client perform real-time computation that drives the server side to update web pages. On the one hand, this improves the timeliness of web page updates; on the other hand, the existing server-side architecture can be retained and server-side computing resources can be effectively saved.
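To show how the server-side and client-side methods interact, here is a compact end-to-end simulation. Everything in it is an assumption made for illustration: the "live web" is a dict, the unified rule treats each non-empty line as a feature, identifiers are SHA-256 digests compared for exact equality, and `SearchServer` and `client_open` are hypothetical names.

```python
import hashlib


def features_of(content: str) -> list[str]:
    # Assumed unified rule: each non-empty line of the page is one feature.
    return [line.strip() for line in content.splitlines() if line.strip()]


def identifier(features: list[str]) -> str:
    # Assumed identifier: SHA-256 over the features joined by a separator.
    return hashlib.sha256("\x1f".join(features).encode("utf-8")).hexdigest()


class SearchServer:
    def __init__(self, web: dict[str, str]):
        self.web = web   # simulated live web: URL -> current content
        self.index = {}  # URL -> (crawled content, cloud feature identifier)

    def crawl(self, url: str) -> None:
        # S101: crawl, generate the cloud identifier, associate it with the URL.
        content = self.web[url]
        self.index[url] = (content, identifier(features_of(content)))

    def serve(self, url: str) -> dict:
        content, cloud_id = self.index[url]
        return {"url": url, "content": content, "cloud_identifier": cloud_id}

    def receive_feedback(self, feedback: dict) -> None:
        # S102: re-crawl the reported page, which also refreshes its identifier.
        self.crawl(feedback["url"])


def client_open(server: SearchServer, web: dict[str, str], url: str) -> bool:
    """S201-S203 on the client; returns True if feedback was sent."""
    cloud_id = server.serve(url)["cloud_identifier"]  # S202
    local_id = identifier(features_of(web[url]))      # S201: page as opened
    if local_id != cloud_id:                          # S203
        server.receive_feedback({"url": url, "local_identifier": local_id})
        return True
    return False


if __name__ == "__main__":
    web = {"https://example.com/news": "Headline\nOld story"}
    server = SearchServer(web)
    server.crawl("https://example.com/news")
    print(client_open(server, web, "https://example.com/news"))  # prints False
    web["https://example.com/news"] = "Headline\nUpdated story"
    print(client_open(server, web, "https://example.com/news"))  # prints True
```

The server never polls the page; the second call detects the change only because a client opened it, which is the client-driven update the embodiments describe.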
FIG. 3 is a block diagram of an apparatus for web page updating according to a third embodiment of the present application. As shown in FIG. 3, the apparatus is located at the server side and includes a processing unit 301 and an updating unit 302.
The processing unit 301 is configured to generate a cloud feature identifier of a web page after capturing the web page, and associate the cloud feature identifier with the web page.
The processing unit 301 first crawls a web page, then generates a cloud feature identifier of the crawled web page, and finally associates the generated cloud feature identifier with the crawled web page. The server side in the present application is the server side of a search engine; that is, after crawling a web page, the search engine's server side analyzes the page to generate its feature identifier.
It will be appreciated that the processing unit 301 may use a web crawler to crawl web pages on the network and store the crawled pages for display to users who search.
Specifically, the processing unit 301 may generate the cloud feature identifier of the web page as follows: determining a feature extraction rule corresponding to the crawled web page; and extracting features of the web page according to the determined feature extraction rule, and generating the cloud feature identifier of the web page from the extracted features.
The feature extraction rule determined by the processing unit 301 may be a unified rule, that is, different web pages correspond to the same feature extraction rule.
However, as web page types become increasingly diverse, applying the same feature extraction rule to different types of web pages reduces the accuracy of feature extraction. Therefore, the processing unit 301 may instead determine the feature extraction rule as follows: acquire attribute information of the web page, which may include the web page name, the web page type, and the like; then determine the feature extraction rule corresponding to the acquired attribute information. In this way, the processing unit 301 can set different feature extraction rules for different web pages, improving the accuracy of web page feature extraction.
It will be appreciated that, in addition to the types of features to be extracted from a web page, the feature extraction rules of the present application may also include feature extraction constraints that apply when a client opens the web page. The client performs feature extraction on a web page only when the feature extraction constraint is satisfied. For example, a constraint may specify that some web pages require no feature extraction, that some web pages require feature extraction in real time, that some web pages need not undergo feature extraction every day, or how many times feature extraction is performed.
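Claim 1 states that a rule comprises a feature extraction type and a feature extraction constraint. A minimal sketch of selecting a rule from page attribute information and checking its constraint follows. The page types, the rule table, the constraint modes ("never", "realtime", "daily_limit") and the daily limits are all hypothetical, since the source gives only prose examples:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureExtractionRule:
    feature_types: tuple              # what to extract, e.g. ("title", "text")
    mode: str = "realtime"            # hypothetical: "never", "realtime" or "daily_limit"
    max_per_day: Optional[int] = None # only used when mode == "daily_limit"


# Hypothetical rule table keyed by the page type found in the attribute information.
RULES_BY_TYPE = {
    "news": FeatureExtractionRule(("title", "text"), mode="realtime"),
    "forum": FeatureExtractionRule(("title", "links"), mode="daily_limit", max_per_day=3),
    "static": FeatureExtractionRule((), mode="never"),
}
# Fallback playing the role of a single unified rule for all other pages.
DEFAULT_RULE = FeatureExtractionRule(("title",), mode="daily_limit", max_per_day=1)


def select_rule(attributes: dict) -> FeatureExtractionRule:
    """Maps page attribute information (here only the 'type' key) to a rule."""
    return RULES_BY_TYPE.get(attributes.get("type"), DEFAULT_RULE)


def constraint_allows(rule: FeatureExtractionRule, extractions_today: int) -> bool:
    """True if the client may extract features now under the rule's constraint."""
    if rule.mode == "never":
        return False
    if rule.mode == "daily_limit":
        return extractions_today < (rule.max_per_day or 0)
    return True  # "realtime": extract every time the page is opened
```

Keeping the constraint inside the rule lets the server tune client-side workload per page type without changing client code.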
In addition, to improve the accuracy of web page updating by ensuring that the server side and the client apply the same feature extraction rule to the same web page, the processing unit 301 may further: receive a rule query request sent by a client, where the rule query request may contain attribute information of the web page; and determine the feature extraction rule corresponding to the received rule query request and send it to the client, which uses it to extract features and generate the local feature identifier of the web page.
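The wire format of the rule query request is not specified. As a hedged illustration, assuming JSON messages with a hypothetical `attributes` field and a server-side table keyed by page type, a server-side handler could look like this:

```python
import json

# Hypothetical server-side rule table: page type -> JSON-serialisable rule.
RULE_TABLE = {
    "news": {"feature_types": ["title", "text"], "constraint": {"mode": "realtime"}},
    "forum": {"feature_types": ["title", "links"],
              "constraint": {"mode": "daily_limit", "max_per_day": 3}},
}
FALLBACK_RULE = {"feature_types": ["title"],
                 "constraint": {"mode": "daily_limit", "max_per_day": 1}}


def handle_rule_query(request_body: str) -> str:
    """Returns the rule matching the attributes carried in a JSON rule query request."""
    request = json.loads(request_body)
    page_type = request.get("attributes", {}).get("type")
    rule = RULE_TABLE.get(page_type, FALLBACK_RULE)
    # Returning the server's own rule lets the client extract exactly what the
    # server extracted, so the local and cloud identifiers are comparable.
    return json.dumps({"rule": rule})
```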
Specifically, when associating the cloud feature identifier with the web page, the processing unit 301 may associate the cloud feature identifier with the uniform resource locator (URL) of the web page, and store the URL of the web page together with its associated cloud feature identifier.
It may be appreciated that, after associating the cloud feature identifier of the web page with the URL of the web page, the processing unit 301 may send the cloud feature identifier to the client together with the web page content.
If the server side sends the web page content to the client without the cloud feature identifier, then, so that the client can still compare the feature identifiers, the processing unit 301 may further: receive an identifier query request sent by the client, where the identifier query request may contain the URL of the web page; and determine the cloud feature identifier corresponding to the received identifier query request and send it to the client, which compares it with the local feature identifier.
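A sketch of the URL association and the identifier query request follows. It assumes an in-memory dictionary stands in for the server's storage and that messages are JSON with a hypothetical `url` field; the class and method names are illustrative only:

```python
import json


class CloudIdentifierStore:
    """Server-side association of page URLs with their cloud feature identifiers."""

    def __init__(self) -> None:
        self._by_url = {}  # URL -> sorted list of feature hashes

    def associate(self, url: str, identifier: frozenset) -> None:
        # Stored as a sorted list so that it can be serialised to JSON.
        self._by_url[url] = sorted(identifier)

    def handle_identifier_query(self, request_body: str) -> str:
        """Answers a JSON identifier query request of the assumed form {"url": ...}."""
        url = json.loads(request_body)["url"]
        identifier = self._by_url.get(url)
        if identifier is None:
            return json.dumps({"url": url, "found": False})
        return json.dumps({"url": url, "found": True, "identifier": identifier})
```

Keying by URL matches the description above: the URL is the one piece of information that both the crawler and the client hold for the same page.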
The updating unit 302 is configured to, after receiving feedback information sent by a client, re-capture the web page corresponding to the feedback information to replace the original web page, and update the cloud feature identifier associated with the web page.
After receiving the feedback information sent by the client, the updating unit 302 first re-captures the web page corresponding to the feedback information to replace the original web page, and then updates the cloud feature identifier associated with the web page, so that the updated page is served the next time a user opens the same web page. In other words, the re-capture performed by the updating unit 302 is driven by the client, which sends feedback information as soon as it determines that an opened web page has changed; this improves the timeliness of web page updates.
The feedback information received by the updating unit 302 contains the URL of the web page, which the server side uses to re-capture the web page and replace the original copy. The feedback information may additionally contain the local feature identifier generated by the client, which can be used to update the cloud feature identifier.
Because the local feature identifier generated by the client corresponds to the web page with the changed content, when the cloud feature identifier associated with the web page is updated, the updating unit 302 may directly update the cloud feature identifier associated with the web page to the local feature identifier sent by the client, that is, replace the original cloud feature identifier with the local feature identifier.
If the received feedback information contains only the URL of the web page, or to further improve the accuracy of the cloud feature identifier update, the updating unit 302 may instead update the cloud feature identifier associated with the web page as follows: generate an updated feature identifier from the re-captured web page, and update the cloud feature identifier associated with the web page to the updated feature identifier, that is, replace the original cloud feature identifier with the updated one. In this way, the updating unit 302 avoids inaccurate cloud feature identifiers caused by rapid changes to the web page, improving the accuracy of web page updating.
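The two update options described above (adopting the client's local identifier, or regenerating the identifier from the re-crawled page) can be sketched as follows. `recrawl` and `make_identifier` are hypothetical callables standing in for the crawler and for identifier generation, the `local_identifier` feedback field is an assumed name, and `trust_client` is an assumed switch between the two options:

```python
from typing import Callable


def handle_feedback(feedback: dict,
                    pages: dict,
                    identifiers: dict,
                    recrawl: Callable[[str], str],
                    make_identifier: Callable[[str], frozenset],
                    trust_client: bool = False) -> None:
    """Re-crawls the page named in the feedback and refreshes its cloud identifier."""
    url = feedback["url"]
    html = recrawl(url)   # re-capture the page ...
    pages[url] = html     # ... and replace the original copy
    local_id = feedback.get("local_identifier")
    if trust_client and local_id is not None:
        # Option 1: adopt the client's local identifier as the new cloud identifier.
        identifiers[url] = frozenset(local_id)
    else:
        # Option 2: regenerate it from the re-crawled page (the only option
        # when the feedback carries just the URL).
        identifiers[url] = make_identifier(html)
```

Option 2 costs one extra identifier computation on the server but reflects the page as the server actually fetched it, which matters when the page changes again between the client's visit and the re-crawl.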
Fig. 4 is a block diagram of an apparatus for web page updating according to a fourth embodiment of the present application. As shown in Fig. 4, the apparatus is located at the client and includes a generating unit 401, an acquiring unit 402, and a matching unit 403.
The generating unit 401 is configured to generate a local feature identifier of a web page after the web page is opened.
After the user performs a search through the client and opens a web page in the search results by clicking it, the generating unit 401 generates a local feature identifier of the opened web page.
Specifically, the generating unit 401 may generate the local feature identifier of the web page as follows: determine the feature extraction rule corresponding to the opened web page; extract features of the web page according to the determined rule; and generate the local feature identifier of the web page from the extracted features.
It may be appreciated that the client may store the feature extraction rules corresponding to web pages locally in advance, in which case the generating unit 401 can obtain the rule locally without interacting with the server side.
If the client has no pre-stored feature extraction rule, the generating unit 401 may determine the feature extraction rule corresponding to the opened web page as follows: acquire attribute information of the web page, which may include the web page type, the web page name, and the like, to generate a rule query request; send the rule query request to the server side; and receive the feature extraction rule returned by the server side.
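The client-side choice between a locally stored rule and a rule query request can be sketched as below. The JSON message shape, the keying of rules by page type, and the caching of rules fetched from the server are assumptions for illustration; the source only says rules may be stored locally in advance:

```python
import json
from typing import Callable, Optional


class RuleResolver:
    """Client-side lookup: use a locally stored rule if present, otherwise ask the server."""

    def __init__(self, send_rule_query: Callable[[str], str],
                 preloaded: Optional[dict] = None) -> None:
        self._send = send_rule_query                # hypothetical transport: JSON in, JSON out
        self._local_rules = dict(preloaded or {})   # page type -> rule, stored in advance

    def resolve(self, attributes: dict) -> dict:
        page_type = attributes.get("type")
        if page_type in self._local_rules:
            return self._local_rules[page_type]     # no round trip to the server
        response = json.loads(self._send(json.dumps({"attributes": attributes})))
        rule = response["rule"]
        self._local_rules[page_type] = rule         # assumed: cache for later pages of this type
        return rule
```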
The acquiring unit 402 is configured to acquire the cloud feature identifier of the web page.
After the generating unit 401 generates the local feature identifier of the web page, the acquiring unit 402 acquires the cloud feature identifier of the web page, so that whether the web page has changed can be determined by checking whether the two feature identifiers match.
It can be understood that the server side may send the cloud feature identifier of the web page to the client at the same time as it provides the web page content.
If the server side does not send the cloud feature identifier together with the web page content, the acquiring unit 402 may acquire the cloud feature identifier of the web page as follows: acquire the URL of the web page to generate an identifier query request; send the identifier query request to the server side; and receive the cloud feature identifier returned by the server side.
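Both acquisition paths can be sketched in one function. It assumes the page response is a dictionary that may carry a hypothetical `cloud_identifier` field, and that `send_identifier_query` is a hypothetical transport returning a JSON reply with an `identifier` list. Treating a missing server entry as an empty identifier is also an assumption:

```python
import json
from typing import Callable


def obtain_cloud_identifier(page_response: dict,
                            send_identifier_query: Callable[[str], str]) -> frozenset:
    """Uses the identifier shipped with the page, or asks the server by URL."""
    if "cloud_identifier" in page_response:
        # Path 1: the server sent the identifier together with the content.
        return frozenset(page_response["cloud_identifier"])
    # Path 2: build an identifier query request from the page URL.
    request = json.dumps({"url": page_response["url"]})
    reply = json.loads(send_identifier_query(request))
    # An unknown page yields an empty identifier, which will not match a
    # non-empty local identifier.
    return frozenset(reply.get("identifier", []))
```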
The matching unit 403 is configured to generate feedback information and send it to the server side when it determines that the local feature identifier does not match the cloud feature identifier.
After the acquiring unit 402 acquires the cloud feature identifier of the web page, the matching unit 403 checks whether the local feature identifier matches the cloud feature identifier. If they do not match, the web page has changed, so the matching unit 403 generates feedback information and sends it to the server side, driving the server side to re-capture the web page. In other words, the client determines whether the web page has changed by comparing the two feature identifiers and, if it has, drives the server side to update the web page.
When generating the feedback information, the matching unit 403 may use only the URL of the web page, or may additionally include the local feature identifier of the web page.
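The matching unit's flow can be sketched as follows. The comparison is passed in as `is_match`, which defaults to exact equality but could be a thresholded matching degree; the JSON feedback layout, the `local_identifier` field name and the `send_feedback` transport are hypothetical:

```python
import json
from typing import Callable


def check_page(url: str,
               local_id: frozenset,
               cloud_id: frozenset,
               send_feedback: Callable[[str], None],
               is_match: Callable[[frozenset, frozenset], bool] = lambda a, b: a == b,
               include_local_id: bool = True) -> bool:
    """Returns True (and sends feedback) if the page appears to have changed."""
    if is_match(local_id, cloud_id):
        return False                    # identifiers match: nothing to report
    feedback = {"url": url}             # the URL alone is enough for re-crawling
    if include_local_id:
        # Optional: lets the server adopt the client's identifier directly.
        feedback["local_identifier"] = sorted(local_id)
    send_feedback(json.dumps(feedback))
    return True
```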
Specifically, the matching unit 403 may determine whether the local feature identifier matches the cloud feature identifier as follows: calculate the matching degree between the local feature identifier and the cloud feature identifier, and check whether the calculated matching degree exceeds a preset threshold. If it does, the two identifiers are determined to match; otherwise, they are determined not to match.
According to an embodiment of the present application, the present application also provides an electronic device and a computer-readable storage medium.
Fig. 5 is a block diagram of an electronic device for the web page updating method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in Fig. 5, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). Fig. 5 takes one processor 501 as an example.
Memory 502 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for web page updating provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of web page updating provided by the present application.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the web page updating method in the embodiments of the present application (e.g., the processing unit 301 and the updating unit 302 shown in Fig. 3, and the generating unit 401, the acquiring unit 402, and the matching unit 403 shown in Fig. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 performs the various functional applications and data processing of the server, that is, implements the web page updating method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected over a network to the electronic device for the web page updating method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the web page updating method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; Fig. 5 takes a bus connection as an example.
The input device 503, such as a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, trackball, or joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the web page updating method. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, the client performs real-time computation and, through interaction between the server side and the client, drives the server side to update the web page. This improves the timeliness of web page updates, while retaining the basic architecture of the server side and effectively saving its computing resources.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (18)

1. A method for web page updating, the method being executed at a server side and comprising:
after capturing a web page, generating a cloud feature identifier of the web page, and associating the cloud feature identifier with the web page;
after receiving feedback information sent by a client, re-capturing the web page corresponding to the feedback information to replace the original web page, and updating the cloud feature identifier associated with the web page;
wherein generating the cloud feature identifier of the web page comprises:
determining a feature extraction rule corresponding to the web page according to attribute information of the web page, wherein the feature extraction rule comprises a feature extraction type and a feature extraction constraint;
extracting features of the web page according to the feature extraction type, and generating the cloud feature identifier of the web page from the extracted features;
wherein the feature extraction constraint is used so that the client extracts features of the web page according to the feature extraction type when the client determines that the web page satisfies the feature extraction constraint.
2. The method according to claim 1, wherein the method further comprises:
receiving a rule query request sent by a client;
and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
3. The method according to claim 1, wherein the method further comprises:
receiving an identifier query request sent by a client;
and determining the cloud feature identifier corresponding to the identifier query request, and sending the determined cloud feature identifier to the client.
4. The method of claim 1, wherein updating the cloud feature identifier associated with the web page comprises:
generating an updated feature identifier from the re-captured web page;
and updating the cloud feature identifier associated with the web page to the updated feature identifier.
5. A method for web page updating, the method being executed at a client and comprising:
after a web page is opened, generating a local feature identifier of the web page;
acquiring a cloud feature identifier of the web page, wherein the cloud feature identifier is generated by a server side;
when the local feature identifier does not match the cloud feature identifier, generating feedback information and sending it to the server side, wherein the feedback information is used by the server side to update the web page and the cloud feature identifier of the web page;
wherein generating the local feature identifier of the web page comprises:
determining a feature extraction rule corresponding to the web page according to attribute information of the web page, wherein the feature extraction rule comprises a feature extraction type and a feature extraction constraint;
and, when the web page satisfies the feature extraction constraint, extracting features of the web page according to the feature extraction type, and generating the local feature identifier of the web page from the extracted features.
6. The method of claim 5, wherein determining the feature extraction rule corresponding to the web page according to the attribute information of the web page comprises:
acquiring attribute information of the web page to generate a rule query request;
and after sending the rule query request to the server side, receiving the feature extraction rule returned by the server side.
7. The method of claim 5, wherein acquiring the cloud feature identifier of the web page comprises:
acquiring a uniform resource locator of the web page to generate an identifier query request;
and after sending the identifier query request to the server side, receiving the cloud feature identifier returned by the server side.
8. The method of claim 5, wherein determining whether the local feature identifier matches the cloud feature identifier comprises:
calculating a matching degree between the local feature identifier and the cloud feature identifier;
and determining whether the matching degree exceeds a preset threshold; if so, determining that the two match, and otherwise determining that the two do not match.
9. An apparatus for web page updating, the apparatus being located at a server side and comprising:
a processing unit, configured to generate a cloud feature identifier of a web page after capturing the web page, and to associate the cloud feature identifier with the web page;
an updating unit, configured to, after receiving feedback information sent by a client, re-capture the web page corresponding to the feedback information to replace the original web page, and update the cloud feature identifier associated with the web page;
wherein, when generating the cloud feature identifier of the web page, the processing unit specifically performs:
determining a feature extraction rule corresponding to the web page according to attribute information of the web page, wherein the feature extraction rule comprises a feature extraction type and a feature extraction constraint;
extracting features of the web page according to the feature extraction type, and generating the cloud feature identifier of the web page from the extracted features;
wherein the feature extraction constraint is used so that the client extracts features of the web page according to the feature extraction type when the client determines that the web page satisfies the feature extraction constraint.
10. The apparatus of claim 9, wherein the processing unit is further configured to perform:
receiving a rule query request sent by a client;
and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
11. The apparatus of claim 9, wherein the processing unit is further configured to perform:
receiving an identifier query request sent by a client;
and determining the cloud feature identifier corresponding to the identifier query request, and sending the determined cloud feature identifier to the client.
12. The apparatus of claim 9, wherein, when updating the cloud feature identifier associated with the web page, the updating unit specifically performs:
generating an updated feature identifier from the re-captured web page;
and updating the cloud feature identifier associated with the web page to the updated feature identifier.
13. An apparatus for web page updating, the apparatus being located at a client and comprising:
a generating unit, configured to generate a local feature identifier of a web page after the web page is opened;
an acquiring unit, configured to acquire a cloud feature identifier of the web page, wherein the cloud feature identifier is generated by a server side;
a matching unit, configured to generate feedback information and send it to the server side when the local feature identifier does not match the cloud feature identifier, wherein the feedback information is used by the server side to update the web page and the cloud feature identifier of the web page;
wherein, when generating the local feature identifier of the web page, the generating unit specifically performs:
determining a feature extraction rule corresponding to the web page according to attribute information of the web page, wherein the feature extraction rule comprises a feature extraction type and a feature extraction constraint;
and, when the web page satisfies the feature extraction constraint, extracting features of the web page according to the feature extraction type, and generating the local feature identifier of the web page from the extracted features.
14. The apparatus according to claim 13, wherein the generating unit, when determining the feature extraction rule corresponding to the web page based on the attribute information of the web page, specifically performs:
acquiring attribute information of the webpage to generate a rule query request;
and after the rule query request is sent to the server, receiving the feature extraction rule returned from the server.
15. The apparatus of claim 13, wherein, when acquiring the cloud feature identifier of the web page, the acquiring unit specifically performs:
acquiring a uniform resource locator of the web page to generate an identifier query request;
and after sending the identifier query request to the server side, receiving the cloud feature identifier returned by the server side.
16. The apparatus of claim 13, wherein, when determining whether the local feature identifier matches the cloud feature identifier, the matching unit specifically performs:
calculating the matching degree between the local feature identifier and the cloud feature identifier;
and determining whether the matching degree exceeds a preset threshold, if so, determining that the two are matched, and otherwise, determining that the two are not matched.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202010152322.3A 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update Active CN111506786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010152322.3A CN111506786B (en) 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update

Publications (2)

Publication Number Publication Date
CN111506786A (en) 2020-08-07
CN111506786B (en) 2023-10-27

Family

ID=71868985

Country Status (1)

Country Link
CN (1) CN111506786B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102109989A (en) * 2009-12-29 2011-06-29 阿里巴巴集团控股有限公司 Method, device and system for controlling browser cache
CN103631975A (en) * 2013-12-26 2014-03-12 成都科来软件有限公司 Data extraction method and device
CN104065635A (en) * 2013-07-02 2014-09-24 腾讯科技(深圳)有限公司 Web page accessing method and client
CN105095226A (en) * 2014-04-25 2015-11-25 广州市动景计算机科技有限公司 Method and apparatus for loading webpage resource
US10310699B1 (en) * 2014-12-08 2019-06-04 Amazon Technologies, Inc. Dynamic modification of browser and content presentation
CN110083616A (en) * 2019-04-19 2019-08-02 深圳前海微众银行股份有限公司 Page data processing method, device, equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7685296B2 (en) * 2003-09-25 2010-03-23 Microsoft Corporation Systems and methods for client-based web crawling
US8601093B2 (en) * 2010-02-10 2013-12-03 DSNR Media Group Ltd. Method and system for generation, adjustment and utilization of web pages selection rules
US9870349B2 (en) * 2013-09-20 2018-01-16 Yottaa Inc. Systems and methods for managing loading priority or sequencing of fragments of a web object
US10063617B2 (en) * 2015-09-22 2018-08-28 Facebook, Inc. Error correction using state information of data
CN106990975B (en) * 2016-01-21 2021-07-23 斑马智行网络(香港)有限公司 Application heat deployment method, device and system

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN102109989A (en) * 2009-12-29 2011-06-29 阿里巴巴集团控股有限公司 Method, device and system for controlling browser cache
CN104065635A (en) * 2013-07-02 2014-09-24 腾讯科技(深圳)有限公司 Web page accessing method and client
CN103631975A (en) * 2013-12-26 2014-03-12 成都科来软件有限公司 Data extraction method and device
CN105095226A (en) * 2014-04-25 2015-11-25 广州市动景计算机科技有限公司 Method and apparatus for loading webpage resource
US10310699B1 (en) * 2014-12-08 2019-06-04 Amazon Technologies, Inc. Dynamic modification of browser and content presentation
CN110083616A (en) * 2019-04-19 2019-08-02 深圳前海微众银行股份有限公司 Page data processing method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

Title
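Dynamic page modeling technology in Web application development; Yan Han et al.; Computer Engineering and Applications; 2003-01-01 (Issue 01); full text *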

Also Published As

Publication number Publication date
CN111506786A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
US20210049354A1 (en) Human object recognition method, device, electronic apparatus and storage medium
CN113159807B (en) Floor page processing method, floor page processing device, floor page processing equipment and floor page processing medium
CN111460289B (en) News information pushing method and device
CN110990057B (en) Method, device, equipment and medium for extracting small program subchain information
CN112015468B (en) Interface document processing method and device, electronic equipment and storage medium
CN111125176B (en) Service data searching method and device, electronic equipment and storage medium
CN110532404B (en) Source multimedia determining method, device, equipment and storage medium
CN111967304A (en) Method and device for acquiring article information based on edge calculation and settlement table
CN111767442B (en) Data updating method, device, search server, terminal and storage medium
CN111506787B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN112699314A (en) Hot event determination method and device, electronic equipment and storage medium
CN110517079B (en) Data processing method and device, electronic equipment and storage medium
CN111310044B (en) Page element information extraction method, device, equipment and storage medium
CN112733009B (en) Searching method and device
CN111506786B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN113010811B (en) Webpage acquisition method and device, electronic equipment and computer readable storage medium
CN111966432B (en) Verification code processing method and device, electronic equipment and storage medium
CN112446728B (en) Advertisement recall method, device, equipment and storage medium
US20210248486A1 (en) Method, apparatus, device and storage medium for customizing personalized rules for entities
CN111026438B (en) Method, device, equipment and medium for extracting small program package and page key information
CN112445968B (en) Information pushing method, device, equipment and computer readable storage medium
CN113220982A (en) Advertisement searching method, device, electronic equipment and medium
CN111552878B (en) Data processing method and device
CN112800319A (en) Information searching method, device, equipment and medium
CN112052347B (en) Image storage method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant