CN111506787B - Method, device, electronic equipment and computer readable storage medium for web page update - Google Patents

Publication number
CN111506787B
Authority
CN
China
Prior art keywords
webpage
web page
identifier
cloud
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010153288.1A
Other languages
Chinese (zh)
Other versions
CN111506787A (en)
Inventor
刘俊启
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010153288.1A
Publication of CN111506787A
Application granted
Publication of CN111506787B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The application discloses a method, a device, electronic equipment and a computer readable storage medium for web page update, and relates to the technical field of search engines. When a web page is updated at the server side, the scheme is as follows: after capturing a web page, the server side generates a cloud feature identifier of the web page and associates the cloud feature identifier with the web page; after receiving feedback information sent by a client, the server side obtains the cloud feature identifier corresponding to the feedback information; when the cloud feature identifier of the web page does not match the local feature identifier in the feedback information, the server side replaces the original web page with a newly captured web page and updates the cloud feature identifier associated with the web page. This improves the timeliness of web page updates and saves computing resources at the server side.

Description

Method, device, electronic equipment and computer readable storage medium for web page update
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method, an apparatus, an electronic device, and a computer readable storage medium for web page update in the field of search engine technologies.
Background
With the rapid popularization of intelligent terminals, the mobile internet has become a major way for users to obtain information, and mobile search has accordingly replaced PC search as the main way users access search engines. As the internet develops, more and more users search online, and their requirements for timely information keep rising. In current search engines, however, web page updating is generally performed by the server side alone. As the number of web pages grows, so do the computing resources the server side needs to update them; if the server side's existing computing resources are limited, the timeliness of web page updates drops sharply.
Disclosure of Invention
To solve the technical problem, the application provides a method for web page update, comprising: after a server side captures a web page, generating a cloud feature identifier of the web page and associating the cloud feature identifier with the web page; after receiving feedback information sent by a client, obtaining the cloud feature identifier corresponding to the feedback information; and when the cloud feature identifier of the web page does not match the local feature identifier in the feedback information, replacing the original web page with a newly captured web page and updating the cloud feature identifier associated with the web page. This improves the timeliness of web page updates and saves computing resources at the server side.
According to a preferred embodiment of the present application, the generating the cloud characteristic identifier of the web page includes: determining a feature extraction rule corresponding to the webpage; and extracting the characteristics of the webpage according to the characteristic extraction rule, and generating cloud characteristic identification of the webpage by utilizing the extracted characteristics. The method and the device can improve the accuracy of webpage feature extraction.
According to a preferred embodiment of the present application, the method further comprises: receiving a rule query request sent by a client; and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client. The method and the device can ensure that the client and the server use the same feature extraction rule to extract the features of the same webpage, thereby improving the accuracy of webpage updating.
According to a preferred embodiment of the present application, determining whether the cloud feature identifier of the web page matches the local feature identifier in the feedback information includes: calculating a matching degree between the cloud feature identifier and the local feature identifier; and determining whether the calculated matching degree exceeds a preset threshold. If so, the two are determined to match; if not, the two are determined not to match.
According to a preferred embodiment of the present application, before replacing the original web page with the newly captured web page and updating the cloud feature identifier associated with the web page, the method further includes: generating an updated feature identifier from the re-captured web page; and determining whether the updated feature identifier is the same as the original cloud feature identifier. If they differ, the method proceeds to replace the original web page with the newly captured web page and update the cloud feature identifier associated with the web page; if they are the same, the method obtains the attribute information of the client and associates the web page with that attribute information. This enables a finer-grained index of the web page, avoids erroneous updates, and improves the accuracy of web page updating.
According to a preferred embodiment of the present application, the updating the cloud feature identifier associated with the web page includes: generating an updated feature identifier according to the re-captured webpage; and updating the cloud characteristic identifier associated with the webpage into the updated characteristic identifier. The updating accuracy of the cloud characteristic identification can be improved.
To solve the technical problem, the application also provides a device for web page update. The device is located at the server side and includes: a processing unit, configured to generate a cloud feature identifier of a web page after capturing the web page, and to associate the cloud feature identifier with the web page; an acquisition unit, configured to obtain the cloud feature identifier corresponding to feedback information after receiving the feedback information sent by a client; and an updating unit, configured to replace the original web page with a newly captured web page and update the cloud feature identifier associated with the web page when the cloud feature identifier of the web page does not match the local feature identifier in the feedback information.
According to a preferred embodiment of the present application, when the processing unit generates the cloud feature identifier of the web page, the processing unit specifically performs: determining a feature extraction rule corresponding to the webpage; and extracting the characteristics of the webpage according to the characteristic extraction rule, and generating cloud characteristic identification of the webpage by utilizing the extracted characteristics.
According to a preferred embodiment of the present application, the processing unit further performs: receiving a rule query request sent by a client; and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
According to a preferred embodiment of the present application, when determining whether the cloud feature identifier of the web page matches the local feature identifier in the feedback information, the updating unit specifically performs: calculating a matching degree between the cloud feature identifier and the local feature identifier; and determining whether the calculated matching degree exceeds a preset threshold. If so, the two are determined to match; if not, the two are determined not to match.
According to a preferred embodiment of the present application, before replacing the original web page with the newly captured web page and updating the cloud feature identifier associated with the web page, the updating unit further performs: generating an updated feature identifier from the re-captured web page; and determining whether the updated feature identifier is the same as the original cloud feature identifier. If they differ, the updating unit proceeds to replace the original web page with the newly captured web page and update the cloud feature identifier associated with the web page; if they are the same, the updating unit obtains the attribute information of the client and associates the web page with that attribute information.
According to a preferred embodiment of the present application, when the updating unit updates the cloud feature identifier associated with the web page, the updating unit specifically performs: generating an updated feature identifier according to the re-captured webpage; and updating the cloud characteristic identifier associated with the webpage into the updated characteristic identifier.
One embodiment of the above application has the following advantages: it improves the timeliness of web page updates and saves computing resources at the server side. Because the client drives the server side to update web pages through interaction between the two, the embodiment overcomes the prior-art problem of poor timeliness caused by the server side's limited computing resources, thereby improving the timeliness of web page updates while saving server-side computing resources.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flowchart of a method for web page update performed in a server side according to a first embodiment of the present application;
FIG. 2 is a block diagram of an apparatus for web page update in a server according to a second embodiment of the present application;
FIG. 3a is a block diagram of a web page crawling architecture of a prior art search engine;
FIG. 3b is a frame diagram of a search engine for web page updates according to a third embodiment of the present application;
FIG. 4 is a block diagram of an electronic device for implementing a method of web page updating of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a method for updating a web page according to a first embodiment of the present application, where, as shown in fig. 1, the method is executed in a server, and includes:
In S101, after capturing a web page, a cloud feature identifier of the web page is generated, and the cloud feature identifier is associated with the web page.
In the step, the server firstly grabs the webpage, then generates a cloud characteristic identifier of the grabbed webpage, and then associates the generated cloud characteristic identifier with the grabbed webpage. The server side in the application is a server side of a search engine, that is, after capturing a web page, the server side of the search engine analyzes the web page to generate a feature identifier.
It can be appreciated that the server side in this step may use a web crawler to capture the web pages in the network, so as to save the captured web pages for displaying to the searching user.
Specifically, the cloud feature identifier of the web page may be generated as follows: determining a feature extraction rule corresponding to the captured web page; extracting features of the web page according to the determined rule; and generating the cloud feature identifier of the web page from the extracted features. The application does not limit how the feature identifier is generated from the features of the web page. For example, the features may undergo preset processing such as abstraction, assembly, encryption and decryption, or compression, with the processing result used as the feature identifier of the web page.
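As an illustration only, the generation of a feature identifier can be sketched as follows. The sketch assumes the extracted features are plain strings and uses a SHA-256 digest as the "preset processing"; the application does not prescribe any particular algorithm, and the names `make_feature_id` and `page_features` are hypothetical.

```python
import hashlib


def make_feature_id(features: dict) -> str:
    """Assemble the extracted features in a fixed order and digest them."""
    # Sort by feature name so the same features always yield the same identifier.
    assembled = "\n".join(f"{name}={features[name]}" for name in sorted(features))
    return hashlib.sha256(assembled.encode("utf-8")).hexdigest()


# Hypothetical features extracted from a captured page.
page_features = {"title": "Example page", "text": "Hello, world"}
cloud_id = make_feature_id(page_features)
```

Any deterministic scheme would serve equally well, provided the server and the client apply the same one to the same features.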
The feature extraction rule determined in this step may be a unified rule, that is, different web pages correspond to the same feature extraction rule. For example, the feature extraction rule in this step may be to extract the title, text and sub-links of the web page as the features of the web page for different web pages.
As the types of web pages are more and more abundant, when the same feature extraction rule is used to extract the features of different types of web pages, the accuracy of extracting the features of the web pages is reduced. Therefore, when determining the feature extraction rule corresponding to the web page, the following manner may be further adopted: acquiring attribute information of a webpage, wherein the acquired attribute information can comprise a webpage name, a webpage type and the like; feature extraction rules corresponding to the acquired attribute information are determined. That is, different feature extraction rules are preset for different webpages in this step, so that feature extraction is performed for different webpages by using corresponding feature extraction rules, and accuracy of webpage feature extraction is improved.
For example, for the information web page, the corresponding feature extraction rule may be to extract all the characters in the text as the web page feature, or extract several characters in the text as the web page feature; for the web pages of the portal class, the corresponding feature extraction rule can be to extract sub-links as web page features; for the shopping webpage, the corresponding extraction rule may be to extract the picture as the webpage feature.
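The per-type rules described above can be pictured as a lookup table. The following is a minimal sketch under stated assumptions: the type names, the field names, and the functions `select_rule` and `extract_features` are hypothetical, and a page is modeled as a dictionary of already-parsed fields.

```python
# Hypothetical rule table: page type -> names of the features to extract.
RULES = {
    "information": ["text"],
    "portal": ["sub_links"],
    "shopping": ["pictures"],
}
# Unified fallback rule from the description: title, text and sub-links.
DEFAULT_RULE = ["title", "text", "sub_links"]


def select_rule(page_type: str) -> list:
    """Return the feature names to extract for the given page type."""
    return RULES.get(page_type, DEFAULT_RULE)


def extract_features(page: dict, page_type: str) -> dict:
    """Keep only the page fields named by the rule for this page type."""
    return {name: page.get(name) for name in select_rule(page_type)}
```

A page of an unlisted type falls back to the unified rule, which mirrors the description's two alternatives (one rule for all pages, or a rule per page type).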
In order to improve the accuracy of web page updating and ensure that the server and the client use the same feature extraction rule to extract features of the same web page, the steps can further comprise the following steps: receiving a rule query request sent by a client, wherein the rule query request can contain attribute information of a webpage; and determining a feature extraction rule corresponding to the received rule query request, and sending the determined feature extraction rule to the client for the client to extract features and generate a local feature identifier of the webpage. That is, the feature extraction rule can be issued according to the query request sent by the client, so that the accuracy of the local feature identification generated by the client is improved.
In addition, the client can also pre-store the feature extraction rules corresponding to the web page locally, so that the client can obtain the feature extraction rules corresponding to the web page locally without interacting with the server.
It may be understood that, in addition to the types of features to extract from the web page, the feature extraction rule issued by the server to the client, or pre-stored locally on the client, may also include constraints on extracting features from web pages opened by the client. The client performs feature extraction on a web page only when the constraints are satisfied. The constraints may specify, for example, that some web pages need no feature extraction, that some need feature extraction in real time, or how many times per day features are extracted for some web pages.
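A rule carrying such a constraint might look like the sketch below. The encoding of the constraint as a single `max_per_day` field is an assumption made for illustration, as are the names `ExtractionRule` and `should_extract`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRule:
    feature_names: list
    # Hypothetical constraint encoding: maximum extractions per day.
    # 0 means never extract; None means extract on every page open (real time).
    max_per_day: Optional[int] = None


def should_extract(rule: ExtractionRule, extractions_today: int) -> bool:
    """Return True if the client may extract features for this page now."""
    if rule.max_per_day is None:
        return True
    return extractions_today < rule.max_per_day
```

The client would call `should_extract` each time a page is opened and skip extraction (and the feedback message) when it returns `False`.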
Specifically, in the step of associating the cloud feature identifier with the web page, the cloud feature identifier may be associated with a uniform resource locator (Uniform Resource Locator, URL) of the web page, and the URL of the web page and the associated cloud feature identifier thereof may be stored.
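The URL-to-identifier association can be sketched as a simple key-value store. The class `FeatureIndex` below is a hypothetical in-memory stand-in; a real search engine would presumably use persistent storage.

```python
from typing import Optional


class FeatureIndex:
    """In-memory stand-in for the server's URL -> cloud feature identifier store."""

    def __init__(self) -> None:
        self._ids = {}

    def associate(self, url: str, cloud_id: str) -> None:
        """Store (or overwrite) the identifier associated with a URL."""
        self._ids[url] = cloud_id

    def lookup(self, url: str) -> Optional[str]:
        """Return the identifier for a URL, or None if the page is unknown."""
        return self._ids.get(url)


index = FeatureIndex()
index.associate("https://example.com/a", "abc123")
```

Keying the store by URL is what later lets the server find the cloud feature identifier from the URL carried in a client's feedback information.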
In S102, after receiving feedback information sent by a client, a cloud feature identifier corresponding to the feedback information is obtained.
In this step, after receiving feedback information sent by the client, a cloud feature identifier corresponding to the received feedback information is obtained, that is, a cloud feature identifier associated with a webpage currently opened by the client is obtained.
The feedback information received by the server side in the step comprises the URL of the webpage opened by the client side and the local characteristic identifier generated by the client side aiming at the webpage. In addition, the feedback information received in this step may further include attribute information of the client, for example, geographical information where the client is located, network information used by the client, and the like.
Because different webpages have respective URLs, the step can determine a unique webpage according to the URLs in the feedback information; the web page is associated with the cloud feature identifier corresponding to the web page, so that the cloud feature identifier associated with the web page can be obtained through the determined web page in the step.
That is, after a user opens a webpage in a search result through a client, the client firstly performs feature extraction according to a feature extraction rule corresponding to the webpage to generate a local feature identifier of the webpage, then generates feedback information by using the generated local feature identifier and a URL of the webpage, and finally sends the generated feedback information to a server side, so that the server side can update the webpage by comparing the feature identifiers after acquiring the cloud feature identifier.
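The client-side flow just described can be sketched as follows. The JSON message layout, the field names `url`, `local_id` and `client`, and the function `build_feedback` are assumptions for illustration; the application only states that the feedback carries the URL, the local feature identifier, and optionally client attribute information.

```python
import hashlib
import json
from typing import Optional


def build_feedback(url: str, features: dict, client_attrs: Optional[dict] = None) -> str:
    """Serialize the feedback a client would send after opening a page."""
    # Deterministic assembly of the extracted features, then a digest.
    assembled = json.dumps(features, sort_keys=True)
    local_id = hashlib.sha256(assembled.encode("utf-8")).hexdigest()
    feedback = {"url": url, "local_id": local_id}
    if client_attrs:
        # Optional attribute information, e.g. location or network type.
        feedback["client"] = client_attrs
    return json.dumps(feedback)


message = build_feedback("https://example.com/a", {"title": "Example"}, {"region": "A"})
```

For the comparison on the server to be meaningful, the server would have to derive its cloud feature identifier with the same assembly and digest steps.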
In S103, when it is determined that the cloud characteristic identifier of the web page does not match the local characteristic identifier in the feedback information, the original web page is replaced by the newly captured web page, and the cloud characteristic identifier associated with the web page is updated.
In this step, after the cloud feature identifier of the web page is obtained in step S102, it is compared with the local feature identifier generated by the client and carried in the feedback information to determine whether the two match. If they match, the web page has not changed and need not be captured again. If they do not match, the web page has changed; the corresponding web page is captured again to replace the original web page, and the cloud feature identifier associated with the web page is updated.
That is, in this step the re-capturing of the web page at the server side is driven by the client: the client sends feedback information to the server side after opening the web page, which improves the timeliness with which the server side updates the web page.
Specifically, whether the cloud feature identifier of the web page matches the local feature identifier in the feedback information may be determined as follows: calculating a matching degree between the cloud feature identifier and the local feature identifier; and determining whether the calculated matching degree exceeds a preset threshold. If so, the two are determined to match; if not, the two are determined not to match.
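The application does not say how the matching degree is computed. One possible sketch, assuming each identifier is a set of per-feature fingerprints and using Jaccard similarity as the matching degree, is shown below; the threshold value 0.8 and the function names are hypothetical.

```python
def matching_degree(cloud_id: frozenset, local_id: frozenset) -> float:
    """Jaccard similarity between two identifiers (1.0 means identical)."""
    if not cloud_id and not local_id:
        return 1.0
    return len(cloud_id & local_id) / len(cloud_id | local_id)


def is_match(cloud_id: frozenset, local_id: frozenset, threshold: float = 0.8) -> bool:
    # Following the description: matched only if the degree exceeds the threshold.
    return matching_degree(cloud_id, local_id) > threshold


a = frozenset({"h1", "h2", "h3", "h4"})
b = frozenset({"h1", "h2", "h3", "h5"})
```

With these inputs the two identifiers share three of five distinct fingerprints, so `is_match(a, b)` is false at a threshold of 0.8. A graded measure like this only makes sense if the identifier preserves similarity; a single cryptographic digest of the whole page would allow only an exact-equality check.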
It can be understood that, because the local feature identifier generated by the client corresponds to the changed webpage, when the cloud feature identifier associated with the webpage is updated in this step, the cloud feature identifier associated with the webpage can be directly updated to the local feature identifier sent by the client, that is, the local feature identifier is used to replace the original cloud feature identifier.
In order to further improve the updating accuracy of the cloud characteristic identifier, the following method may be further adopted in the step of updating the cloud characteristic identifier associated with the webpage: generating an updated feature identifier according to the re-captured webpage; and updating the cloud characteristic identifier associated with the webpage into an updated characteristic identifier, namely replacing the original cloud characteristic identifier with the updated characteristic identifier. Therefore, the problem of inaccurate updating of the cloud characteristic identification caused by rapid change of the webpage can be avoided, and the accuracy of webpage updating is improved.
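The update path in which the identifier is regenerated from the re-captured page can be sketched as below. The sketch simplifies matching to exact equality, models the page store and identifier store as dictionaries, and takes the crawler as a `fetch` callable; all of these, and the names `feature_id` and `handle_feedback`, are assumptions for illustration.

```python
import hashlib
from typing import Callable


def feature_id(content: str) -> str:
    """Digest of the page content, standing in for the feature identifier."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def handle_feedback(url: str, local_id: str, pages: dict, ids: dict,
                    fetch: Callable[[str], str]) -> bool:
    """Process one feedback message; return True if the stored page was replaced."""
    if ids.get(url) == local_id:
        return False  # identifiers match: page unchanged, no re-crawl
    fresh = fetch(url)            # re-capture the page
    pages[url] = fresh            # replace the original page
    ids[url] = feature_id(fresh)  # identifier regenerated from the re-captured page
    return True
```

Regenerating the identifier from the server's own copy, rather than copying the client's local identifier, keeps the stored identifier consistent with the stored page even if the page changed again between the client's visit and the re-crawl.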
In some current application scenarios, the same web page may be displayed differently to different users in order to diversify its presentation. For example, the same web page may be presented differently at location A than at location B, i.e., the presentation of the web page varies by region.
Therefore, to avoid update errors caused by differentiated presentation of the same web page and to index web pages at a finer granularity, the following may be performed before replacing the original web page with the newly captured web page and updating the cloud feature identifier associated with the web page: generating an updated feature identifier from the re-captured web page; and determining whether the updated feature identifier is the same as the original cloud feature identifier. If they differ, the operation of replacing the original web page with the newly captured web page and updating the associated cloud feature identifier proceeds. If they are the same, the attribute information of the client is obtained and the web page is associated with it, so that targeted search results can be output to users with the same attribute information.
For example, if this step associates presentation form A of a web page with location A and presentation form B of the web page with location B, then a user at location A who opens the web page is shown presentation form A, and a user at location B who opens the web page is shown presentation form B.
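The decision made before replacing the page can be sketched as follows. The function `resolve_mismatch`, the use of a region string as the client attribute, and the `variants` dictionary are hypothetical; the application only states that the web page is associated with the client's attribute information when the re-captured page yields the same identifier as before.

```python
def resolve_mismatch(url: str, cloud_id: str, updated_id: str,
                     client_region: str, variants: dict) -> str:
    """Decide what to do after a client reported a non-matching identifier."""
    if updated_id != cloud_id:
        # The page itself changed: replace it and update the identifier.
        return "replace"
    # The server's re-crawl matches the stored copy, so the difference is treated
    # as a client-specific presentation and recorded under the client's attribute.
    variants.setdefault(url, set()).add(client_region)
    return "associate"


variants = {}
first = resolve_mismatch("https://example.com/a", "id1", "id2", "A", variants)
second = resolve_mismatch("https://example.com/a", "id1", "id1", "A", variants)
```

In this sketch the caller would carry out the replacement when `"replace"` is returned, and could later use `variants` to serve region-specific results.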
In the present application, the client drives the server side to update web pages through interaction between the two. On the one hand, this improves the timeliness of web page updates; on the other hand, the basic architecture of the server side can be retained and its computing resources are effectively saved.
Fig. 2 is a block diagram of a device for web page update according to a second embodiment of the present application, where, as shown in fig. 2, the device is located in a server, and includes: processing unit 201, acquisition unit 202, and update unit 203.
The processing unit 201 is configured to generate a cloud feature identifier of a web page after capturing the web page, and associate the cloud feature identifier with the web page.
The processing unit 201 first captures a web page, generates a cloud feature identifier of the captured web page, and then associates the generated cloud feature identifier with the captured web page. The server side in the application is a server side of a search engine, that is, after capturing a web page, the server side of the search engine analyzes the web page to generate a feature identifier.
It will be appreciated that the processing unit 201 may use web crawlers to capture web pages in the web, thereby saving the captured web pages for presentation to the searching user.
Specifically, the processing unit 201 may generate the cloud feature identifier of the web page as follows: determining a feature extraction rule corresponding to the captured web page; extracting features of the web page according to the determined rule; and generating the cloud feature identifier of the web page from the extracted features. The application does not limit how the feature identifier is generated from the features of the web page. For example, the features may undergo preset processing such as abstraction, assembly, encryption and decryption, or compression, with the processing result used as the feature identifier of the web page.
The feature extraction rule determined by the processing unit 201 may be a unified rule, that is, different web pages correspond to the same feature extraction rule.
As the types of web pages are more and more abundant, when the same feature extraction rule is used to extract the features of different types of web pages, the accuracy of extracting the features of the web pages is reduced. Therefore, when determining the feature extraction rule corresponding to the web page, the processing unit 201 may further employ the following manner: acquiring attribute information of a webpage, wherein the acquired attribute information can comprise a webpage name, a webpage type and the like; feature extraction rules corresponding to the acquired attribute information are determined. That is, the processing unit 201 presets different feature extraction rules for different web pages, so that feature extraction is performed for different web pages by using corresponding feature extraction rules, and accuracy of web page feature extraction is improved.
To improve the accuracy of web page updates and ensure that the server and the client use the same feature extraction rule for the same web page, the processing unit 201 may further perform the following: receiving a rule query request sent by a client, where the rule query request may contain attribute information of a web page; and determining a feature extraction rule corresponding to the received rule query request, and sending the determined rule to the client so that the client can extract features and generate a local feature identifier of the web page. That is, the processing unit 201 can also issue a feature extraction rule in response to the query request sent by the client, which improves the accuracy of the local feature identifier generated by the client.
In addition, the client can also pre-store the feature extraction rules corresponding to the web page locally, so that the client can obtain the feature extraction rules corresponding to the web page locally without interacting with the server.
It may be understood that, in addition to the types of features to extract from the web page, the feature extraction rule issued by the server to the client, or pre-stored locally on the client, may also include constraints on extracting features from web pages opened by the client. The client performs feature extraction on a web page only when the constraints are satisfied. The constraints may specify, for example, that some web pages need no feature extraction, that some need feature extraction in real time, or how many times per day features are extracted for some web pages.
Specifically, when associating the cloud feature identifier with the web page, the processing unit 201 may associate the cloud feature identifier with a uniform resource locator (Uniform Resource Locator, URL) of the web page, and store the URL of the web page and the associated cloud feature identifier thereof.
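As an illustration of the association described above, the sketch below derives an identifier from extracted features and stores it under the page URL. Hashing the features with SHA-256 and the `CloudIndex` class are assumptions; the application does not say how the identifier is computed or stored.

```python
import hashlib
from typing import Dict, Optional


def feature_identifier(features: Dict[str, str]) -> str:
    """Hash the extracted features in a key-sorted, order-independent form."""
    canonical = "\n".join(f"{key}={features[key]}" for key in sorted(features))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CloudIndex:
    """Hypothetical server-side store mapping a URL to its cloud feature identifier."""

    def __init__(self) -> None:
        self._id_by_url: Dict[str, str] = {}

    def associate(self, url: str, identifier: str) -> None:
        self._id_by_url[url] = identifier

    def lookup(self, url: str) -> Optional[str]:
        return self._id_by_url.get(url)
```

Because the URL is the key, the identifier for a page named in later feedback information can be found with a single lookup.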
The obtaining unit 202 is configured to obtain a cloud feature identifier corresponding to feedback information sent by a client after receiving the feedback information.
After receiving the feedback information sent by the client, the obtaining unit 202 obtains the cloud characteristic identifier corresponding to the received feedback information, that is, obtains the cloud characteristic identifier associated with the webpage currently opened by the client.
The feedback information received by the obtaining unit 202 includes a URL of a web page opened by the client and a local feature identifier generated by the client for the web page. In addition, the feedback information received by the obtaining unit 202 may further include attribute information of the client, for example, geographic information where the client is located, network information used by the client, and the like.
Since different web pages have their own URLs, the obtaining unit 202 can determine a unique web page according to the URL in the feedback information. Because each web page is associated with its corresponding cloud feature identifier, the obtaining unit 202 can then obtain the cloud feature identifier associated with the determined web page.
That is, after a user opens a webpage in a search result through a client, the client firstly performs feature extraction according to a feature extraction rule corresponding to the webpage to generate a local feature identifier of the webpage, then generates feedback information by using the generated local feature identifier and a URL of the webpage, and finally sends the generated feedback information to a server side, so that the server side can update the webpage by comparing the feature identifiers after acquiring the cloud feature identifier.
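The client-side step can be sketched as follows. The JSON layout, the field names, and the use of SHA-256 over the extracted features are assumptions for illustration only.

```python
import hashlib
import json
from typing import Dict, Optional


def build_feedback(url: str,
                   features: Dict[str, str],
                   client_attributes: Optional[Dict[str, str]] = None) -> str:
    """Generate the local feature identifier and wrap it with the URL."""
    # Serialize with sorted keys so the same features always hash the same way.
    canonical = json.dumps(features, sort_keys=True, ensure_ascii=False)
    local_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    feedback = {"url": url, "local_feature_id": local_id}
    if client_attributes:
        # Optional client attributes, e.g. geographic or network information.
        feedback["client_attributes"] = client_attributes
    return json.dumps(feedback)
```

The server can only compare identifiers meaningfully if it derives the cloud feature identifier with the same rule and the same hashing scheme.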
The updating unit 203 is configured to replace the original web page with the newly captured web page and update the cloud feature identifier associated with the web page when it is determined that the cloud feature identifier of the web page does not match the local feature identifier in the feedback information.
After the obtaining unit 202 obtains the cloud feature identifier of the web page, the updating unit 203 compares it with the local feature identifier generated by the client in the feedback information to determine whether the two match. If they match, the web page has not changed and does not need to be captured again. If they do not match, the web page has changed, so the corresponding web page is captured again to replace the original web page, and the cloud feature identifier associated with the web page is updated.
That is, the update unit 203 is driven by the client to re-fetch the web page, and the client sends feedback information to the server after opening the web page, so that the timeliness of the server for updating the web page is improved.
Specifically, when determining whether the cloud feature identifier of the web page matches the local feature identifier in the feedback information, the updating unit 203 may adopt the following manner: calculating the matching degree between the cloud feature identifier and the local feature identifier; and determining whether the calculated matching degree exceeds a preset threshold; if so, determining that the two match, and if not, determining that the two do not match.
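The application does not specify how the matching degree is calculated. One possible sketch, assuming each identifier is represented as a set of per-feature tokens, uses the Jaccard similarity of the two sets; the 0.8 threshold is likewise an arbitrary illustrative value.

```python
from typing import Set


def matching_degree(cloud_features: Set[str], local_features: Set[str]) -> float:
    """Jaccard similarity of two token sets (an assumed measure)."""
    if not cloud_features and not local_features:
        return 1.0  # two empty identifiers are treated as identical
    shared = cloud_features & local_features
    combined = cloud_features | local_features
    return len(shared) / len(combined)


def is_match(cloud_features: Set[str],
             local_features: Set[str],
             threshold: float = 0.8) -> bool:
    """The identifiers match only if the matching degree exceeds the threshold."""
    return matching_degree(cloud_features, local_features) > threshold
```

A threshold below 1.0 tolerates small differences between the two identifiers, whereas an exact comparison would trigger a re-capture on any difference.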
It can be understood that, because the local feature identifier generated by the client corresponds to the changed web page, when the cloud feature identifier associated with the web page is updated, the updating unit 203 may directly update the cloud feature identifier associated with the web page to the local feature identifier sent by the client, that is, replace the original cloud feature identifier with the local feature identifier.
In order to further improve the updating accuracy of the cloud feature identifier, when the updating unit 203 updates the cloud feature identifier associated with the web page, the following manner may be further adopted: generating an updated feature identifier according to the re-captured webpage; and updating the cloud characteristic identifier associated with the webpage into an updated characteristic identifier, namely replacing the original cloud characteristic identifier with the updated characteristic identifier. Therefore, the update unit 203 can avoid the problem of inaccurate update of the cloud characteristic identifier caused by rapid change of the web page, and improve the accuracy of web page update.
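The two ways of updating the cloud feature identifier can be sketched in one function. All names here are hypothetical: `fetch` stands in for the crawler, `identifier_of` hashes the whole page as a simplification of rule-based feature extraction, and identifiers are compared by exact equality rather than by a matching degree.

```python
import hashlib
from typing import Callable, Dict


def identifier_of(page: str) -> str:
    """Simplified identifier: a SHA-256 hash of the page content."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def handle_feedback(url: str,
                    local_id: str,
                    pages: Dict[str, str],
                    cloud_ids: Dict[str, str],
                    fetch: Callable[[str], str],
                    trust_client: bool = False) -> bool:
    """Re-capture the page on a mismatch; return True if it was replaced."""
    if cloud_ids.get(url) == local_id:
        return False  # identifiers match: page unchanged, nothing to do
    fresh_page = fetch(url)
    pages[url] = fresh_page
    # Either adopt the client's identifier directly, or regenerate it from
    # the re-captured page in case the page changed again in the meantime.
    cloud_ids[url] = local_id if trust_client else identifier_of(fresh_page)
    return True
```

Regenerating the identifier costs one extra extraction, but it keeps the stored identifier consistent with the stored page even when the page changes quickly.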
In some current application scenarios, in order to improve the diversity of web page display, different display modes may exist for different users on the same web page.
Therefore, in order to avoid web page update errors caused by differentiated display of the same web page, and to index web pages at a finer granularity, the updating unit 203 may further perform the following before replacing the original web page with the newly captured web page and updating the cloud feature identifier associated with the web page: generating an updated feature identifier according to the re-captured web page; and determining whether the updated feature identifier is the same as the original cloud feature identifier. If not, the updating unit 203 continues with the operation of replacing the original web page with the newly captured web page and updating the cloud feature identifier associated with the web page. If so, the updating unit 203 acquires the attribute information of the client and associates the web page with the acquired attribute information, so that targeted search results can be output to users having the same attribute information.
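The branch described above can be sketched as a small decision function. The function name, the return labels, and the `audiences` mapping are hypothetical; the sketch only shows that an unchanged server-side identifier leads to associating client attributes rather than replacing the page.

```python
from typing import Dict, List


def resolve_after_refetch(url: str,
                          cloud_id: str,
                          updated_id: str,
                          client_attributes: Dict[str, str],
                          audiences: Dict[str, List[Dict[str, str]]]) -> str:
    """Decide what to do once the page has been re-captured."""
    if updated_id != cloud_id:
        # The page really changed on the server side: go on to replace it.
        return "replace"
    # The server sees the same page, so the client's differing identifier is
    # attributed to per-user display; record the client's attributes instead.
    audiences.setdefault(url, []).append(client_attributes)
    return "associate"
```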
FIG. 3a is a diagram of web page crawling by a search engine in the prior art: seed URLs are placed in a queue of URLs to be crawled; a URL is taken out of the queue, DNS resolution is performed, the web page corresponding to the URL is downloaded, and the web page is stored in a downloaded web page library; URLs are then extracted from the crawled URL queue and placed into the queue of URLs to be crawled, thereby entering the next cycle.
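A generic crawl loop of this kind can be sketched as follows. This is a common reading of such a framework rather than the exact flow of FIG. 3a: here new URLs come from links found in downloaded pages, DNS resolution is assumed to happen inside the injected `fetch` callable, and `extract_links` and `max_pages` are hypothetical parameters.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(seed_urls: List[str],
          fetch: Callable[[str], str],
          extract_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl starting from the seed URLs."""
    to_crawl = deque(seed_urls)   # queue of URLs to be crawled
    seen = set(seed_urls)         # URLs already queued or crawled
    library: Dict[str, str] = {}  # downloaded web page library
    while to_crawl and len(library) < max_pages:
        url = to_crawl.popleft()
        page = fetch(url)
        library[url] = page
        for link in extract_links(page):
            if link not in seen:  # avoid crawling the same URL twice
                seen.add(link)
                to_crawl.append(link)
    return library
```

Passing `fetch` and `extract_links` as parameters keeps the loop free of network code, so it can be exercised with an in-memory link graph.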
Fig. 3b is a frame diagram of a search engine for web page update according to the third embodiment of the present application, where the process of web page crawling is the same as that described in fig. 3a, but after downloading to obtain a web page, the server side uses a processing unit, an obtaining unit and an updating unit set in the server side to analyze a web page opened by the client side, so as to drive the search engine to update the web page after determining that the web page content changes. Therefore, when the server side in the application updates the webpage, the basic framework of the search engine is reserved, and excessive modification of the search engine is not needed, so that the development difficulty is reduced.
According to embodiments of the present application, an electronic device and a computer-readable storage medium are also provided.
FIG. 4 is a block diagram of an electronic device for the method of web page update according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors 401, memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 401 is illustrated in fig. 4.
Memory 402 is the non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of web page updating provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of web page updating provided herein.
The memory 402, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the method for updating a web page in the embodiments of the present application (e.g., the processing unit 201, the obtaining unit 202, and the updating unit 203 shown in fig. 2). By running the non-transitory software programs, instructions, and modules stored in the memory 402, the processor 401 executes various functional applications and data processing of the server, i.e., implements the method of web page updating in the above-described method embodiments.
Memory 402 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 402 may optionally include memory remotely located with respect to processor 401, which may be connected to the electronic device for the web page update method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of web page update may further include an input device 403 and an output device 404. The processor 401, memory 402, input device 403, and output device 404 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 4.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of web page update; examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 404 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the server side is driven to update the webpage by the client side through interaction between the server side and the client side, so that on one hand, the timeliness of webpage update can be improved, on the other hand, the basic framework of the server side can be reserved, and the computing resources of the server side can be effectively saved.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, which is not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A method for updating a web page, comprising:
after a server side captures a webpage, generating a cloud characteristic identifier of the webpage, and associating the cloud characteristic identifier with the webpage;
after receiving feedback information sent by a client, acquiring cloud characteristic identifiers corresponding to the feedback information;
when the cloud characteristic identification of the webpage is not matched with the local characteristic identification in the feedback information, replacing the original webpage by using the newly grabbed webpage, and updating the cloud characteristic identification associated with the webpage;
before replacing the original webpage with the newly captured webpage and updating the cloud characteristic identifier associated with the webpage, the method further comprises:
generating an updated feature identifier according to the re-captured webpage;
determining whether the updated feature identifier is the same as the original cloud feature identifier;
if not, continuing to execute the operation of replacing the original webpage by the newly captured webpage and updating the cloud characteristic identification associated with the webpage, if so, acquiring the attribute information of the client, and associating the webpage with the attribute information.
2. The method of claim 1, wherein the generating the cloud characteristic identification of the web page comprises:
determining a feature extraction rule corresponding to the webpage;
and extracting the characteristics of the webpage according to the characteristic extraction rule, and generating cloud characteristic identification of the webpage by utilizing the extracted characteristics.
3. The method according to claim 1, wherein the method further comprises:
receiving a rule query request sent by a client;
and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
4. The method of claim 1, wherein determining that the cloud characteristic identifier of the web page does not match the local characteristic identifier in the feedback information comprises:
calculating the matching degree between the cloud characteristic identifier and the local characteristic identifier;
and determining whether the calculated matching degree exceeds a preset threshold, if so, determining that the two are matched, and if not, determining that the two are not matched.
5. The method of claim 1, wherein updating the cloud characteristic identifier associated with the web page comprises:
generating an updated feature identifier according to the re-captured webpage;
and updating the cloud characteristic identifier associated with the webpage into the updated characteristic identifier.
6. A device for updating a web page, located at a server side, comprising:
the processing unit is used for generating a cloud characteristic identifier of the webpage after capturing the webpage, and associating the cloud characteristic identifier with the webpage;
the acquisition unit is used for acquiring cloud characteristic identifiers corresponding to feedback information after receiving the feedback information sent by a client;
the updating unit is used for replacing the original webpage by the newly grabbed webpage when the cloud characteristic identification of the webpage is not matched with the local characteristic identification in the feedback information, and updating the cloud characteristic identification associated with the webpage;
the updating unit further performs, before replacing the original webpage with the newly captured webpage and updating the cloud characteristic identifier associated with the webpage:
generating an updated feature identifier according to the re-captured webpage;
determining whether the updated feature identifier is the same as the original cloud feature identifier;
if not, continuing to execute the operation of replacing the original webpage by the newly captured webpage and updating the cloud characteristic identification associated with the webpage, if so, acquiring the attribute information of the client, and associating the webpage with the attribute information.
7. The apparatus of claim 6, wherein the processing unit, when generating the cloud characteristic identifier of the web page, specifically performs:
determining a feature extraction rule corresponding to the webpage;
and extracting the characteristics of the webpage according to the characteristic extraction rule, and generating cloud characteristic identification of the webpage by utilizing the extracted characteristics.
8. The apparatus of claim 6, wherein the processing unit further performs:
receiving a rule query request sent by a client;
and determining a feature extraction rule corresponding to the rule query request, and sending the determined feature extraction rule to the client.
9. The apparatus of claim 6, wherein the updating unit, when determining that the cloud characteristic identifier of the web page does not match the local characteristic identifier in the feedback information, specifically performs:
calculating the matching degree between the cloud characteristic identifier and the local characteristic identifier;
and determining whether the calculated matching degree exceeds a preset threshold, if so, determining that the two are matched, and if not, determining that the two are not matched.
10. The apparatus of claim 6, wherein when the updating unit updates the cloud feature identifier associated with the web page, the updating unit specifically performs:
generating an updated feature identifier according to the re-captured webpage;
and updating the cloud characteristic identifier associated with the webpage into the updated characteristic identifier.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010153288.1A 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update Active CN111506787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010153288.1A CN111506787B (en) 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010153288.1A CN111506787B (en) 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update

Publications (2)

Publication Number Publication Date
CN111506787A CN111506787A (en) 2020-08-07
CN111506787B true CN111506787B (en) 2023-04-25

Family

ID=71863947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010153288.1A Active CN111506787B (en) 2020-03-06 2020-03-06 Method, device, electronic equipment and computer readable storage medium for web page update

Country Status (1)

Country Link
CN (1) CN111506787B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528005B (en) * 2021-11-29 2023-06-23 深圳市千源互联网科技服务有限公司 Grabbing label updating method, grabbing label updating device, grabbing label updating equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102109989A (en) * 2009-12-29 2011-06-29 阿里巴巴集团控股有限公司 Method, device and system for controlling browser cache
US10310699B1 (en) * 2014-12-08 2019-06-04 Amazon Technologies, Inc. Dynamic modification of browser and content presentation
CN110083616A (en) * 2019-04-19 2019-08-02 深圳前海微众银行股份有限公司 Page data processing method, device, equipment and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7685296B2 (en) * 2003-09-25 2010-03-23 Microsoft Corporation Systems and methods for client-based web crawling
US10063617B2 (en) * 2015-09-22 2018-08-28 Facebook, Inc. Error correction using state information of data
CN106990975B (en) * 2016-01-21 2021-07-23 斑马智行网络(香港)有限公司 Application heat deployment method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
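Dynamic page modeling technology in Web application development; Yan Han et al.; Computer Engineering and Applications; 2003-01-01 (No. 01); full text *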

Also Published As

Publication number Publication date
CN111506787A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
US11899710B2 (en) Image recognition method, electronic device and storage medium
CN111767069B (en) Applet processing method, server, device and storage medium
CN113159807B (en) Floor page processing method, floor page processing device, floor page processing equipment and floor page processing medium
CN111158799A (en) Page rendering method and device, electronic equipment and storage medium
US20210049354A1 (en) Human object recognition method, device, electronic apparatus and storage medium
CN111694857B (en) Method, device, electronic equipment and computer readable medium for storing resource data
CN111125176B (en) Service data searching method and device, electronic equipment and storage medium
CN111506787B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN111339462A (en) Component rendering method, device, server, terminal and medium
CN111767442B (en) Data updating method, device, search server, terminal and storage medium
CN111832070B (en) Data masking method, device, electronic equipment and storage medium
CN111813623B (en) Page monitoring method and device, electronic equipment and storage medium
CN110517079B (en) Data processing method and device, electronic equipment and storage medium
CN112699314A (en) Hot event determination method and device, electronic equipment and storage medium
CN111125597A (en) Webpage loading method, browser, electronic equipment and storage medium
CN111506786B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN113010811B (en) Webpage acquisition method and device, electronic equipment and computer readable storage medium
CN114661274A (en) Method and device for generating intelligent contract
CN111966432B (en) Verification code processing method and device, electronic equipment and storage medium
CN111767462B (en) Method, device, equipment and storage medium for customizing personalized rules for individual
CN110609671B (en) Sound signal enhancement method, device, electronic equipment and storage medium
CN111026438B (en) Method, device, equipment and medium for extracting small program package and page key information
CN111177558B (en) Channel service construction method and device
CN112148279A (en) Log information processing method and device, electronic equipment and storage medium
CN112052347A (en) Image storage method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant