CN109299352B - Method and device for updating website data in search engine and search engine - Google Patents

Publication number
CN109299352B
Authority
CN
China
Prior art keywords
structured data
website
data
target
search engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811350507.4A
Other languages
Chinese (zh)
Other versions
CN109299352A (en
Inventor
张安站
徐中杰
刘伟
郝洪霆
刘桐仁
滕岩松
朱月俊
强伟
陈正亮
王鹏
李立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd

Abstract

The application discloses a method and an apparatus for updating website data in a search engine, and a search engine. The method comprises: receiving the structured data of a target website submitted by the website owner this time; storing the structured data of the target website in a distributed storage system in file form, which supports the import of massive website data; comparing, for the target website, the structured data submitted this time with the structured data submitted last time; and updating the structured data of the target website in the webpage database according to the comparison result. This improves the efficiency of importing a website's structured data into the search engine and of recording updated website content in the search engine, and can in turn improve the accuracy and timeliness of search results when users subsequently search through the search engine.

Description

Method and device for updating website data in search engine and search engine
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for updating website data in a search engine, and a search engine.
Background
Structured resources are high-quality data that a website provides to a search engine and that the search engine can record, index, and display directly, so that the data can be searched immediately. Once structured resources enter the web page library of a search engine, the search engine has stronger control over them and better quality assurance, and can provide users with a high-quality search experience. For this reason, a large amount of full-site data is submitted directly to search engines as structured resources.
In the related art, when the web page content of a website is updated, the content of that website provided by the search engine is updated accordingly. Typically, the website owner submits the added, deleted, and/or modified data of the website, and this data is then stored in a relational database. However, for a website with a large volume of data, the website owner has to preprocess the data for import personally, which is inefficient. Moreover, storing the added, deleted, and/or modified data in a relational database puts the database under excessive load, so the data cannot be imported promptly. As a result, data deleted from a web page is not removed from the search engine in time, newly added data does not enter the search engine in time, users cannot access the new content of the website promptly, and the timeliness of retrieval suffers.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a method for updating website data in a search engine, which improves the efficiency of introducing structured data of a website into the search engine, improves the efficiency of recording updated contents of the website in the search engine, and further can improve the accuracy and timeliness of search results when a subsequent user searches through the search engine.
A second object of the present application is to provide an apparatus for updating website data in a search engine.
A third object of the present application is to propose a search engine.
A fourth object of the present application is to propose a non-transitory computer-readable storage medium.
A fifth object of the present application is to propose a computer program product.
In order to achieve the above objects, an embodiment of a first aspect of the present application provides a method for updating website data in a search engine, where the search engine comprises a distributed storage system and a webpage database, and the method comprises the following steps:
receiving first structured data of a target website submitted by a website owner this time; writing the first structured data into a target file, storing the target file in the distributed storage system, and saving a first storage location of the first structured data; when it is monitored that the storage of the first structured data is completed, acquiring, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location; acquiring second structured data corresponding to the target website in the webpage database, where the second structured data is the structured data submitted by the website owner last time; and comparing the first structured data with the second structured data, and updating the second structured data of the target website in the webpage database according to the comparison result.
The method for updating the website data in the search engine in the embodiment of the application receives the structured data of the target website submitted by the website owner this time, stores the structured data of the target website in a distributed storage system in a file form, supports the introduction of massive data of the website, compares the structured data of the target website submitted this time with the structured data submitted last time aiming at the target website, and updates the structured data of the target website in the webpage database according to the comparison result. Therefore, the efficiency of introducing the structured data of the website into the search engine is improved, the efficiency of recording the updated content of the website in the search engine is improved, and the accuracy and the timeliness of the search result can be improved when the subsequent user searches through the search engine.
In order to achieve the above objects, a second aspect of the present application provides an apparatus for updating website data in a search engine, where the search engine comprises a distributed storage system and a webpage database, and the apparatus comprises: a receiving module, configured to receive first structured data of a target website submitted by a website owner this time; a storage control module, configured to write the first structured data into a target file, store the target file in the distributed storage system, and save a first storage location of the first structured data; a first obtaining module, configured to, when it is monitored that the storage of the first structured data is completed, obtain, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location; a second obtaining module, configured to obtain second structured data corresponding to the target website in the webpage database, where the second structured data is the structured data submitted by the website owner last time; and an updating module, configured to compare the first structured data with the second structured data and update the second structured data of the target website in the webpage database according to the comparison result.
The updating device for the website data in the search engine in the embodiment of the application receives the structured data of the target website submitted by the website owner this time, stores the structured data of the target website in a distributed storage system in a file form, supports the introduction of massive data of the website, compares the structured data of the target website submitted this time with the structured data submitted last time aiming at the target website, and updates the structured data of the target website in the webpage database according to the comparison result. Therefore, the efficiency of introducing the structured data of the website into the search engine is improved, the efficiency of recording the updated content of the website in the search engine is improved, and the accuracy and the timeliness of the search result can be improved when the subsequent user searches through the search engine.
To achieve the above object, a third aspect of the present application provides a search engine, including: a processor and a memory; wherein, the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the method for updating website data in a search engine as described in the above embodiments.
In order to achieve the above objects, a fourth aspect of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method for updating website data in a search engine as described in the above embodiments.
In order to achieve the above objects, a fifth aspect of the present application provides a computer program product, where, when instructions in the computer program product are executed by a processor, the method for updating website data in a search engine as described in the above embodiments is performed.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for updating website data in a search engine according to one embodiment of the present application;
FIG. 2 is a flow diagram of a method for updating website data in a search engine according to another embodiment of the present application;
FIG. 3 is an exemplary diagram of a process for the import and storage of data according to one embodiment of the present application;
FIG. 4 is a diagram illustrating an example of a specific process of data comparison according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for updating website data in a search engine according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for updating website data in a search engine according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for updating website data in a search engine according to another embodiment of the present application;
FIG. 8 is a schematic hardware structure diagram of a search engine for executing a method for updating website data in the search engine according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a method, an apparatus, and a search engine for updating website data in a search engine according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a method for updating website data in a search engine according to an embodiment of the present application.
As shown in fig. 1, the method may include:
Step 101, receiving first structured data of a target website submitted by the website owner this time.
It should be understood that the method for updating website data in a search engine of the present embodiment is applied to an updating apparatus for website data in a search engine, and the updating apparatus is located in the search engine.
It should be noted that the search engine of this embodiment may include, but is not limited to, a distributed storage system and a web page database.
The distributed storage system is used for storing the structured data of the target website in a file form.
The data resources in the target web page corresponding to the target website may include, but are not limited to, resources such as novels, pictures, audio, and video, which is not limited in this embodiment.
It should be understood that the target website may or may not be a website based on a structured data framework.
It should be understood that, in order to provide structured data to a search engine, as an exemplary embodiment, when it is determined that the data resources of the web content corresponding to the target website include unstructured resources, the data resources of the target website may be structured by a structured data plug-in running in the back end of the website.
The first structured data may include, but is not limited to, an entity name and corresponding entity attribute information.
Take as an example a target website that is a music website based on a structured data framework. For each song on the music website, the entity name in the structured data is the song name, and the entity attribute information may include, but is not limited to, copyright information, description information (such as the singer's name), lyric information, a download address, and a cover picture of the song. It is to be understood that one song corresponds to one piece of structured data.
It should be understood that, when the first structured data of the target website is submitted, the website data of the target website may be submitted to the search engine in a format file such as XML, JSON, and the like. Correspondingly, after receiving the website data of the target website, the search engine analyzes the corresponding file according to the corresponding analysis rule to obtain the first structured data corresponding to the target website.
For example, a website owner submits the structured data of a target website to a search engine in a JSON file. Correspondingly, the search engine parses the JSON file corresponding to the target website according to the JSON parsing rule to obtain the structured data of the target website.
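As a minimal sketch of this parsing step, the following assumes a hypothetical submission layout in which the JSON file holds a list of objects, each with a `name` field and an optional `attributes` object. The patent does not specify the schema, so the field names and the `StructuredRecord` type are illustrative assumptions only.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StructuredRecord:
    """One piece of structured data: an entity name plus its attributes."""
    entity_name: str
    attributes: dict = field(default_factory=dict)


def parse_submission(json_text: str) -> list[StructuredRecord]:
    """Parse a JSON submission into individual pieces of structured data.

    The layout (a top-level list of {"name": ..., "attributes": {...}}
    objects) is an assumption made for illustration.
    """
    records = []
    for item in json.loads(json_text):
        records.append(
            StructuredRecord(item["name"], dict(item.get("attributes", {})))
        )
    return records


submission = '[{"name": "Song A", "attributes": {"singer": "X"}}, {"name": "Song B"}]'
records = parse_submission(submission)
```

Each element of `records` then corresponds to one piece of structured data (for instance, one song) that can be handed on to the next stage one by one.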
It should be understood that, for a target website with mass data, after the first structured data of the target website is uploaded, the target website may have a plurality of pieces of first structured data correspondingly. Therefore, as an exemplary embodiment, in parsing the corresponding file according to the corresponding parsing rule, in order to avoid data loss, after the first structured data in the corresponding file is parsed piece by piece, the parsed first structured data may be stored into the message queue piece by piece.
The message queue may be built on self-developed message middleware, or on open-source message middleware such as Kafka, RabbitMQ, or RocketMQ, which is not limited in this embodiment.
Step 102, writing the first structured data into a target file, storing the target file in the distributed storage system, and saving a first storage location of the first structured data.
The first storage location may include, but is not limited to, a file identifier of the target file, a starting position of the first structured data in the target file, and a size of the first structured data. The first storage location may further include location information of the target file in the distributed storage system (for example, identification information of the storage device on which the target file resides).
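The role of the storage location can be illustrated with a short sketch. It assumes records are serialized as JSON bytes and uses an in-memory `io.BytesIO` as a stand-in for a file in the distributed storage system; the `StorageLocation` type and both helper functions are hypothetical names, not part of the patent.

```python
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    file_id: str  # identifier of the target file
    offset: int   # starting position of the record within the file
    size: int     # length of the serialized record in bytes


def append_record(file_id: str, target: io.BytesIO, record: dict) -> StorageLocation:
    """Append one serialized record and return where it was written."""
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    offset = target.seek(0, io.SEEK_END)  # always write at the end of the file
    target.write(payload)
    return StorageLocation(file_id, offset, len(payload))


def read_record(target: io.BytesIO, location: StorageLocation) -> dict:
    """Read one record back using only its saved storage location."""
    target.seek(location.offset)
    return json.loads(target.read(location.size).decode("utf-8"))


# In-memory stand-in for a target file in the distributed storage system.
target_file = io.BytesIO()
loc_a = append_record("file-001", target_file, {"name": "Song A", "singer": "X"})
loc_b = append_record("file-001", target_file, {"name": "Song B", "singer": "Y"})
```

Because the starting position and size are saved at write time, a later reader can fetch a single piece of structured data without scanning the whole file.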
It should be understood that, for a target website with massive data, the uploaded first structured data of the target website may comprise a plurality of pieces. Since each file can store only a limited amount of data, as an exemplary embodiment, the plurality of pieces of first structured data of the target website may be stored in a plurality of subfiles.
Each subfile stores part of the structured data of the target website.
Specifically, when it is detected that the quantity of the structured data of the message queue reaches a preset quantity threshold, the structured data buffered in the message queue may be written into the corresponding subfile.
For example, the target website is a music-like website based on a structured data framework, and assuming that the music-like website includes 300 pieces of structured data, it is assumed that each file in the distributed storage system can store 30 pieces of structured data. In this case, 10 files are required to store the structured data corresponding to the music website.
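The threshold-based flushing described above can be sketched as follows. A `collections.deque` stands in for the message queue and each subfile is modeled as a plain list; both are simplifying assumptions, and `flush_to_subfiles` is a hypothetical helper name.

```python
from collections import deque


def flush_to_subfiles(message_queue: deque, threshold: int) -> list[list[dict]]:
    """Drain the queue into subfiles holding at most `threshold` records each."""
    subfiles = []
    batch = []
    while message_queue:
        batch.append(message_queue.popleft())
        if len(batch) == threshold:  # preset quantity threshold reached
            subfiles.append(batch)
            batch = []
    if batch:  # write out a final, partially filled subfile
        subfiles.append(batch)
    return subfiles


# 300 pieces of structured data, 30 pieces per file, as in the example above.
queue = deque({"name": f"Song {i}"} for i in range(300))
subfiles = flush_to_subfiles(queue, threshold=30)
```

With 300 buffered pieces and a threshold of 30, the sketch produces the 10 subfiles mentioned in the example.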
It should be understood that when a distributed storage system holds too many files, each directory contains many files and the directory hierarchy deepens, which slows path lookup: resolving a single path name may require multiple disk I/O operations. Therefore, to improve the efficiency of subsequently reading structured data from the distributed storage system, and in turn the efficiency with which the search engine receives and records updated website content, as an exemplary implementation, after the plurality of pieces of first structured data of the target website are stored in a plurality of subfiles, the subfiles may be merged to obtain the target file. Correspondingly, the target file is stored in the distributed storage system, and the first storage location of the first structured data is saved.
It should be understood that the first storage location may further include, but is not limited to, a file identifier of the target file, a file identifier of the subfile, and location information of the subfile in the target file.
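One way to picture the merge is a simple concatenation that records where each subfile ends up inside the target file. This is a sketch under that assumption; the patent does not prescribe a merge format, and `merge_subfiles` and the subfile identifiers are hypothetical.

```python
def merge_subfiles(subfiles: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate subfiles and record each one's (offset, size) in the result."""
    merged = bytearray()
    index = {}
    for subfile_id, content in subfiles.items():
        index[subfile_id] = (len(merged), len(content))  # position inside target file
        merged.extend(content)
    return bytes(merged), index


parts = {"sub-0": b"aaaa", "sub-1": b"bb", "sub-2": b"cccccc"}
target, index = merge_subfiles(parts)
offset, size = index["sub-2"]
```

The returned index plays the role of the "location information of the subfile in the target file": slicing `target[offset:offset + size]` recovers the original subfile content.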
Step 103, when it is monitored that the storage of the first structured data is completed, acquiring, for the target website, the first structured data from the distributed storage system according to the first storage location.
Step 104, acquiring second structured data corresponding to the target website in the webpage database, where the second structured data is the structured data submitted by the website owner last time.
Step 105, comparing the first structured data with the second structured data, and updating the second structured data of the target website in the webpage database according to the comparison result.
Specifically, the first structured data and the second structured data are compared to obtain the difference data between them, and the second structured data of the target website in the webpage database is then updated according to the difference data.
Differential data may include, but is not limited to, newly added data, deleted data, and modified data, among others.
The newly added data may include at least one newly added piece of structured data, and may also include an entity name and entity attribute information newly added to an existing piece of structured data.
The deleted data may include at least one deleted piece of structured data, and may also include part of the entity name and/or part of the entity attribute information deleted from a piece of structured data.
Modified data refers to structured data whose entity name appears in both the last submission and the current submission, but whose entity attribute information in the current submission has changed relative to the last submission.
For example, suppose that when song A was last submitted, its description information was text 1, text 2, ..., text 10, and that in the current submission its description information is text 1, text 2, ..., text 10, text 11, ..., text 20. The comparison determines that song A is modified data, and text 11 through text 20 are added after text 10 in the description information of song A in the web page database of the search engine.
In an exemplary embodiment, the newly added data, deleted data, and modified data in the first structured data submitted this time, relative to the second structured data submitted last time, can be determined from the first structured data and the second structured data.
It should be understood that a target website generally has a large number of pieces of structured data. For each piece of structured data, if the piece submitted this time and the piece submitted last time are completely the same, that is, the entity name and the corresponding entity attribute information are the same in both submissions, this indicates that the website content corresponding to that piece of structured data in the target website has not changed.
For each piece of structured data, if the entity names in the two submissions are the same but the entity attribute information differs, the entity attribute information that was modified relative to the last submission is obtained.
If the entity names of the structured data in the two submissions are inconsistent, the entity names that were newly added and/or deleted relative to the structured data submitted last time, together with their corresponding entity attribute information, are determined.
It is to be understood that, across the pieces of structured data in the two submissions, the newly added data, deleted data, and/or modified data relative to the last submission are obtained in this way.
For example, assume that the target website is a music website based on a structured data framework and that one piece of structured data corresponds to one song. Assume further that 300 pieces of structured data were submitted for the music website last time and 301 pieces are submitted this time, and that, relative to the 300 pieces submitted last time, some pieces have been deleted and others newly added. By comparing the structured data submitted this time with the structured data submitted last time, the difference data between the two submissions is determined, and the structured data in the web page database of the search engine is updated according to the difference data.
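The comparison rules above can be sketched in a few lines. The sketch assumes each submission is held in memory as a dictionary keyed by entity name, which is an illustrative simplification; `diff_structured_data` and `apply_diff` are hypothetical helper names.

```python
def diff_structured_data(current: dict[str, dict], previous: dict[str, dict]):
    """Classify entities as added, deleted, or modified relative to `previous`."""
    added = {n: a for n, a in current.items() if n not in previous}
    deleted = {n: a for n, a in previous.items() if n not in current}
    modified = {
        n: a for n, a in current.items()
        if n in previous and previous[n] != a  # same name, different attributes
    }
    return added, deleted, modified


def apply_diff(database: dict[str, dict], added, deleted, modified) -> None:
    """Update the stored (previous) data in place using the difference data."""
    for name in deleted:
        database.pop(name, None)
    database.update(added)
    database.update(modified)


previous = {"Song A": {"desc": "text 1"}, "Song B": {"desc": "b"}, "Song C": {"desc": "c"}}
current = {"Song A": {"desc": "text 1 text 2"}, "Song C": {"desc": "c"}, "Song D": {"desc": "d"}}

added, deleted, modified = diff_structured_data(current, previous)
web_db = dict(previous)  # stand-in for the web page database
apply_diff(web_db, added, deleted, modified)
```

Only the three difference sets are written to the database; unchanged entities such as "Song C" are never touched, which is the point of comparing before updating.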
To sum up, the method for updating website data in a search engine according to the embodiment of the present application receives the structured data of the target website submitted by the website owner this time, stores the structured data of the target website in the distributed storage system in a file form, supports the introduction of massive data of the website, compares the structured data of the target website submitted this time with the structured data submitted last time for the target website, and updates the structured data of the target website in the web database according to the comparison result. Therefore, the efficiency of introducing the structured data of the website into the search engine is improved, the efficiency of recording the updated content of the website in the search engine is improved, and the accuracy and the timeliness of the search result can be improved when the subsequent user searches through the search engine.
Based on the foregoing embodiment, a search engine generally serves a large number of websites, and its distributed storage system stores different data versions of many websites. In order to read the structured data of the correct data version for the target website once it is monitored that the first structured data has been completely stored, as an exemplary implementation, before the first structured data is acquired for the target website from the distributed storage system according to the first storage location, the method may further include: acquiring, from the data version information database, the website identifier of the target website, the first data version of the first structured data submitted this time, and the second data version of the second structured data submitted last time; then determining the first storage location of the first structured data according to the website identifier and the first data version, and determining a second storage location corresponding to the target website in the webpage database according to the website identifier and the second data version.
Wherein step 104 may include: and acquiring second structured data corresponding to the target website from the webpage database according to the second storage position.
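The version-based lookup can be sketched with plain dictionaries standing in for the data version information database and the storage-location mappings. All names, keys, and identifiers below (such as `music.example` and `target-file-0002`) are hypothetical and chosen only for illustration.

```python
def resolve_storage_locations(site_id, version_db, file_locations, webdb_locations):
    """Return (first_location, second_location) for the two latest versions."""
    versions = version_db[site_id]  # assumed to be ordered oldest to newest
    if len(versions) < 2:
        raise ValueError("need a current and a previous data version to compare")
    first_version, second_version = versions[-1], versions[-2]
    first_location = file_locations[(site_id, first_version)]      # distributed storage
    second_location = webdb_locations[(site_id, second_version)]   # web page database
    return first_location, second_location


version_db = {"music.example": ["v1", "v2"]}
file_locations = {("music.example", "v2"): "target-file-0002"}
webdb_locations = {("music.example", "v1"): "webdb-partition-7"}

first_loc, second_loc = resolve_storage_locations(
    "music.example", version_db, file_locations, webdb_locations
)
```

Keying both mappings by the pair (website identifier, data version) is what lets the same storage systems hold several versions of many websites without ambiguity.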
Fig. 2 is a flowchart of a method for updating website data in a search engine according to another embodiment of the present application.
As shown in fig. 2, the method may include:
Step 201, receiving the first structured data of the target website submitted by the website owner this time.
Step 202, a first data version of the first structured data submitted this time is obtained, and the corresponding relation between the first data version and the website identification of the target website is stored in a data version information database.
Step 203, writing the first structured data into the target file, storing the target file in the distributed storage system, and saving the first storage position of the first structured data.
Step 204, storing the corresponding relation among the first data version, the first storage position and the website identification into a preset relational database among the data version, the storage position and the website identification.
Step 205, when it is monitored that the storage of the first structured data is completed, acquiring the website identifier of the target website, the first data version of the first structured data submitted this time, and the second data version of the second structured data submitted last time according to the data version information database.
Step 206, determining a first storage location of the first structured data according to the website identifier, the first data version and the relational database.
Step 207, determining a second storage location corresponding to the target website in the webpage database according to the website identifier and the second data version.
Step 208, acquiring, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location.
Step 209, acquiring the second structured data corresponding to the target website from the webpage database according to the second storage location, where the second structured data is the structured data submitted by the website owner last time.
Step 210, comparing the first structured data with the second structured data, and updating the second structured data of the target website in the web database according to the comparison result.
The method for updating the website data in the search engine in the embodiment of the application receives the structured data of the target website submitted by the website owner this time, stores the structured data of the target website in a distributed storage system in a file form, supports the introduction of massive data of the website, compares the structured data of the target website submitted this time with the structured data submitted last time aiming at the target website, and updates the second structured data of the target website in the webpage database according to the comparison result. Therefore, the efficiency of introducing the structured data of the website into the search engine is improved, the efficiency of recording the updated content of the website in the search engine is improved, and the accuracy and the timeliness of the search result can be improved when the subsequent user searches through the search engine.
To help those skilled in the art understand the method for updating website data in a search engine according to the embodiment of the present application, an exemplary description is given below with reference to FIG. 3, taking as an example the case in which the website owner submits the structured data of the target website as a JSON file.
The method for updating the website data in the search engine comprises the following specific processes:
1. Data import and storage
Specifically, the website owner submits the current structured data of the target website in a JSON file. Correspondingly, the search engine parses the JSON file and stores the data into the message queue piece by piece.
It should be noted that the message queue is used to buffer data and prevent data loss.
In the process of analyzing the json file, the data version of the structured data of the target website submitted this time can be obtained, and the data version of the target website is stored in the data version information database.
It should be noted that the data submitted by the station owner each time belongs to the same data version.
The data shuffle program subscribes to the message queue, accumulates structured data up to a certain number of records, writes the records to the distributed storage system as a file, and stores the mapping between the structured data and the file in the meta database. At this point, the data stored in the distributed storage system is globally ordered.
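The import and shuffle steps described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the JSON layout (`site`, `version`, `items`, `key`), the function names, the `BATCH_SIZE` threshold, and the use of an in-process `queue.Queue` and local files in place of a real message queue and distributed storage system. For simplicity the sketch drains the queue after the whole submission has been enqueued, whereas a real shuffle program would subscribe continuously.

```python
import json
import queue
from pathlib import Path

BATCH_SIZE = 3  # hypothetical flush threshold; the text only says "a certain number"


def parse_submission(json_text, mq):
    """Parse a submitted JSON document and enqueue its records one by one.

    Assumed layout: {"site": ..., "version": ..., "items": [{"key": ...}, ...]}.
    Returns (site, version) so the caller can record the data version.
    """
    doc = json.loads(json_text)
    for item in doc["items"]:
        mq.put(item)
    return doc["site"], doc["version"]


def shuffle_to_files(mq, out_dir, site, version, meta):
    """Drain the queue, write records to files in batches, record the mapping."""
    records = []
    while not mq.empty():
        records.append(mq.get())
    # Sort by key so that the files, read in order, yield globally ordered data.
    records.sort(key=lambda r: r["key"])
    for n, start in enumerate(range(0, len(records), BATCH_SIZE)):
        batch = records[start:start + BATCH_SIZE]
        path = Path(out_dir) / f"{site}-v{version}-{n:05d}.jsonl"
        lines = "".join(json.dumps(r, sort_keys=True) + "\n" for r in batch)
        path.write_text(lines, encoding="utf-8")
        # meta stands in for the meta database: (site, version) -> file list.
        meta.setdefault((site, version), []).append(str(path))
```

In this sketch, `meta` plays the role of the meta database: after `shuffle_to_files` returns, `meta[(site, version)]` lists the files that hold the submission in key order.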
It should be noted that an excessive number of files degrades the performance of data computation to some extent; therefore, a periodic compaction program is run to merge files.
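One plausible way to merge files while preserving order is a k-way merge of key-sorted files. The sketch below is an assumption, not the application's compaction program: it supposes that each file holds one JSON record per line, that records carry a hypothetical `key` field, and that each input file is already sorted by that key.

```python
import heapq
import json


def read_records(path):
    """Yield the JSON records of one file, one record per line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def compact(paths, merged_path):
    """Merge several key-sorted files into one key-sorted file.

    Returns the number of records written. The input files are left in place;
    deleting them and updating the file mapping is left to the caller.
    """
    streams = [read_records(p) for p in paths]
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        # heapq.merge consumes the sorted streams lazily, so memory use stays small.
        for record in heapq.merge(*streams, key=lambda r: r["key"]):
            out.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count
```

Because `heapq.merge` reads its inputs lazily, the merged file can be produced without loading any input file fully into memory.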
An exemplary diagram of the process of data import and storage is shown in fig. 3.
2. Data set alignment judgment
When data is imported, basic information about the data, including the version number and the data volume, is stored in a database. After the data catcher and the shuffle program have received the complete data and stored it in the distributed file system, a comparison command is issued to the data comparison module.
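The completeness check can be sketched as follows, assuming that the data volume recorded at import time is a record count and that the storage layer reports each stored batch. The `VersionInfo` class, its fields, and the `issue_compare` callback are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class VersionInfo:
    site: str
    version: int
    declared_count: int    # data volume recorded when the data was imported
    stored_count: int = 0  # records confirmed written to distributed storage


def ready_for_comparison(info):
    """Return True once every declared record has been stored."""
    return info.stored_count >= info.declared_count


def on_batch_stored(info, batch_size, issue_compare):
    """Update the stored count and issue the comparison command once, on completion."""
    was_ready = ready_for_comparison(info)
    info.stored_count += batch_size
    if not was_ready and ready_for_comparison(info):
        issue_compare(info.site, info.version)
```

Checking the state before and after the update ensures that the comparison command is issued only on the transition to "complete", not on every later batch notification.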
3. Comparison of data
First, basic information about the two versions is obtained from the data version information database, and the stored file lists of the two versions are then obtained from the database that maps website identifiers (for example, a uniform resource locator, URL, may serve as the website identifier) and structured data to files. Next, a certain amount of the structured data submitted this time is read from the distributed storage system, and a corresponding amount of the structured data submitted last time for the target website is obtained from the webpage database of the search engine, for example 5000 records at a time. Because these 5000 records are globally ordered, the structured data of the two versions can be compared in sequence, and the added, deleted and/or modified data is derived from the comparison result. The structured data of the target website in the webpage database of the search engine is then updated according to the added, deleted and/or modified data.
An exemplary diagram of a specific process of data comparison is shown in fig. 4.
Schematic code for sequentially comparing the structured data of the target website in the currently submitted version and the previously submitted version is as follows:
[Code listing published as images in the original: Figure BDA0001864745120000091 and Figure BDA0001864745120000101.]
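Since the listing referenced here is available only as images, the following is an independent minimal sketch of a sequential comparison over two globally ordered versions, not a reproduction of the application's code. It assumes each record is a `(key, value)` tuple and that both inputs are sorted by key; the function name `diff_sorted` is hypothetical, and the batching of 5000 records at a time is omitted.

```python
def diff_sorted(previous, current):
    """Compare two key-sorted record lists in a single pass.

    Each record is a (key, value) tuple. Returns (added, deleted, modified),
    where added and modified hold current records and deleted holds previous ones.
    """
    added, deleted, modified = [], [], []
    i = j = 0
    while i < len(previous) and j < len(current):
        old_key, old_val = previous[i]
        new_key, new_val = current[j]
        if old_key == new_key:
            if old_val != new_val:
                modified.append(current[j])
            i += 1
            j += 1
        elif old_key < new_key:
            deleted.append(previous[i])  # key no longer present in the new version
            i += 1
        else:
            added.append(current[j])     # key appears only in the new version
            j += 1
    # Whatever remains on either side has no counterpart on the other.
    deleted.extend(previous[i:])
    added.extend(current[j:])
    return added, deleted, modified
```

Because both inputs are ordered, each record is visited once, so the comparison is linear in the total number of records and needs no lookup structure.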
In order to implement the above embodiments, the present application further provides an apparatus for updating website data in a search engine.
Fig. 5 is a schematic structural diagram of an apparatus for updating website data in a search engine according to an embodiment of the present application.
It should be noted that the apparatus for updating website data in a search engine according to this embodiment is located in the search engine, where the search engine may include, but is not limited to, a distributed storage system and a web page database.
As shown in fig. 5, the apparatus for updating website data in a search engine may include a receiving module 110, a storage control module 120, a first obtaining module 130, a second obtaining module 140, and an updating module 150, wherein:
The receiving module 110 is configured to receive the first structured data of the target website submitted this time by the webmaster.
The storage control module 120 is configured to write the first structured data into the target file, store the target file in the distributed storage system, and store a first storage location of the first structured data.
The first obtaining module 130 is configured to, when it is monitored that the first structured data is completely stored, obtain, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location.
The second obtaining module 140 is configured to obtain second structured data corresponding to the target website in the webpage database, where the second structured data is the structured data submitted by the webmaster last time.
And the updating module 150 is configured to compare the first structured data with the second structured data, and update the second structured data of the target website in the web database according to the comparison result.
It should be noted that the explanation of the foregoing embodiment of the method for updating website data in a search engine is also applicable to the apparatus for updating website data in a search engine of this embodiment, and details are not repeated here.
The apparatus for updating website data in a search engine according to this embodiment of the application receives the structured data of the target website submitted this time by the webmaster and stores it as files in a distributed storage system, which supports importing massive amounts of website data. For the target website, the apparatus compares the structured data submitted this time with the structured data submitted last time and updates the structured data of the target website in the webpage database according to the comparison result. This improves the efficiency of importing a website's structured data into the search engine and of recording a website's updated content in the search engine, and in turn improves the accuracy and timeliness of search results when users subsequently search with the search engine.
In an embodiment of the present application, the update module 150 is specifically configured to: compare the first structured data with the second structured data to determine differential data between the first structured data and the second structured data; and update the second structured data of the target website in the webpage database according to the differential data.
Differential data may include, but is not limited to, newly added data, deleted data, and modified data, among others.
The newly added data may include at least one piece of newly added structured data, and may further include adding, for a piece of structured data, its corresponding entity name and entity attribute information.
The deleted data may include the deleted at least one piece of structured data, and may further include, for a piece of structured data, partial entity name and/or partial entity attribute information for deleting the piece of structured data.
Modified data refers to the case in which structured data with the same entity name appears in both the previous submission and the current submission, and the entity attribute information of the currently submitted structured data differs from that of the previously submitted structured data with the same entity name.
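At the level of a single entity, the three kinds of differential data can be illustrated with a short sketch. It assumes that the entity attribute information of one piece of structured data can be represented as a mapping from attribute name to value; the function name `diff_attributes` and the example attributes are hypothetical.

```python
def diff_attributes(old, new):
    """Return the attribute-level changes between two versions of one entity.

    `old` and `new` map attribute names to values for the same entity name.
    """
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "deleted": {k: old[k] for k in old.keys() - new.keys()},
        "modified": {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]},
    }
```

For example, comparing `{"price": 10, "size": "M"}` with `{"price": 12, "stock": 3}` reports `stock` as added, `size` as deleted, and `price` as modified.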
In an embodiment of the present application, the storage control module 120 is specifically configured to: saving a plurality of pieces of first structured data by a plurality of subfiles; and combining the plurality of sub-files to obtain the target file.
In an embodiment of the present application, on the basis of the embodiment of the apparatus shown in fig. 5, as shown in fig. 6, the apparatus may further include:
a third obtaining module 160, configured to obtain, according to the data version information database, a website identifier of the target website, a first data version of the first structured data submitted this time, and a second data version of the second structured data submitted last time;
a first determining module 170, configured to determine a first storage location of the first structured data according to the website identifier and the first data version;
a second determining module 180, configured to determine, according to the website identifier and the second data version, a second storage location in the web database corresponding to the target website;
the second obtaining module 140 is specifically configured to: and acquiring second structured data corresponding to the target website from the webpage database according to the second storage position.
In an embodiment of the present application, on the basis of fig. 6, as shown in fig. 7, the apparatus may further include:
the fourth obtaining module 190 is configured to obtain a first data version of the first structured data submitted this time, and store a corresponding relationship between the first data version and the website identifier of the target website in the data version information database.
A storage module 200, configured to store the corresponding relationship between the first data version, the first storage location, and the website identifier in a preset database of relationships between the data version, the storage location, and the website identifier.
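The bookkeeping performed by the third and fourth obtaining modules, the determining modules, and the storage module can be sketched with an in-memory stand-in. This is an assumption for illustration only: the class `VersionRegistry`, its method names, and the use of a single mapping for both storage locations are hypothetical, whereas the text keeps the first storage location in the distributed storage system and the second in the webpage database.

```python
class VersionRegistry:
    """In-memory stand-in for the data version and storage location databases."""

    def __init__(self):
        self._versions = {}   # site_id -> list of versions, oldest first
        self._locations = {}  # (site_id, version) -> storage location

    def register(self, site_id, version, location):
        """Record a newly submitted version and where its data is stored."""
        self._versions.setdefault(site_id, []).append(version)
        self._locations[(site_id, version)] = location

    def latest_two(self, site_id):
        """Return (current_version, previous_version).

        previous_version is None when the site has only one submission.
        """
        versions = self._versions.get(site_id, [])
        if not versions:
            raise KeyError(f"no versions recorded for {site_id!r}")
        previous = versions[-2] if len(versions) > 1 else None
        return versions[-1], previous

    def location(self, site_id, version):
        """Resolve the storage location for a given site and version."""
        return self._locations[(site_id, version)]
```

Keying the location lookup on the pair of website identifier and data version mirrors how the determining modules derive each storage location from those two values.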
In order to implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium; when instructions stored in the storage medium are executed by a processor, the method for updating website data in a search engine shown in the above embodiments can be performed.
In order to implement the foregoing embodiments, the present application further provides a computer program product; when instructions in the computer program product are executed by a processor, the method for updating website data in a search engine shown in the foregoing embodiments is performed.
Fig. 8 is a schematic hardware structure diagram of a search engine for executing a method for updating website data in the search engine according to an embodiment of the present application, and as shown in fig. 8, the search engine includes:
one or more processors 810 and a memory 820, with one processor 810 being an example in FIG. 8.
The search engine may further include: an input device 830 and an output device 840.
The processor 810, the memory 820, the input device 830, and the output device 840 may be connected by a bus or other means, such as the bus connection in fig. 8.
The memory 820 is a non-transitory computer readable storage medium and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for updating website data in a search engine in the embodiment of the present application (for example, the receiving module 110, the storage control module 120, the first obtaining module 130, the second obtaining module 140, and the updating module 150 shown in fig. 5). The processor 810 executes various functional applications of the server and data processing, i.e., a method for updating website data in a search engine in the above-described method embodiments, by executing the non-transitory software programs, instructions, and modules stored in the memory 820.
The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the apparatus for updating website data in a search engine, and the like. Further, the memory 820 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 820 optionally includes memory located remotely from the processor 810, and such remote memory may be connected via a network to the apparatus for updating website data in a search engine. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the updating device of website data in the search engine. The output device 840 may include a display device such as a display screen.
One or more modules are stored in memory 820 and when executed by the one or more processors 810 perform a method for updating website data in a search engine in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A method for updating website data in a search engine, wherein the search engine comprises a distributed storage system and a webpage database, and the method comprises the following steps:
receiving first structured data of a target website submitted this time by a webmaster;
writing the first structured data into a target file, storing the target file in the distributed storage system, and storing a first storage position of the first structured data;
when the first structured data are monitored to be stored completely, aiming at the target website, acquiring the first structured data from the target file of the distributed storage system according to the first storage position;
acquiring second structured data corresponding to the target website in the webpage database, wherein the second structured data is the structured data submitted by the webmaster last time;
and comparing the first structured data with the second structured data, and updating the second structured data of the target website in the webpage database according to the comparison result.
2. The method of claim 1, wherein comparing the first structured data with the second structured data and updating the second structured data of the target website in the web page database according to the comparison result comprises:
comparing the first structured data to the second structured data to determine differential data between the first structured data and the second structured data;
and updating the second structured data of the target website in the webpage database according to the differential data.
3. The method of claim 1, wherein the first structured data comprises a plurality of pieces, and wherein writing the first structured data to a target file comprises:
saving a plurality of pieces of the first structured data by a plurality of subfiles;
and combining the plurality of subfiles to obtain the target file.
4. The method of claim 1, wherein prior to obtaining, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location, further comprising:
acquiring a website identification of the target website, a first data version of the first structured data submitted this time and a second data version of the second structured data submitted last time according to a data version information database;
determining the first storage location of the first structured data according to the website identification and the first data version;
determining a second storage position corresponding to the target website in the webpage database according to the website identification and the second data version;
the acquiring of the second structured data corresponding to the target website in the web database includes:
and acquiring second structured data corresponding to the target website from the webpage database according to a second storage position.
5. The method of claim 4, further comprising:
acquiring the first data version of the first structured data submitted this time, and storing the corresponding relation between the first data version and the website identification of the target website in a data version information database;
and storing the corresponding relation among the first data version, the first storage position and the website identification into a preset relation database among the data version, the storage position and the website identification.
6. An apparatus for updating website data in a search engine, wherein the search engine comprises a distributed storage system and a webpage database, the apparatus comprising:
the receiving module is used for receiving first structured data of a target website submitted this time by a webmaster;
the storage control module is used for writing the first structured data into a target file, storing the target file into the distributed storage system, and storing a first storage position of the first structured data;
a first obtaining module, configured to, when it is monitored that the storage of the first structured data is completed, obtain, for the target website, the first structured data from the target file of the distributed storage system according to the first storage location;
a second obtaining module, configured to obtain second structured data corresponding to the target website in the webpage database, where the second structured data is the structured data submitted by the webmaster last time;
and the updating module is used for comparing the first structured data with the second structured data and updating the second structured data of the target website in the webpage database according to a comparison result.
7. The apparatus of claim 6, wherein the update module is specifically configured to:
comparing the first structured data to the second structured data to determine differential data between the first structured data and the second structured data;
and updating the second structured data of the target website in the webpage database according to the differential data.
8. The apparatus of claim 6, wherein the storage control module is specifically configured to:
saving a plurality of pieces of the first structured data by a plurality of subfiles;
and combining the plurality of subfiles to obtain the target file.
9. The apparatus of claim 6, further comprising:
a third obtaining module, configured to obtain, according to a data version information database, a website identifier of the target website, a first data version of the first structured data submitted this time, and a second data version of the second structured data submitted last time;
a first determining module, configured to determine the first storage location of the first structured data according to the website identifier and the first data version;
a second determining module, configured to determine, according to the website identifier and the second data version, a second storage location in the web database corresponding to the target website;
the second obtaining module is specifically configured to:
and acquiring second structured data corresponding to the target website from the webpage database according to a second storage position.
10. The apparatus of claim 9, further comprising:
a fourth obtaining module, configured to obtain the first data version of the first structured data submitted this time, and store a corresponding relationship between the first data version and the website identifier of the target website in a data version information database;
and the storage module is used for storing the corresponding relation among the first data version, the first storage position and the website identification into a preset relational database among the data version, the storage position and the website identification.
11. A search engine comprising a processor and a memory;
wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for implementing the method for updating website data in a search engine according to any one of claims 1 to 5.
12. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method for updating website data in a search engine according to any one of claims 1 to 5.
CN201811350507.4A 2018-11-14 2018-11-14 Method and device for updating website data in search engine and search engine Active CN109299352B (en)

Publications (2)

Publication Number Publication Date
CN109299352A CN109299352A (en) 2019-02-01
CN109299352B true CN109299352B (en) 2022-02-01

Family

ID=65146740

Country Status (1)

Country Link
CN (1) CN109299352B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502673A (en) * 2019-06-12 2019-11-26 广州虎牙科技有限公司 Data processing method, server and the device with store function
CN111367692B (en) * 2020-03-09 2023-08-22 政采云有限公司 Search engine data processing method and device, electronic equipment and medium
CN113326417B (en) * 2021-06-17 2023-08-01 北京百度网讯科技有限公司 Method and device for updating webpage library

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1997976A (en) * 2002-02-07 2007-07-11 Sap股份公司 User interface and dynamic grammar in multi-modal synchronization structure
CN101617336A (en) * 2007-02-13 2009-12-30 微软公司 The link of utilization structure data management webpage
CN102073726A (en) * 2011-01-11 2011-05-25 百度在线网络技术(北京)有限公司 Search engine system and structured data import method for search engine system
CN102571355A (en) * 2012-02-02 2012-07-11 飞天诚信科技股份有限公司 Method and device for importing secret key without landing
CN103365961A (en) * 2013-06-19 2013-10-23 北京时间中国网科技有限公司 Accurate search-oriented website structurization labeling method and system
CN103714078A (en) * 2012-09-29 2014-04-09 百度在线网络技术(北京)有限公司 Method, system and device for providing update contents of web pages
CN105630843A (en) * 2014-11-17 2016-06-01 广州市动景计算机科技有限公司 Webpage change monitoring method and device
CN105912609A (en) * 2016-04-06 2016-08-31 中国农业银行股份有限公司 Data file processing method and device
CN106469152A (en) * 2015-08-14 2017-03-01 阿里巴巴集团控股有限公司 A kind of document handling method based on ETL and system
CN106919405A (en) * 2015-12-24 2017-07-04 阿里巴巴集团控股有限公司 The initial method and device of a kind of client
CN106937275A (en) * 2017-02-13 2017-07-07 深圳盈达信息科技有限公司 A kind of equipment that system unique identifier and hardware ID are preserved under Android system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150026304A1 (en) * 2013-07-17 2015-01-22 Go Daddy Operating Company, LLC System for maintaining common data across multiple platforms
US10204136B2 (en) * 2015-10-19 2019-02-12 Ebay Inc. Comparison and visualization system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant