CN110110184B - Information inquiry method, system, computer system and storage medium

Info

Publication number: CN110110184B
Application number: CN201711401432.3A
Authority: CN
Inventors: 胡雄伟
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2021-07-06
Anticipated expiration: 2037-12-21
Also published as: CN110110184A

Abstract

The present disclosure provides an information query method, including: receiving an information query request, wherein the information query request is used for requesting to query relevant information on a static website; responding to the information query request, and querying an index library of the static website to find out serial number metadata which has an index relation with related information requested to be queried by the information query request and path metadata which has a corresponding relation with the serial number metadata; finding out an HTML file stored with related information based on the path described by the path metadata; and reading the relevant information from the HTML file. The present disclosure also provides an information query system, a computer system, and a computer-readable storage medium.

Description

Information inquiry method, system, computer system and storage medium

Technical Field

The present disclosure relates to the field of internet technologies, and in particular, to an information query method, system, computer system, and computer-readable storage medium.

Background

Given the huge amount of information on the internet, finding the information one needs is becoming increasingly difficult, and searching is the most common way of finding it. Existing search means can index internet information conveniently.

However, in the course of implementing the disclosed concept, the inventors found at least the following problems in the prior art: existing search means require increasingly complex supporting conditions (software and hardware environments, such as a database storage service), and their application scenarios are limited. For example, they are applicable only to dynamic websites and not to certain special scenarios such as static websites, so information indexing cannot be performed on static websites.

Disclosure of Invention

In view of the above, the present disclosure provides an information query method and system for implementing information indexing on a static website by creating an index library for the static website.

One aspect of the present disclosure provides an information query method, including: receiving an information query request, wherein the information query request is used for requesting to query relevant information on a static website; responding to the information query request, and querying an index library of the static website to find out serial number metadata having an index relationship with related information requested to be queried by the information query request and path metadata having a corresponding relationship with the serial number metadata; finding out an HTML file storing the related information based on the path described by the path metadata; and reading the related information from the HTML file.

According to an embodiment of the present disclosure, the information query method further includes generating the index library of the static website, and this operation includes: generating a summary file of the static website, wherein the summary file records a file title and a relative path of at least one HTML file contained in the static website; reading, for each of the at least one HTML file, the corresponding relative path from the summary file; reading the corresponding HTML file based on the read relative path; performing metadata division on the webpage content described by the read HTML file; and generating the index library of the static website based on the metadata division result.

According to an embodiment of the present disclosure, the dividing metadata of the web page content described in the read HTML file includes: matching the webpage content described by the read HTML file by using a regular expression to match basic data of each component in the webpage content; and adding sequence number metadata and path metadata to the basic data of each component part to realize metadata division of the webpage content described by the read HTML file.

According to an embodiment of the present disclosure, the information query method further includes: before adding the sequence number metadata and the path metadata to the basic data of each component, performing tag conversion processing on the basic data of at least one component among the matched basic data of the components in the webpage content; for the basic data of each component on which the tag conversion processing has been performed, adding sequence number metadata and path metadata to the related data of that component obtained after the tag conversion; and for the basic data of each component on which the tag conversion processing has not been performed, adding sequence number metadata and path metadata to the basic data of that component.

According to an embodiment of the present disclosure, generating the summary file of the static website includes: traversing all HTML files of the static website starting from the website root directory of the static website; extracting the file title and the relative path of each of the HTML files; and generating the summary file of the static website to record the file titles and relative paths of all HTML files contained in the static website.

Another aspect of the present disclosure provides an information query system, including: the receiving module is used for receiving an information query request, wherein the information query request is used for requesting to query the related information on the static website; a response module, configured to respond to the information query request and query an index library of the static website to find out sequence number metadata having an index relationship with related information requested to be queried by the information query request and path metadata having a corresponding relationship with the sequence number metadata; the determining module is used for finding out an HTML file stored with the related information based on the path described by the path metadata; and a reading module for reading the relevant information from the HTML file.

According to an embodiment of the present disclosure, the information query system further includes a generation module configured to generate the index library of the static website, where the generation module is further configured to: generate a summary file of the static website, wherein the summary file records a file title and a relative path of at least one HTML file contained in the static website; read, for each of the at least one HTML file, the corresponding relative path from the summary file; read the corresponding HTML file based on the read relative path; perform metadata division on the webpage content described by the read HTML file; and generate the index library of the static website based on the metadata division result.

According to an embodiment of the present disclosure, the generating module is further configured to: matching the webpage content described by the read HTML file by using a regular expression to match basic data of each component in the webpage content; and adding sequence number metadata and path metadata to the basic data of each component part to realize metadata division of the webpage content described by the read HTML file.

According to an embodiment of the present disclosure, the information query system further includes: a processing module configured to perform tag conversion processing on the basic data of at least one component among the matched basic data of the components in the webpage content before sequence number metadata and path metadata are added to the basic data of each component; a first adding module configured to add, for the basic data of each component on which the tag conversion processing has been performed, sequence number metadata and path metadata to the related data of that component obtained after the tag conversion; and a second adding module configured to add, for the basic data of each component on which the tag conversion processing has not been performed, sequence number metadata and path metadata to the basic data of that component.

According to an embodiment of the present disclosure, the generation module is further configured to: traverse all HTML files of the static website starting from the website root directory of the static website; extract the file title and the relative path of each of the HTML files; and generate the summary file of the static website to record the file titles and relative paths of all HTML files contained in the static website.

Another aspect of the present disclosure provides a computer system comprising: one or more processors; a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the information query method as described above.

Another aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to implement the information query method as described above.

According to the embodiment of the disclosure, because the technical means of creating an index library for the static website and then searching based on the index library is adopted, the technical problem in the related art that information indexing cannot be performed on a static website can be at least partially solved, and the technical effect of performing information indexing on the static website can thereby be achieved.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a system architecture to which the information query method and system may be applied, according to an embodiment of the present disclosure;

FIG. 2A schematically illustrates a flow diagram of an information query method according to an embodiment of the disclosure;

FIG. 2B schematically shows a schematic diagram of an information query method according to an embodiment of the disclosure;

FIG. 3A schematically illustrates a flow diagram for generating an index repository of static web sites according to an embodiment of the present disclosure;

FIG. 3B schematically shows a flowchart for metadata partitioning of web page content described by a read HTML file, according to an embodiment of the present disclosure;

FIG. 3C is a schematic diagram that schematically illustrates metadata partitioning of web content described by a read HTML file, in accordance with an embodiment of the present disclosure;

FIG. 3D schematically illustrates a flow diagram of an information query method according to another embodiment of the disclosure;

FIG. 3E schematically shows a diagram of an information query method according to another embodiment of the present disclosure;

FIG. 3F schematically illustrates a flow diagram for generating a summary file for a static website according to an embodiment of the present disclosure;

FIG. 3G schematically illustrates a diagram of an information query method according to another embodiment of the disclosure;

FIG. 4 schematically illustrates a block diagram of an information query system according to an embodiment of the present disclosure;

FIG. 5A schematically illustrates a block diagram of an information query system according to another embodiment of the present disclosure;

FIG. 5B schematically shows a block diagram of an information query system according to another embodiment of the present disclosure; and

FIG. 6 schematically illustrates a block diagram of a computer system suitable for implementing the information query method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A" or "B", or "A and B".

The embodiment of the disclosure provides an information query method and system. The information query method comprises the steps of receiving an information query request, wherein the information query request is used for requesting to query relevant information on a static website; responding to the information query request, and querying an index library of the static website to find out serial number metadata which has an index relation with related information requested to be queried by the information query request and path metadata which has a corresponding relation with the serial number metadata; finding out an HTML file stored with related information based on the path described by the path metadata; and reading the relevant information from the HTML file.

Fig. 1 schematically shows a system architecture to which the information query method and system may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as a user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the information query method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the information query system provided by the embodiments of the present disclosure may be generally disposed in the server 105. The information query method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the information query system provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2A schematically shows a flow chart of an information query method according to an embodiment of the present disclosure.

As shown in fig. 2A, the information query method may include operations S201 to S204, in which:

in operation S201, an information query request is received, where the information query request is used to request to query relevant information on a static website.

In an embodiment of the present disclosure, each static website may include one or more HyperText Markup Language (HTML) files, and the web page content displayed by the static website may be contained in the one or more HTML files. Each of the HTML files may include one or more HTML commands, which may be used to describe text, graphics, animation, sound, tables, links, and the like, which are not limited herein.

According to the embodiment of the disclosure, in the case that a user wants to query related information on a static website, an information query request can be submitted to a server through a client, and the server can decide whether to receive the information query request according to the current situation. For example, in the case that the server determines that the information query request can be currently answered, the information query request may be received; in the case that the current access amount of the server is too large, the server may refuse to receive the information query request.

In operation S202, in response to the information query request, the index library of the static website is queried to find sequence number metadata having an index relationship with related information requested to be queried by the information query request and path metadata having a corresponding relationship with the sequence number metadata.

In an embodiment of the present disclosure, the index library may store sequence number metadata having an index relationship with the related information, and path metadata having a corresponding relationship with the sequence number metadata, where the sequence number metadata may be used to describe a location of the related information in the index library, and the path metadata may be used to describe a storage path of an HTML file storing the related information in a static website.

According to the embodiment of the disclosure, in the case that the server receives the information query request, the server may further respond to the information query request and search for the sequence number metadata and the path metadata in the index library.

For example, in an index library of a static website, 3 sequence number metadata and 3 path metadata having a correspondence relationship with the 3 sequence number metadata are stored, wherein the 3 sequence number metadata are stored in the index library in increasing order. For example, the 3 sequence number metadata are "1", "2" and "3", respectively, where sequence number metadata "1" has an index relationship with "related information 1", sequence number metadata "2" has an index relationship with "related information 2", and sequence number metadata "3" has an index relationship with "related information 3"; the 3 path metadata are "A", "B", and "C", respectively, where path metadata "A" corresponds to sequence number metadata "1", path metadata "B" corresponds to sequence number metadata "2", and path metadata "C" corresponds to sequence number metadata "3". When the user requests to query "related information 1", the server may find the sequence number metadata "1" in the index library according to the index relationship, and determine the path metadata "A" according to the corresponding relationship.

In operation S203, based on the path described by the path metadata, an HTML file storing related information is found.

In the embodiment of the present disclosure, in the case that the above path metadata is found, an HTML file storing this related information may be further found from the static website based on the path described by the path metadata.

For example, in connection with the above example, if the path described by the path metadata "A" is "d:\feig\hgg", then the HTML file storing "related information 1" can be found according to "d:\feig\hgg".

In operation S204, relevant information is read from the HTML file.

In an embodiment of the present disclosure, in a case where an HTML file storing such related information is found, the related information may be read from the HTML file and displayed in a page of the static website.
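Operations S201 to S204 can be illustrated with a minimal sketch. The in-memory `INDEX` dictionary, the function names, and the file paths below are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular data layout for the index library.

```python
import tempfile
from pathlib import Path

# Hypothetical index library: sequence number metadata -> related
# information and path metadata (a path relative to the site root).
INDEX = {
    1: {"info": "related information 1", "path": "a/page1.html"},
    2: {"info": "related information 2", "path": "b/page2.html"},
    3: {"info": "related information 3", "path": "c/page3.html"},
}


def find_entry(index, wanted):
    """S202: return (sequence number, path metadata) for the wanted info."""
    for seq in sorted(index):
        if index[seq]["info"] == wanted:
            return seq, index[seq]["path"]
    return None


def read_related_info(site_root, index, wanted):
    """S203 and S204: locate the HTML file via its path and read it."""
    hit = find_entry(index, wanted)
    if hit is None:
        return None
    _seq, rel_path = hit
    return (Path(site_root) / rel_path).read_text(encoding="utf-8")


def demo():
    # Build a throwaway static site containing a single page, then query it.
    with tempfile.TemporaryDirectory() as root:
        page = Path(root) / "a" / "page1.html"
        page.parent.mkdir(parents=True)
        page.write_text(
            "<html><body>related information 1</body></html>",
            encoding="utf-8",
        )
        return read_related_info(root, INDEX, "related information 1")
```

The linear scan in `find_entry` is used only to keep the sketch short; it is not the tree-based lookup of fig. 2B.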

Fig. 2B schematically shows a schematic diagram of an information query method according to an embodiment of the present disclosure.

As shown in fig. 2B, in an embodiment of the present disclosure, data in a physical address may be used to represent path metadata (note that logically adjacent data is not necessarily physically adjacent), data in column 1 may be used to represent sequence number metadata, and data in column 2 may be used to represent related information, where the sequence number metadata in column 1 may be arranged in an increasing order.

According to an embodiment of the present disclosure, in order to speed up the search for the related information in column 2, a binary search tree (a data structure that keeps its keys in sorted order) may be established. Each node of the binary search tree may contain an index key value, which may include the sequence number metadata in column 1, and a pointer to the path metadata in the physical address corresponding to that index key value.

In the embodiment of the present disclosure, when the user searches for the related information through the binary search tree, sequence number metadata having an index relationship with the related information may be determined from the column 1 according to the index relationship, and path metadata corresponding to the sequence number metadata may be found based on the nodes of the binary search tree, and further, an HTML file storing the related information may be found based on the path metadata.

For example, in a case where a user requests to query the related information "34" in column 2, since the sequence number metadata having an index relationship with the related information "34" is "1", and the node containing the sequence number metadata "1" may further contain a pointer to the path metadata "0x07" in the physical address, the HTML file storing the related information "34" may be found from the static website based on the path metadata "0x07".

Through the embodiment of the present disclosure, when the related information is searched for based on the binary search tree, the operation complexity is O(log₂ n), and thus the complexity of finding information can be reduced.
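The tree lookup can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Node` class, the `build` and `search` helpers, and the sample keys and pointer strings are all hypothetical. The sketch builds the tree from sorted keys by repeatedly taking the middle element, which keeps it balanced; the logarithmic bound holds for a balanced tree, and the disclosure does not state how balance is maintained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                       # index key value (sequence number metadata)
    pointer: str                   # stands in for the pointer to path metadata
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pairs):
    """Build a balanced binary search tree from (key, pointer) pairs sorted by key."""
    if not pairs:
        return None
    mid = len(pairs) // 2
    key, pointer = pairs[mid]
    return Node(key, pointer, build(pairs[:mid]), build(pairs[mid + 1:]))


def search(node, key):
    """Return (pointer, nodes visited); pointer is None when the key is absent."""
    steps = 0
    while node is not None:
        steps += 1
        if key == node.key:
            return node.pointer, steps
        node = node.left if key < node.key else node.right
    return None, steps


# Hypothetical data: 15 sequence numbers, each with a made-up pointer value.
pairs = [(n, f"0x{n:02x}") for n in range(1, 16)]
root = build(pairs)
```

With 15 keys the balanced tree has four levels, so `search` visits at most four nodes for any key, compared with up to 15 comparisons for a linear scan.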

Through the embodiment of the disclosure, the purpose of performing information indexing on the static website based on the index library is realized by creating the index library for the static website.

The method shown in fig. 2A-2B is further described with reference to fig. 3A-3G in conjunction with specific embodiments.

FIG. 3A schematically illustrates a flow diagram for generating an index repository of static websites according to an embodiment of the present disclosure.

In this embodiment, the information query method described with reference to fig. 2A may further include generating an index library of the static website, which may include operations S301 to S305, as shown in fig. 3A, in which:

in operation S301, a summary file of the static website is generated, wherein a file title of at least one HTML file included in the static website and a relative path thereof are recorded in the summary file.

In operation S302, for each of the at least one HTML file, a corresponding relative path is read from the summary file.

In operation S303, based on the read relative path, the corresponding HTML file is read.

In operation S304, metadata division is performed on the web page content described by the read HTML file.

In operation S305, an index library of the static website is generated based on the metadata division result.

In an embodiment of the disclosure, for any one HTML file in the at least one HTML file, the summary file may include a file title of the HTML file and a relative path for storing the HTML file, where the relative path may be used to indicate a storage location of the HTML file in the static website, and the file title and the relative path have a corresponding relationship.

According to the embodiment of the disclosure, once the summary file of the static website has been generated, the relative path corresponding to each HTML file is read from the summary file, and based on the relative path, the corresponding HTML file is read from the static website.

It should be understood that, since the above-mentioned file title and the above-mentioned relative path have a correspondence relationship, reading the corresponding HTML file based on the read relative path may mean reading the corresponding HTML file based on both the read relative path and the correspondence relationship between the relative path and the file title. Specifically, if two or more HTML files are stored under the same relative path, then after the storage location of the corresponding HTML file is found based on the relative path, the corresponding HTML file needs to be determined from the two or more HTML files based on the file title corresponding to the relative path.

For example, the summary file summary.md records that "file title 1" is the file title of "HTML file 1", that "file title 2" is the file title of "HTML file 2", that "file title 1" corresponds to "relative path 1", and that "file title 2" corresponds to "relative path 2". In practice, the corresponding "relative path 1" may be read from the above summary.md, and the corresponding "HTML file 1" may be read from the static website based on "relative path 1" and "file title 1".
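Generating and reading back such a summary file can be sketched as below. The disclosure names summary.md but does not specify its line format; the Markdown link list (`* [title](relative/path)`), the helper names, and the sample pages are assumptions made for illustration. The title is taken from the `<title>` element, falling back to the file name when none is present.

```python
import re
import tempfile
from pathlib import Path

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumed line format of summary.md: "* [file title](relative/path.html)"
ENTRY_RE = re.compile(r"^\* \[(?P<title>.*)\]\((?P<path>[^)]*)\)$")


def write_summary(site_root):
    """Traverse the site from its root and record title and relative path."""
    root = Path(site_root)
    pages = sorted(root.rglob("*.html"),
                   key=lambda p: p.relative_to(root).as_posix())
    lines = []
    for html in pages:
        match = TITLE_RE.search(html.read_text(encoding="utf-8"))
        title = match.group(1).strip() if match else html.stem
        lines.append(f"* [{title}]({html.relative_to(root).as_posix()})")
    summary = root / "summary.md"
    summary.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return summary


def read_summary(summary_path):
    """Read (file title, relative path) pairs back from the summary file."""
    entries = []
    for line in Path(summary_path).read_text(encoding="utf-8").splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((m.group("title"), m.group("path")))
    return entries


def demo():
    # Create a two-page throwaway site, summarize it, and read the summary.
    with tempfile.TemporaryDirectory() as root:
        docs = Path(root) / "docs"
        docs.mkdir()
        (Path(root) / "index.html").write_text(
            "<title>file title 1</title>", encoding="utf-8")
        (docs / "guide.html").write_text(
            "<title>file title 2</title>", encoding="utf-8")
        return read_summary(write_summary(root))
```

Titles containing square brackets or parentheses would need escaping in this format; the sketch does not handle that case.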

It should be noted that reading the corresponding HTML file based on the relative path may include multiple ways, for example, reading may be performed according to the sequence of recording the relative path in the summary file, or randomly selecting the relative path recorded in the summary file to perform reading, or a combination of the two ways, which is not limited herein.

In the embodiment of the disclosure, since the HTML file is used to describe the web page content of the static website, in the case of reading the corresponding HTML file, the metadata of the web page content described by the HTML file may be further divided, and according to the division result, the index library of the static website may be generated.

The index library may be stored in an index.json file, and the index.json file may be stored in the root directory of the static website, which is not limited herein. JSON is a lightweight data exchange format.

According to the embodiment of the disclosure, once the index library of the static website has been generated, the server may receive and respond to the information query request, and further query the index library of the static website, so as to achieve the purpose of reading the relevant information from the HTML file.

Through the embodiment of the disclosure, the purpose of indexing information of the static website can be realized by establishing the index library of the static website.
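Persisting the index library as a JSON file in the website root directory can be sketched as follows. The file name `index.json`, the record fields, and the helper names are assumptions for illustration; the disclosure leaves the storage details open.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata division results, one record per HTML file.
RECORDS = [
    {"seq": 1, "path": "a/page1.html", "title": "title 1", "keywords": "keyword 1"},
    {"seq": 2, "path": "b/page2.html", "title": "title 2", "keywords": "keyword 2"},
]


def write_index(site_root, records):
    """S305: store the index library as index.json in the site root."""
    index_path = Path(site_root) / "index.json"
    index_path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return index_path


def load_index(site_root):
    """Load the index library back, for example when answering a query."""
    index_path = Path(site_root) / "index.json"
    return json.loads(index_path.read_text(encoding="utf-8"))


def demo():
    # Round-trip the records through a throwaway site root.
    with tempfile.TemporaryDirectory() as root:
        write_index(root, RECORDS)
        return load_index(root)
```

Because the index is a plain file served alongside the HTML pages, no database storage service is needed to hold it.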

Fig. 3B schematically shows a flowchart of metadata partitioning of web page content described by a read HTML file according to an embodiment of the present disclosure.

In this embodiment, operation S304 described with reference to fig. 3A (i.e., metadata division of the web page content described by the read HTML file) may include operations S401 to S402, as shown in fig. 3B, in which:

in operation S401, the web page content described in the read HTML file is matched by using a regular expression to obtain the basic data of each component in the web page content.

In operation S402, sequence number metadata and path metadata are added to the basic data of each component part to implement metadata partitioning of web page content described by the read HTML file.

In the embodiment of the present disclosure, a regular expression may be understood as a logical formula that operates on character strings: specific characters are predefined, and these characters and their combinations form a "regular character string" that expresses a filtering logic for character strings.

According to an embodiment of the disclosure, as shown in fig. 3C, the components in the web page content may include, but are not limited to, an article ID, an article title, an article description, an article keyword, article content, and the like, wherein the article description may be a general content of the entire article.

In the embodiment of the present disclosure, the basic data of each component described above may be data represented by HTML. For example, the basic data of each component (which may be referred to as an HTML tag) may be represented as "<title></title>" (title), "<keywords></keywords>" (keywords), "<description></description>" (description), "<body></body>" (content), or the like.

According to the embodiment of the disclosure, sequence number metadata and path metadata can be added to the basic data of each component to achieve metadata division of the web page content described by the read HTML file. Components from the same HTML file receive the same sequence number metadata and path metadata, whereas components from different HTML files receive different sequence number metadata and path metadata.

For example, if "title 1" and "keyword 1" are components of "HTML file 1", and "title 2" and "keyword 2" are components of "HTML file 2", then the same "sequence number metadata 1" and "path metadata 1" may be added to "title 1" and "keyword 1", and the same "sequence number metadata 2" and "path metadata 2" may be added to "title 2" and "keyword 2", where "sequence number metadata 1" is different from "sequence number metadata 2" and "path metadata 1" is different from "path metadata 2".

Through the embodiment of the disclosure, the purpose of dividing the metadata of the webpage content described by the read HTML file is realized by adding the sequence number metadata and the path metadata to the basic data of each component.
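
As a minimal sketch of operations S401 and S402, assuming Python and hypothetical regular expressions (the disclosure does not give concrete patterns), each component may be matched and then tagged with the sequence number and path of the HTML file it came from. The names PATTERNS, partition, seq and path are illustrative assumptions and do not appear in the disclosure.

```python
import re

# Hypothetical patterns, one per component; the disclosure only names the tags.
PATTERNS = {
    "title": re.compile(r"<title>(.*?)</title>", re.I | re.S),
    "keywords": re.compile(r"<keywords>(.*?)</keywords>", re.I | re.S),
    "description": re.compile(r"<description>(.*?)</description>", re.I | re.S),
    "content": re.compile(r"<body[^>]*>(.*?)</body>", re.I | re.S),
}


def partition(html, seq, path):
    """Match each component (S401) and attach seq/path metadata (S402)."""
    records = []
    for component, pattern in PATTERNS.items():
        match = pattern.search(html)
        if match is None:
            continue  # this component is absent from the page
        records.append({
            "seq": seq,        # same value for every component of one file
            "path": path,      # same value for every component of one file
            "component": component,
            "data": match.group(1).strip(),
        })
    return records
```

Calling partition once per HTML file, with a different seq and path each time, gives components of the same file identical metadata and components of different files distinct metadata, as described above.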

Fig. 3D schematically shows a flow chart of an information query method according to another embodiment of the present disclosure.

In this embodiment, the information query method may include operations S501 to S503, in addition to the corresponding operations described above with reference to fig. 2A and 3B. For the sake of simplicity of description, descriptions of corresponding operations in fig. 2A and 3B are omitted here.

As shown in fig. 3D, the information query method may further include operations S501 to S503. Wherein:

In operation S501, before the sequence number metadata and the path metadata are added to the basic data of each component, tag conversion processing is performed on the basic data of at least one of the components matched in the web page content.

In operation S502, for the basic data of each component on which the tag conversion processing operation has been performed, sequence number metadata and path metadata are added to the related data of that component obtained after the tag conversion.

In operation S503, for the basic data of each component on which the tag conversion processing operation has not been performed, sequence number metadata and path metadata are added to the basic data itself of that component.

In the embodiment of the present disclosure, the tag conversion processing may include, but is not limited to, applying the "HTML to MD" tag conversion technology to the basic data of at least one of the components. For example, "<body></body>" (content) may be subjected to the tag conversion processing using the "HTML to MD" tag conversion technology.

According to the embodiment of the disclosure, as shown in fig. 3E, after regular matching (that is, matching according to the rule of a regular expression) is performed on the web page content described by the read HTML file, the basic data of each component in the web page content may be obtained, where the basic data of each component may be expressed as "<h1></h1>", "<p></p>", "<br>", "strong.b", and the like. Using the tag conversion technology of "HTML to lightweight markup language (MD)", "<h1></h1>" may then be converted into "#", "<p></p>" into "\n\n", "<br>" into "\n", and "strong.b" into "@". The related data of each component obtained after the conversion may be referred to as MD tags.

In the embodiment of the present disclosure, for the basic data of each component on which the tag conversion processing operation has been performed, such as "<h1></h1>", "<p></p>", "<br>", and "strong.b", sequence number metadata and path metadata are added to the related data of that component obtained after the conversion, such as "#", "\n\n", "\n", and "-", respectively. Related data corresponding to components from the same HTML file may receive the same sequence number metadata and path metadata; related data corresponding to components from different HTML files may receive different sequence number metadata and path metadata.

According to the embodiment of the present disclosure, for the basic data of each component that does not execute the tag transformation processing operation, the operation method of adding the sequence number metadata and the path metadata to the basic data of each component is similar to that described above, and is not described herein again.

Through the embodiment of the disclosure, since complicated HTML tags can be converted into simple MD tags, a large amount of redundant data can be removed while effective information is retained, which reduces the storage space and improves the utilization efficiency of resources.
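
The tag conversion described above may be sketched as follows, assuming Python and regular-expression substitution. Only the heading, paragraph and line-break mappings are shown; the function name html_to_md and the exact patterns are illustrative assumptions rather than the conversion used in the disclosure.

```python
import re


def html_to_md(fragment):
    """Replace a few HTML tags with simpler MD-style markers."""
    # <h1>...</h1> becomes a line that starts with "#"
    text = re.sub(r"<h1[^>]*>(.*?)</h1>", r"# \1\n", fragment, flags=re.I | re.S)
    # <p>...</p> becomes the paragraph text followed by a blank line
    text = re.sub(r"<p[^>]*>(.*?)</p>", r"\1\n\n", text, flags=re.I | re.S)
    # <br> (or <br/>) becomes a single newline
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.I)
    return text
```

In this sketch the converted text is shorter than its HTML source because each pair of opening and closing tags collapses into at most a few characters, which is the storage saving noted above.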

FIG. 3F schematically illustrates a flow diagram for generating a summary file for a static website according to an embodiment of the disclosure.

In this embodiment, operation S301 (i.e., generating a summary file of a static website) described with reference to fig. 2A and 3A may include operations S601 to S603. As shown in fig. 3F, wherein:

In operation S601, all HTML files of the static website are traversed, starting from the site root directory of the static website.

In operation S602, the file title and the relative path of each of the HTML files are extracted.

In operation S603, a summary file of the static website is generated to record the file titles and relative paths of all HTML files contained in the static website.

In the embodiment of the disclosure, in the case that there is no summary file summary.md on the static website, the summary file summary.md may be automatically generated, so as to achieve the purpose of recording file titles and relative paths of all HTML files included in the static website.

According to embodiments of the present disclosure, all HTML files may be traversed starting from the site root directory of the static website, including its subdirectories. Then, for each HTML file found, the file title and the relative path of the HTML file are obtained, where, as described above, the file title and the relative path have a corresponding relationship. The summary file of the static website is then generated from the extracted file titles and relative paths.

After the summary file is generated, the summary file may be stored in a root directory of the static website, which is not limited herein.

According to the embodiment of the disclosure, generating the summary file and storing the file titles and relative paths of the HTML files in it makes the HTML files easier to manage and more convenient to read.
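
Operations S601 to S603 may be sketched as follows, assuming Python, a site stored on the local file system, and a summary file written as one Markdown link per line. The function name build_summary, the title pattern and the line format are illustrative assumptions; the disclosure does not specify the layout of the summary file.

```python
import os
import re

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.I | re.S)


def build_summary(site_root, summary_name="summary.md"):
    """Walk the site (S601), collect titles and paths (S602), write the summary (S603)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8", errors="replace") as fh:
                match = TITLE_RE.search(fh.read())
            # Fall back to the file name when the page has no <title>.
            title = match.group(1).strip() if match else name
            rel_path = os.path.relpath(full_path, site_root).replace(os.sep, "/")
            entries.append((title, rel_path))
    # Store the summary in the site root directory.
    with open(os.path.join(site_root, summary_name), "w", encoding="utf-8") as out:
        for title, rel_path in entries:
            out.write(f"* [{title}]({rel_path})\n")
    return entries
```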

Fig. 3G schematically shows a schematic diagram of an information query method according to another embodiment of the present disclosure.

As shown in fig. 3G, wherein:

In operation S701, a static website is traversed.

In operation S702, the summary file summary.md is generated.

In operation S703, a static web page is read.

In operation S704, titles, descriptions, keywords, contents are extracted.

In operation S705, the HTML tags are replaced with simple tags.

In operation S706, path metadata is added.

In operation S707, sequence number metadata is added.

In operation S708, the metadata is structured.

In operation S709, the structured metadata is stored in a local index library.

In the embodiment of the present disclosure, the establishment of the index library of the static website may be divided into four main steps: generating the summary file, metadata division, HTML tag conversion, and generating the index library of the static website.

According to the embodiment of the disclosure, the summary file of the static website may be generated by traversing all HTML files from a site root directory of the static website, then extracting file titles and relative paths of all the HTML files, and sequentially writing the file titles and the relative paths into a summary file summary.

In the embodiment of the present disclosure, the metadata division may proceed as follows: the content of the summary.md file is read, the corresponding HTML files are read in sequence according to the relative paths recorded in the summary.md file, and components such as a title, a description, keywords, and content are then extracted from the web page content described by each HTML file, where the basic data of each component may be matched by a regular expression. Sequence number metadata and path metadata are then added to the basic data of each component.

According to the embodiment of the present disclosure, the HTML tag conversion may be to perform tag conversion processing on the basic data of each component to achieve the purpose of representing each component by a simple tag.

It should be noted that after the HTML tag conversion is performed, sequence number metadata and path metadata may also be added to the related data of each converted component.

In the embodiment of the present disclosure, the local index library may be stored in either of two ways: separate index libraries may be built for titles, keywords, and descriptions according to specific requirements, or a single index library (which may be referred to as a full-text index) may be built over the basic data of all the components and/or the related data of all the components, which is not limited herein.

According to the embodiment of the disclosure, for a static website without index library support, for which the traditional information indexing technology cannot provide a corresponding solution, an index library can be quickly established for the web page content of the static website without complex supporting conditions, thereby providing data source support for a search mechanism.
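
One possible shape for such an index library is a simple inverted index, sketched below in Python. The record layout (dictionaries with "seq", "path" and "data" keys), the whitespace-and-punctuation tokenization, and the function name build_index are assumptions made for illustration; the disclosure does not prescribe a storage structure.

```python
import re
from collections import defaultdict


def build_index(records):
    """Build a term -> sequence numbers map and a sequence number -> path map."""
    postings = defaultdict(set)
    paths = {}
    for rec in records:
        # Sequence number metadata corresponds to exactly one path metadata.
        paths[rec["seq"]] = rec["path"]
        for term in re.findall(r"\w+", rec["data"].lower()):
            postings[term].add(rec["seq"])
    return {
        "terms": {term: sorted(seqs) for term, seqs in postings.items()},
        "paths": paths,
    }
```

Indexing only the title, keyword and description records would give the per-field index libraries mentioned above, while indexing every record would give a full-text index.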

FIG. 4 schematically shows a block diagram of an information query system according to an embodiment of the disclosure.

As shown in fig. 4, the information query system 400 may include a receiving module 410, a response module 420, a determining module 430, and a reading module 440. Wherein:

The receiving module 410 is configured to receive an information query request, where the information query request is used to request to query relevant information on a static website.

The response module 420 is configured to respond to the information query request and query the index library of the static website to find out the sequence number metadata having an index relationship with the related information requested to be queried by the information query request and the path metadata having a corresponding relationship with the sequence number metadata.

The determining module 430 is configured to find out an HTML file storing related information based on the path described by the path metadata.

The reading module 440 is used for reading the relevant information from the HTML file.
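
The cooperation of the four modules may be sketched as a single Python function. The index layout used here (a "terms" map from a query term to sequence numbers and a "paths" map from a sequence number to a relative path) and the name query are assumptions for illustration only; the sketch returns the whole file content as the relevant information.

```python
import os


def query(index, site_root, term):
    """Look up a term, resolve sequence numbers to paths, and read the HTML files."""
    results = []
    # Response step: find sequence number metadata indexed under the term.
    for seq in index["terms"].get(term.lower(), []):
        # Path metadata corresponding to that sequence number.
        rel_path = index["paths"][seq]
        # Determining step: locate the HTML file from the described path.
        full_path = os.path.join(site_root, rel_path)
        # Reading step: read the relevant information from the HTML file.
        with open(full_path, encoding="utf-8") as fh:
            results.append((rel_path, fh.read()))
    return results
```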

FIG. 5A schematically illustrates a block diagram of an information query system according to another embodiment of the disclosure.

In this embodiment, the information query system 400 may include a generation module 510 in addition to the respective modules described above with reference to fig. 4. For the sake of brevity of description, descriptions with reference to corresponding blocks in fig. 4 are omitted herein.

As shown in fig. 5A, the information query system 400 may further include a generation module 510 for generating an index library of the static website. The generation module 510 is further configured to: generate a summary file of the static website, wherein the summary file records a file title and a relative path of at least one HTML file contained in the static website; for each of the at least one HTML file, read a corresponding relative path from the summary file; read the corresponding HTML file based on the read relative path; perform metadata division on the web page content described by the read HTML file; and generate an index library of the static website based on the metadata division result.

As an alternative embodiment, the generation module is further configured to: apply a regular expression to the web page content described by the read HTML file to match the basic data of each component in the web page content; and add sequence number metadata and path metadata to the basic data of each component to realize metadata division of the web page content described by the read HTML file.

FIG. 5B schematically shows a block diagram of an information query system according to another embodiment of the disclosure.

In this embodiment, the information query system 400 may include a processing module 610, a first adding module 620, and a second adding module 630, in addition to the respective modules described above with reference to fig. 4. For the sake of brevity of description, descriptions with reference to corresponding blocks in fig. 4 are omitted herein.

As shown in fig. 5B, the information query system 400 may further include a processing module 610, a first adding module 620, and a second adding module 630. Wherein:

The processing module 610 is configured to perform tag conversion processing on the basic data of at least one of the components matched in the web page content before the sequence number metadata and the path metadata are added to the basic data of each component.

The first adding module 620 is configured to add, for the basic data of each component on which the tag conversion processing operation has been performed, sequence number metadata and path metadata to the related data of that component obtained after the tag conversion.

The second adding module 630 is configured to add, for the basic data of each component on which the tag conversion processing operation has not been performed, sequence number metadata and path metadata to the basic data of that component.

As an alternative embodiment, the generation module is further configured to: traverse all HTML files of the static website starting from the site root directory of the static website; extract the file title and the relative path of each of the HTML files; and generate a summary file of the static website to record the file titles and relative paths of all HTML files contained in the static website.

It is understood that the receiving module 410, the responding module 420, the determining module 430, and the reading module 440, the generating module 510, the processing module 610, the first adding module 620, and the second adding module 630 may be combined in one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the receiving module 410, the responding module 420, the determining module 430, and the reading module 440, the generating module 510, the processing module 610, the first adding module 620, and the second adding module 630 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in a suitable combination of three implementations of software, hardware, and firmware. Alternatively, at least one of the receiving module 410, the responding module 420, the determining module 430 and the reading module 440, the generating module 510, the processing module 610, the first adding module 620 and the second adding module 630 may be at least partially implemented as a computer program module, which when executed by a computer, may perform the functions of the respective modules.

As another aspect, the present disclosure also provides a computer system including: one or more processors; a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the information query method as described above.

FIG. 6 schematically illustrates a block diagram of a computer system suitable for implementing the information query method according to an embodiment of the present disclosure. The computer system illustrated in FIG. 6 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.

As shown in fig. 6, a computer system 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows described with reference to fig. 2A-2B, 3A-3G in accordance with embodiments of the present disclosure.

In the RAM 803, various programs and data necessary for the operation of the computer system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations described above with reference to fig. 2A to 2B, 3A to 3G by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform the various operations described above with reference to fig. 2A-2B, 3A-3G by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the computer system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The computer system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.

According to an embodiment of the present disclosure, the method described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing. 
According to embodiments of the present disclosure, a computer-readable medium may include the ROM 802 and/or the RAM 803 described above, and/or one or more memories other than the ROM 802 and the RAM 803.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As another aspect, the present disclosure also provides a computer-readable medium having stored thereon executable instructions, which when executed by a processor, cause the processor to implement the above-mentioned information query method. The computer readable medium may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: receiving an information query request, wherein the information query request is used for requesting to query relevant information on a static website; responding to the information query request, and querying an index library of the static website to find out serial number metadata which has an index relation with related information requested to be queried by the information query request and path metadata which has a corresponding relation with the serial number metadata; finding out an HTML file stored with related information based on the path described by the path metadata; and reading the relevant information from the HTML file.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. An information query method, comprising:

receiving an information query request, wherein the information query request is used for requesting to query relevant information on a static website;

responding to the information query request, and querying an index library of the static website to find out sequence number metadata having an index relationship with related information requested to be queried by the information query request and path metadata having a corresponding relationship with the sequence number metadata;

finding out an HTML file storing the related information based on the path described by the path metadata; and

reading the relevant information from the HTML file;

wherein the method further comprises generating the index library of the static website, including:

generating a summary file of the static website, wherein the summary file records a file title and a relative path of at least one HTML file contained in the static website.

2. The method of claim 1, wherein the generating the index repository of static websites further comprises: for each file in the at least one HTML file, reading a corresponding relative path from the summary file;

reading the corresponding HTML file based on the read relative path;

performing metadata division on the webpage content described by the read HTML file; and

generating the index library of the static website based on the metadata division result.

3. The method of claim 2, wherein the metadata partitioning of the web page content described by the read HTML file comprises:

matching the webpage content described by the read HTML file by using a regular expression to match basic data of each component in the webpage content; and

adding sequence number metadata and path metadata to the basic data of each component to realize metadata division of the webpage content described by the read HTML file.

4. The method of claim 3, wherein the method further comprises:

before adding the sequence number metadata and the path metadata to the basic data of each component, performing tag conversion processing on the basic data of at least one of the components matched in the webpage content;

for the basic data of each component on which the tag conversion processing operation has been performed, adding sequence number metadata and path metadata to the related data of each component obtained after the tag conversion; and

for the basic data of each component on which the tag conversion processing operation has not been performed, adding sequence number metadata and path metadata to the basic data of each component.

5. The method of claim 2, wherein the generating the summary file of the static website comprises:

traversing all HTML files of the static website from a website root directory of the static website;

extracting the file title and the relative path of each file in all the HTML files; and

generating the summary file of the static website to record file titles and relative paths of all HTML files contained in the static website.

6. An information query system, comprising:

the receiving module is used for receiving an information query request, wherein the information query request is used for requesting to query the related information on the static website;

the response module is used for responding to the information query request and querying the index library of the static website so as to find out serial number metadata which has an index relation with the related information requested to be queried by the information query request and path metadata which has a corresponding relation with the serial number metadata;

the determining module is used for finding out an HTML file storing the related information based on the path described by the path metadata; and

the reading module is used for reading the related information from the HTML file;

the system further comprises a generation module for generating the index repository of the static website, the generation module further for:

7. The system of claim 6, wherein the generation module is further to:

for each file in the at least one HTML file, reading a corresponding relative path from the summary file;

reading the corresponding HTML file based on the read relative path;

8. The system of claim 7, wherein the generation module is further to:

9. The system of claim 8, wherein the system further comprises:

the processing module is used for performing tag conversion processing on the basic data of at least one of the components matched in the webpage content before adding the sequence number metadata and the path metadata to the basic data of each component;

the first adding module is used for adding, for the basic data of each component on which the tag conversion processing operation has been performed, sequence number metadata and path metadata to the related data of each component obtained after the tag conversion; and

the second adding module is used for adding, for the basic data of each component on which the tag conversion processing operation has not been performed, sequence number metadata and path metadata to the basic data of each component.

10. The system of claim 7, wherein the generation module is further to:

11. A computer system, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the information query method of any one of claims 1 to 5.

12. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the information query method of any one of claims 1 to 5.