CN110020272B - Caching method and device and computer storage medium - Google Patents


Info

Publication number
CN110020272B
Authority
CN
China
Prior art keywords
node
character
url
current
refreshed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710689714.1A
Other languages
Chinese (zh)
Other versions
CN110020272A (en
Inventor
侯光华
冀晖
张平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN201710689714.1A
Publication of CN110020272A
Application granted
Publication of CN110020272B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566URL specific, e.g. using aliases, detecting broken or misspelled links
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching

Abstract

The invention discloses a caching method, a caching device and a computer storage medium, and relates to the technical field of computers. In the invention, the URL of each cache file is stored in a URL tree, each node in the URL tree stores characters, the number of the characters is variable, the first character of each node in the same layer is different, the characters from a root node to each leaf node or middle node can form a complete URL or domain name, and the cache information of the cache file is stored in the node corresponding to the last character of the URL. The storage structure of the URL tree is convenient to search and manage, and when a file refreshing request exists, corresponding cache information can be simply and quickly searched according to a corresponding URL or a corresponding domain name, so that the file is refreshed. In addition, the storage structure of the URL tree can save storage space.

Description

Caching method and device and computer storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a caching method and apparatus, and a computer storage medium.
Background
The cache server is an edge server of a CDN (Content Delivery Network). Cached content is stored on local hard disks, and in order to improve concurrent access to massive amounts of cached content, storage is generally distributed across a plurality of hard disks. The content of the page corresponding to a URL (Uniform Resource Locator) is cached as a file, and both the storage directory and the file name of the cached file are encrypted. The cache server distributes cache files to different hard disks according to a hash computed from the URL, so even URLs under the same domain name directory are not necessarily stored on the same hard disk.
To refresh a single URL, the cache server, after receiving the refresh request, uses the same algorithm as was used during caching to calculate the hard disk, the path and the encrypted file name under which the URL is stored, and then deletes the file. Refreshing a whole directory, however, is very complex for the cache server. Because file names are encrypted, the files cannot be found by traversing the hard disks; because the storage positions of the URLs under one directory are scattered, they cannot all be found from the directory name; and because it is not known which URLs under the directory are stored locally, it is not feasible to calculate the position of each cached file from its URL and delete the files one by one. At present, the following methods are used to refresh the URLs under one directory:
Method one: when a directory refresh request is received, the cache server does not delete the cache files of all URLs under the directory, but records the refresh requirement of the directory and the time of the refresh task. When an internet user makes a request, the cache server checks whether the requested URL is under the directory of the refresh request; if it is, and the current cache file was last updated before the time of the refresh task, the latest content is fetched from the source station and the cache is updated.
Method two: when the cache server receives a directory refresh request, all current cache file names are imported into a database and sorted, all files under the directory to be refreshed are found, and those files are then deleted.
Disclosure of Invention
The inventor finds that, in the above methods, the cache information of a file is not well managed after the file is cached, and the corresponding file cannot be found quickly when a refresh task arrives, so that refreshing the files corresponding to a URL or a directory is complex and inefficient.
The technical problem to be solved by the invention is: how to manage file cache information so that it can be found quickly and cache files can be refreshed simply and efficiently.
According to an embodiment of the present invention, a caching method is provided, including: arranging the fields of the domain name part of the Uniform Resource Locator (URL) corresponding to a cache file in reverse order, while keeping unchanged the order of the fields outside the domain name part and the order of the characters within every field, to obtain a URL to be inserted; matching, starting from the first character of the URL to be inserted and the root node of a URL tree: if the current character is the same as the first unmatched character of the current node, updating the next character to be the current character, and, if all characters of the current node have been matched, selecting a child node of the current node and updating it to be the current node; if the current character differs from the first unmatched character of the current node, and that differing character is the first character of the current node, inserting the current character and all characters after it as a child node of the parent node of the current node; if the current character differs from the first unmatched character of the current node, and that differing character is not the first character of the current node, extracting the characters of the current node before the differing character as a new node that replaces the current node, inserting the differing character and the characters after it as a first child node of the new node, inserting the current character and all characters after it as a second child node of the new node, and inserting the paths from each child node of the current node to each leaf node after the first child node; and storing the cache information of the cache file corresponding to the URL to be inserted in the node corresponding to the last character of the URL to be inserted.
In one embodiment, the method further comprises: carrying out reverse order arrangement on all fields of a domain name part of a URL (Uniform resource locator) corresponding to a file to be refreshed, wherein the fields except the domain name part and the sequence of all characters in all the fields are unchanged, and obtaining the URL to be refreshed; matching the characters of the URL to be refreshed with the characters corresponding to each node in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed; and searching the corresponding file according to the cache information of the file stored in the node corresponding to the last character of the URL to be refreshed, and refreshing the file.
In one embodiment, the method further comprises: carrying out reverse order arrangement on all fields of the domain name corresponding to the file to be refreshed, wherein the order of all characters in all the fields is unchanged, and obtaining the domain name to be refreshed; matching the characters of the domain name to be refreshed with the characters corresponding to each node in the URL tree to obtain the node corresponding to the last character of the domain name to be refreshed; and searching each corresponding file according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the file stored in each child node of the node, and refreshing each file.
In one embodiment, matching the characters of the URL to be refreshed with the characters corresponding to the nodes in the URL tree includes: matching, starting from the first character of the URL to be refreshed and the root node of the URL tree: if the current character of the URL to be refreshed is the same as the first unmatched character of the current node, the next character is updated to be the current character, and if all characters of the current node have been matched, a child node of the current node is selected and updated to be the current node, until all the characters of the URL to be refreshed have been matched.
In one embodiment, matching the characters of the domain name to be refreshed with the characters corresponding to the nodes in the URL tree includes: matching is carried out from the first character of the domain name to be refreshed and the root node of the URL tree: if the current character of the domain name to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character is updated to be the current character, and if the characters corresponding to the current node are completely matched, the child node of the current node is selected to be updated to be the current node until all the characters of the domain name to be refreshed are completely matched.
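The domain name refresh described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Node`, `find_subtree`, `collect` and `refresh_domain` are hypothetical, children are held in a Python dict keyed by first character, the cache information is a placeholder string, and `delete` is a caller-supplied callback rather than a real file deletion.

```python
class Node:
    """Hypothetical URL-tree node: a run of characters plus optional cache info."""
    def __init__(self, label, info=None, children=()):
        self.label = label
        self.info = info
        # Siblings have distinct first characters, so a dict keyed on it works.
        self.children = {child.label[0]: child for child in children}


def find_subtree(root, key):
    """Return the node in which the last character of key is matched, or None."""
    node, i = root, 0
    while i < len(key):
        child = node.children.get(key[i])
        if child is None:
            return None
        rest = key[i:]
        if rest.startswith(child.label):      # whole node matched, go deeper
            node, i = child, i + len(child.label)
        elif child.label.startswith(rest):    # key ends inside this node
            return child
        else:
            return None
    return node


def collect(node):
    """Gather every node at or below `node` that carries cache information."""
    stack, found = [node], []
    while stack:
        current = stack.pop()
        if current.info is not None:
            found.append(current)
        stack.extend(current.children.values())
    return found


def refresh_domain(root, domain, delete):
    """Refresh all cached files under `domain`; returns how many were refreshed."""
    key = ".".join(reversed(domain.split(".")))   # test.com -> com.test
    start = find_subtree(root, key)
    if start is None:
        return 0
    nodes = collect(start)
    for node in nodes:
        delete(node.info)
        node.info = None
    return len(nodes)


# A small hand-built tree with placeholder cache information.
root = Node("", children=[
    Node("c", children=[
        Node("n.test.www", info="cn-www"),
        Node("om.test.", children=[
            Node("www", info="com-www"),
            Node("mp3/1", children=[
                Node("0.mp3", info="com-mp3-10"),
                Node("3.mp3", info="com-mp3-13"),
            ]),
        ]),
    ]),
])
deleted = []
print(refresh_domain(root, "test.com", deleted.append))
```

The sketch does not check that the match ends on a field boundary and does not prune emptied nodes; both would probably be wanted in a real cache server.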
In one embodiment, each node in the URL tree stores a child-node initial value and a child node index array; the child-node initial value is the minimum of the first characters of the node's child nodes, and the child node index array stores the memory address of each child node. Selecting a child node of the current node and updating it to be the current node comprises: subtracting the child-node initial value of the current node from the current character to obtain an index into the child node index array of the current node; looking up the corresponding entry of the child node index array according to that index; and finding the corresponding child node according to the memory address stored in that entry, and updating that child node to be the current node.
In one embodiment, characters other than separators are represented by their ASCII (American Standard Code for Information Interchange) codes, and the separators dot (.) and slash (/) are represented by different preset values.
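The child lookup by subtraction can be illustrated with a short sketch. It rests on assumptions not in the text: the class and method names are hypothetical, Python object references stand in for memory addresses, and unused slots of the index array are filled with `None`.

```python
class Node:
    """Hypothetical node with a child-node initial value and an index array."""
    def __init__(self, label):
        self.label = label
        self.base = None      # smallest first-character code among the children
        self.children = []    # slot i holds the child whose first code is base + i

    def add_child(self, child):
        code = ord(child.label[0])
        if self.base is None:
            self.base, self.children = code, [child]
            return
        if code < self.base:
            # A new minimum: shift existing slots right and lower the base.
            self.children = [None] * (self.base - code) + self.children
            self.base = code
        index = code - self.base
        if index >= len(self.children):
            self.children.extend([None] * (index - len(self.children) + 1))
        self.children[index] = child

    def find_child(self, ch):
        """Current character minus the initial value gives the array index."""
        if self.base is None:
            return None
        index = ord(ch) - self.base
        if 0 <= index < len(self.children):
            return self.children[index]
        return None


parent = Node("cn.test.www/")
parent.add_child(Node("news"))
parent.add_child(Node("2016/08"))
print(parent.find_child("n").label)
print(parent.find_child("2").label)
```

The lookup costs one subtraction and one array access regardless of how many children a node has, at the price of empty slots between sparse first characters.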
According to another embodiment of the present invention, there is provided a cache apparatus including: the URL conversion module is used for carrying out reverse order arrangement on all fields of a domain name part of a uniform resource locator URL corresponding to the cache file, and obtaining a URL to be inserted, wherein the fields except the domain name part and the sequence of all characters in all the fields are unchanged;
the URL insertion module is used for matching, starting from the first character of the URL to be inserted and the root node of the URL tree: if the current character is the same as the first unmatched character of the current node, updating the next character to be the current character, and, if all characters of the current node have been matched, selecting a child node of the current node and updating it to be the current node; if the current character differs from the first unmatched character of the current node, and that differing character is the first character of the current node, inserting the current character and all characters after it as a child node of the parent node of the current node; if the current character differs from the first unmatched character of the current node, and that differing character is not the first character of the current node, extracting the characters of the current node before the differing character as a new node that replaces the current node, inserting the differing character and the characters after it as a first child node of the new node, inserting the current character and all characters after it as a second child node of the new node, and inserting the paths from each child node of the current node to each leaf node after the first child node; and the cache information storage module is used for storing the cache information of the cache file corresponding to the URL to be inserted in the node corresponding to the last character of the URL to be inserted.
In one embodiment, the apparatus further comprises: and the URL refreshing module is used for carrying out reverse sequence arrangement on all fields of the domain name part of the URL corresponding to the file to be refreshed, obtaining the URL to be refreshed, matching the characters of the URL to be refreshed with the characters corresponding to all nodes in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed, searching the corresponding file according to the cache information of the file stored in the node corresponding to the last character of the URL to be refreshed, and refreshing the file.
In one embodiment, the apparatus further comprises: the directory refreshing module is used for carrying out reverse order arrangement on all fields of the domain name corresponding to the file to be refreshed, obtaining the domain name to be refreshed, matching the characters of the domain name to be refreshed with the characters corresponding to all nodes in the URL tree, obtaining the node corresponding to the last character of the domain name to be refreshed, searching the corresponding files according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the files stored in all sub-nodes of the node, and refreshing the files.
In one embodiment, the URL refresh module is configured to match, starting from the first character of the URL to be refreshed and the root node of the URL tree: if the current character of the URL to be refreshed is the same as the first unmatched character of the current node, the next character is updated to be the current character, and if all characters of the current node have been matched, a child node of the current node is selected and updated to be the current node, until all the characters of the URL to be refreshed have been matched.
In one embodiment, the directory refresh module is configured to match the first character of the domain name to be refreshed with the root node of the URL tree: if the current character of the domain name to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character is updated to be the current character, and if the characters corresponding to the current node are completely matched, the child node of the current node is selected to be updated to be the current node until all the characters of the domain name to be refreshed are completely matched.
In one embodiment, each node in the URL tree stores an initial value of a child node of the node and a child node index array of the node, the initial value of the child node is a minimum value of a corresponding first character of each child node, and the child node index array stores addresses of each child node in the memory respectively; the URL inserting module is used for subtracting the initial value of the child node corresponding to the current node from the current character to obtain the child node index array value of the current node, finding the corresponding child node index array according to the child node index array value, finding the corresponding child node according to the memory address in the child node index array, and updating the child node to be the current node.
In one embodiment, characters other than separators are represented by their ASCII (American Standard Code for Information Interchange) codes, and the separators dot (.) and slash (/) are represented by different preset values.
According to another embodiment of the present invention, there is provided a cache apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the caching method of any of the preceding embodiments based on instructions stored in the memory device.
According to yet another embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the caching method of any one of the preceding embodiments.
In the invention, the URL of each cache file is stored in a URL tree, each node in the URL tree stores characters, the number of the characters is variable, the first character of each node in the same layer is different, the characters from a root node to each leaf node or middle node can form a complete URL or domain name, and the cache information of the cache file is stored in the node corresponding to the last character of the URL. The storage structure of the URL tree is convenient to search and manage, and when a file refreshing request exists, corresponding cache information can be simply and quickly searched according to a corresponding URL or a corresponding domain name, so that the file is refreshed. In addition, the storage structure of the URL tree can save storage space.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a cache apparatus according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a cache apparatus according to another embodiment of the present invention.
Fig. 3 is a flowchart illustrating a caching method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a URL tree structure according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating a caching method according to another embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a cache apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The scheme is provided for solving the problems that a file refreshing method is complex and low in efficiency due to a file caching mode in the prior art.
The caching apparatus in the embodiments of the present invention may each be implemented by various computing devices or computer systems, which are described below in conjunction with fig. 1 and 2.
Fig. 1 is a block diagram of a cache apparatus according to an embodiment of the present invention. As shown in fig. 1, the apparatus 10 of this embodiment includes: a memory 110 and a processor 120 coupled to the memory 110, the processor 120 being configured to perform the caching method in any one of the embodiments of the invention based on instructions stored in the memory 110.
Memory 110 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 2 is a structural diagram of another embodiment of the cache apparatus of the present invention. As shown in fig. 2, the apparatus 20 of this embodiment includes a memory 210 and a processor 220, which are similar to the memory 110 and the processor 120 of fig. 1, respectively. The apparatus may also include an input/output interface 230, a network interface 240, a storage interface 250, and the like. The interfaces 230, 240, 250, the memory 210 and the processor 220 may be connected to one another, for example, via a bus 260. The input/output interface 230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 240 provides a connection interface for various networked devices, such as a database server or a cloud storage server. The storage interface 250 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
The present invention proposes a caching method, which is described below with reference to fig. 3 and 4.
FIG. 3 is a flowchart illustrating a caching method according to an embodiment of the present invention. As shown in fig. 3, the method of this embodiment includes:
step S302, all fields of the domain name part of the URL corresponding to the cache file are arranged in a reverse order, the sequence of all the fields except the domain name part and all the characters in all the fields is unchanged, and the URL to be inserted is obtained.
For example, for www.test.cn/2014/07/12 the domain name part is www.test.cn. The fields of the domain name are arranged in reverse order, while the order of the fields outside the domain name part and the order of the characters within each field are unchanged, giving cn.test.www/2014/07/12.
To facilitate storage and matching, the URL may be converted to a long integer. Specifically, characters other than separators are represented by their ASCII (American Standard Code for Information Interchange) codes, and the separators dot and slash are represented by different preset values. For example, a dot (.) is represented by 0 and a slash (/) is represented by 1. Not using the ASCII code for a separator makes it possible to distinguish separators from dots or slashes in the URL that do not act as separators. During actual matching, the numerical values corresponding to the characters are matched.
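The conversion in step S302 can be sketched in a few lines. The function name `to_tree_key` is hypothetical, and the sketch assumes the URL is given without a scheme prefix, as in the examples of this description.

```python
def to_tree_key(url):
    """Reverse the dot-separated fields of the domain part; leave the rest as is."""
    # Everything before the first slash is treated as the domain name part.
    domain, slash, path = url.partition("/")
    reversed_domain = ".".join(reversed(domain.split(".")))
    return reversed_domain + slash + path


print(to_tree_key("www.test.cn/2014/07/12"))
```

Reversing the fields places the most general part of the domain first, so URLs of one domain share a common prefix in the tree.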
For example, cn.test.www/2014/07/12 is converted to the hexadecimal long integer 636e07465737407777771323031341303713132, where 63 is the ASCII code of c, and so on.
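A minimal sketch of this encoding follows. The names `DOT_VALUE`, `SLASH_VALUE` and `encode_key` are hypothetical, and the rule used to decide which dots are separators (only those before the first slash) is an assumption of the sketch; the description does not spell that rule out.

```python
DOT_VALUE = 0     # preset value for the separator "."
SLASH_VALUE = 1   # preset value for the separator "/"


def encode_key(key):
    """Encode a converted URL as a hexadecimal digit string."""
    digits = []
    in_domain = True
    for ch in key:
        if ch == "/":
            in_domain = False
            digits.append(format(SLASH_VALUE, "x"))
        elif ch == "." and in_domain:
            digits.append(format(DOT_VALUE, "x"))
        else:
            # Ordinary characters, including dots in file names, keep their
            # two-digit hexadecimal ASCII code.
            digits.append(format(ord(ch), "02x"))
    return "".join(digits)


print(encode_key("cn.test.www/2014/07/12"))
```

The result is kept as a string here to mirror the example above; `int(encode_key(key), 16)` would yield the integer itself.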
Step S304, selecting the first character of the URL to be inserted as the current character, and selecting the root node of the URL tree as the current node.
Usually, the root node in the URL tree is an entry, and stores not a specific character but the memory location of a child node.
Step S306, determining whether all the characters corresponding to the current node are matched, if so, executing step S308, otherwise, executing step S310.
Step S308, selecting a child node of the current node and updating it to be the current node. Execution then returns to step S306.
Fig. 4 shows a URL tree constructed from the URLs www.test.cn, www.test.cn/news, www.test.cn/2016/08, video.uv.cn/study.mp4, www.test.com, mp3.test.com/10.mp3 and mp3.test.com/13.mp3.
Let the URL to be inserted be cn.test.www/2014/07. Matching starts with the first character c of the URL and enters through the root node; since the root node holds no characters, a child node of the root node is updated to be the current node.
The child nodes of the current node may be selected in a preset order, for example, from left to right, and updated as the current node.
Step S310, determining whether the current character is the same as the first character in the unmatched character corresponding to the current node, if so, executing step S312, otherwise, executing step S314.
In step S312, the next character of the URL to be inserted is updated to be the current character. Execution then returns to step S306.
As shown in fig. 4, the first character c of the URL is compared with the child node of the root node; since the two are the same, n is updated to be the current character.
In step S314, it is determined whether the character that is different from the current character is the first character corresponding to the current node. If so, step S316 is performed, otherwise, step S318 is performed.
Step S316, inserting the current character of the URL to be inserted and all characters after it as a child node of the parent node of the current node.
As shown in fig. 4, assume that www.test.cn/2016/08 and www.test.cn/news have not been inserted into the URL tree either, and the URL to be inserted is cn. When the current character is t, it is matched against the node corresponding to uv; t differs from u, the first character of that node, so t and all characters after it are inserted as a child node of n. At this point the URL to be inserted has been inserted into the URL tree.
Step S318, extracting the characters of the current node before the differing character as a new node that replaces the current node, inserting the differing character and the characters after it as a first child node of the new node, inserting the current character of the URL to be inserted and all characters after it as a second child node of the new node, and inserting the paths from each child node of the current node to each leaf node after the first child node.
In the present invention, "first" and "second" in the first child node and the second child node are only used for distinguishing different child nodes, and do not play any other limiting role.
As shown in fig. 4, assume that the URL to be inserted is cn.test.www/2014/07, the current character is 4, and the current node is the node corresponding to 2016. The characters 4 and 6 are compared and differ, and 6 is not the first character of 2016. The characters 201 before 6 are therefore extracted to construct a new node, the character 6 and the characters after it are inserted as the first child node of 201, the character 4 and the following /07 are inserted as the second child node of 201, and the paths from the child nodes of the former node 2016 to the leaf nodes are inserted after the first child node 6.
It should be noted that, if the current character and the characters after the current character are too long, the current character and the characters after the current character may be split according to a separator as a plurality of nodes, for example, when cn.
Step S320, storing the cache information of the cache file corresponding to the URL to be inserted into the node corresponding to the last character of the URL to be inserted.
If the matching process finds that the URL to be inserted already exists in the URL tree, the cache information of the cache file only needs to be stored in the node corresponding to the last character of the URL to be inserted.
A node of the URL tree corresponds to a segment of memory and stores its characters and the cache information of the cache file, where the cache information includes the cache address on the disk and the encrypted file name. Because the file name is encrypted, security is high and the file name is not easily tampered with.
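Steps S304 to S320 can be illustrated with a compact sketch. It is not the patented implementation: the names `Node` and `insert` are hypothetical, node characters are held as a plain string, children are held in a dict keyed by first character instead of an index array, and long remainders are not split at separators. The cache information values are placeholders.

```python
class Node:
    """Hypothetical URL-tree node: a run of characters plus optional cache info."""
    def __init__(self, label="", info=None):
        self.label = label
        self.info = info
        self.children = {}   # first character of child label -> child node


def insert(root, key, info):
    """Insert a converted URL `key` and store `info` at its last character."""
    node, i = root, 0
    while i < len(key):
        child = node.children.get(key[i])
        if child is None:
            # S316: no sibling starts with the current character, so the
            # remainder of the key becomes a new child of the parent node.
            node.children[key[i]] = Node(key[i:], info)
            return
        # S310/S312: advance while key and node characters agree.
        label, j = child.label, 0
        while j < len(label) and i + j < len(key) and label[j] == key[i + j]:
            j += 1
        if j < len(label):
            # S318: mismatch inside the node. The matched prefix replaces the
            # node; the unmatched tail keeps the old info and children.
            tail = Node(label[j:], child.info)
            tail.children = child.children
            child.label, child.info = label[:j], None
            child.children = {tail.label[0]: tail}
        node, i = child, i + j     # S308: descend and continue matching
    node.info = info               # S320: key ends exactly at this node


root = Node()
insert(root, "cn.test.www/2016/08", "info-2016")
insert(root, "cn.test.www/2014/07", "info-2014")
split = root.children["c"]
print(split.label, sorted(c.label for c in split.children.values()))
```

After the second insertion the shared prefix cn.test.www/201 forms one node with the two children 6/08 and 4/07, which mirrors the split of node 2016 described for fig. 4.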
In the method of the above embodiment, the URLs of the cache files are stored in a URL tree, each node in the URL tree stores characters, the number of the characters is not fixed, the first characters of the nodes in the same layer are different, the characters from the root node to each leaf node or intermediate node can form a complete URL or domain name, and the cache information of the cache files is stored in the node corresponding to the last character of the URL. The storage structure of the URL tree is convenient to search and manage, and when a file refreshing request exists, corresponding cache information can be simply and quickly searched according to a corresponding URL or a corresponding domain name, so that the file is refreshed. In addition, the storage structure of the URL tree can save storage space.
The invention also provides a method for refreshing the file based on the URL, which is described in conjunction with FIG. 5.
FIG. 5 is a flowchart illustrating a caching method according to another embodiment of the present invention. As shown in fig. 5, after step S320, the method may further include:
Step S502, arranging the fields of the domain name part of the URL corresponding to the file to be refreshed in reverse order, while keeping the order of the fields outside the domain name part and the order of the characters within each field unchanged, to obtain the URL to be refreshed.
This step may refer to the method in step S302.
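As an illustration of this conversion, the following Python sketch reverses only the dot-separated fields of the domain name part. The function name `reverse_domain_fields` is hypothetical, and the sketch assumes the URL is given without a scheme, so that the first slash ends the domain name part.

```python
def reverse_domain_fields(url):
    """Reverse the dot-separated domain fields; leave the path untouched."""
    # Everything before the first "/" is treated as the domain name part.
    domain, slash, path = url.partition("/")
    fields = domain.split(".")
    return ".".join(reversed(fields)) + slash + path


print(reverse_domain_fields("www.test.cn/2014/07"))
```

Under this assumption, www.test.cn/2014/07 becomes cn.test.www/2014/07, the form used in the fig. 4 examples, and a bare domain name such as www.test.cn becomes cn.test.www.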
Step S504, matching the character of the URL to be refreshed with the character corresponding to each node in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed.
The specific matching process is, for example:
(1) Take the first character of the URL to be refreshed as the current character, and select the root node of the URL tree as the current node.
(2) Determine whether all characters corresponding to the current node have been matched; if so, execute step (3), otherwise execute step (4).
(3) Select a child node of the current node as the new current node, and return to step (2).
(4) Determine whether the current character is the same as the first unmatched character corresponding to the current node; if so, execute step (5).
(5) Determine whether the URL to be refreshed still has unmatched characters; if so, take the next character as the current character and return to step (2); otherwise, the matching is finished.
Step S506, according to the cache information of the file stored in the node corresponding to the last character of the URL to be refreshed, the corresponding file is searched, and the file is refreshed.
The refresh operation may be deleting the corresponding file. The file is located according to the cache address on the disk contained in the cache information.
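Steps S504 and S506 can be sketched as follows. This is a hedged illustration only: `Node`, `find_node` and `refresh_url` are hypothetical names, the small tree is hand-built to resemble fig. 4, the cache information is reduced to a path string, and the delete operation is passed in as a callback rather than touching a real disk.

```python
class Node:
    def __init__(self, chars, cache=None, children=()):
        self.chars = chars              # characters stored in this node
        self.cache = cache              # cache info (here just a path), or None
        self.children = list(children)


def find_node(root, key):
    """Return the node whose last character ends `key`, or None if no match."""
    node, offset = root, 0              # offset = matched characters in node.chars
    for ch in key:
        if offset == len(node.chars):   # step (2): current node fully matched
            # step (3): move to the child whose first character is ch
            node = next((c for c in node.children if c.chars[0] == ch), None)
            if node is None:
                return None
            offset = 0
        if node.chars[offset] != ch:    # step (4): compare the characters
            return None
        offset += 1                     # step (5): advance to the next character
    return node if offset == len(node.chars) else None


def refresh_url(root, key, delete_file):
    """Look up `key` and refresh (here: delete) the cached file, if any."""
    node = find_node(root, key)
    if node is None or node.cache is None:
        return False
    delete_file(node.cache)
    return True


root = Node("cn.", children=[
    Node("test.www/", children=[
        Node("news", cache="/cache/n"),
        Node("201", children=[Node("4/07", cache="/cache/a"),
                              Node("6/08", cache="/cache/b")]),
    ]),
])
deleted = []
print(refresh_url(root, "cn.test.www/2014/07", deleted.append), deleted)
```

A URL that is not in the tree, such as cn.test.www/2015/08, fails at the character 5 and nothing is deleted.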
After step S320, the method may further include:
Step S508, arranging the fields of the domain name corresponding to the files to be refreshed in reverse order, with the order of the characters within each field unchanged, to obtain the domain name to be refreshed.
This corresponds to refreshing the files under a directory, that is, refreshing the files corresponding to all URLs under the same domain name. This step may refer to the method in step S302.
Step S510, matching the characters of the domain name to be refreshed with the characters corresponding to the nodes in the URL tree, to obtain the node corresponding to the last character of the domain name to be refreshed.
The specific matching process is, for example:
(1) Take the first character of the domain name to be refreshed as the current character, and select the root node of the URL tree as the current node.
(2) Determine whether all characters corresponding to the current node have been matched; if so, execute step (3), otherwise execute step (4).
(3) Select a child node of the current node as the new current node, and return to step (2).
(4) Determine whether the current character is the same as the first unmatched character corresponding to the current node; if so, execute step (5).
(5) Determine whether the domain name to be refreshed still has unmatched characters; if so, take the next character as the current character and return to step (2); otherwise, the matching is finished.
Step S512, each corresponding file is searched according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the file stored in each child node of the node, and each file is refreshed.
As shown in fig. 4, if the domain name to be refreshed is cn.test.www and the node corresponding to the last character of the domain name is found to be test.www/, the corresponding files are found according to the cache information stored in test.www/ and in its child nodes, i.e., news, 201, 4/07 and 6/08, and those files are refreshed.
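The directory refresh can be sketched as a prefix lookup followed by a walk over the subtree. The names `find_prefix_node` and `cached_files_under` are hypothetical, the tree is hand-built to resemble fig. 4, and "child nodes" is interpreted here as all descendants, since 4/07 and 6/08 sit below 201 in the example.

```python
class Node:
    def __init__(self, chars, cache=None, children=()):
        self.chars = chars
        self.cache = cache              # cache info (here just a path), or None
        self.children = list(children)


def find_prefix_node(root, prefix):
    """Return the node containing the last character of `prefix`, or None."""
    node, offset = root, 0
    for ch in prefix:
        if offset == len(node.chars):
            node = next((c for c in node.children if c.chars[0] == ch), None)
            if node is None:
                return None
            offset = 0
        if node.chars[offset] != ch:
            return None
        offset += 1
    return node                         # the prefix may end inside this node


def cached_files_under(node):
    """Collect the cache info stored in `node` and in all of its descendants."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.cache is not None:
            found.append(current.cache)
        stack.extend(current.children)
    return found


root = Node("cn.", children=[
    Node("test.www/", children=[
        Node("news", cache="/cache/n"),
        Node("201", children=[Node("4/07", cache="/cache/a"),
                              Node("6/08", cache="/cache/b")]),
    ]),
])
start = find_prefix_node(root, "cn.test.www")
print(start.chars, sorted(cached_files_under(start)))
```

Each path returned by `cached_files_under` would then be refreshed, for example by deleting the file at that location.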
According to the method of the embodiment, the cache information of the file to be refreshed can be quickly found by matching the URL and the domain name corresponding to the file to be refreshed with the nodes of the URL tree, and the file is quickly refreshed.
In the above embodiments, after the URL or domain name to be matched has been matched with a node in one layer of the URL tree, matching continues with the child nodes of that node. The child nodes may be tried one by one in a fixed order, for example from left to right, but this can be slow when a node has many child nodes. The present invention further provides a URL-based node storage method for the above embodiments that increases the matching speed, described below with reference to fig. 4.
A node in the URL tree corresponds to a segment of memory in which the attributes of the node are stored. The node attributes may include: the node key, that is, the characters corresponding to the node, expressed as the numerical values produced by the conversion method in step S302; the child node initial value, which is the minimum of the first characters of the node's child nodes; the child node index array, which stores the memory addresses of the child nodes; and the cache information corresponding to the node. Leaf nodes must have corresponding cache information, and non-leaf nodes may also have corresponding cache information. The cache information includes the name of the cache file and the location of the cache file on the disk.
The position of a child node in the child node index array is the first byte of the child node's key minus the child node initial value. Since the first bytes of the keys of the child nodes are not necessarily consecutive, some entries of the child node index array may be empty.
For example, in fig. 4, the child nodes of the node 201 are 4 and 6. The node attributes of 201 include: the key of the node, which is the ASCII code of 201; the child node initial value, which is the ASCII code of 4; and the child node index array, in which index array [0] stores the location of 4/, index array [1] is null, and index array [2] stores the location of 6/. The position of child node 6/ in the index array is 6 - 4 = 2.
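The index arithmetic for node 201 can be checked with a short Python sketch. The names `Node`, `child_base`, `child_index` and `child_for` are hypothetical, Python object references stand in for memory addresses, and plain ASCII codes are used for the digits.

```python
class Node:
    def __init__(self, key):
        self.key = key            # characters stored in this node
        self.child_base = None    # smallest first-character code among the children
        self.child_index = []     # slot i -> child whose first code is child_base + i


def child_for(node, ch):
    """Return the child whose key starts with `ch`, using one subtraction."""
    if node.child_base is None:
        return None
    slot = ord(ch) - node.child_base
    if 0 <= slot < len(node.child_index):
        return node.child_index[slot]   # may be None for an empty slot
    return None


n4, n6 = Node("4/07"), Node("6/08")
n201 = Node("201")
n201.child_base = ord("4")
n201.child_index = [n4, None, n6]       # slot 1 (character 5) is empty

print(ord("6") - n201.child_base, child_for(n201, "6").key)
```

The lookup takes constant time regardless of the number of children, at the cost of empty slots when the first characters are not consecutive.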
Further, in the foregoing embodiment, selecting a child node of the current node as the new current node in step S308 includes: subtracting the child node initial value of the current node from the current character to obtain a child node index array value of the current node; looking up the corresponding entry of the child node index array according to the child node index array value; and finding the corresponding child node according to the memory address stored in that entry, and updating the child node to be the current node.
Similarly, the step (3) in the step S504 and the step (3) in the step S510 may also update the child node as the current node in the same manner.
As shown in fig. 4, assume that cn.test.www/2015/08 is the URL to be inserted, the current character is 5, the current node 201 has been fully matched, and the current node needs to be updated. The child node initial value 4 is subtracted from 5 to obtain 1, which is the index array value. Index array [1] is found to be empty, so the matching is unsuccessful. Because the mismatch occurs at the first character of a node, 5 and the characters after it are inserted as a child node of 201, and the memory address of the new node is stored in index array [1].
Further, in the embodiment shown in fig. 3, when a node that is not yet present in the URL tree is inserted, the memory address of the node may be stored in the corresponding entry of the index array; that is, the child node initial value of the parent node is subtracted from the first character of the inserted node to obtain the corresponding index array value. If the first character of the inserted node is smaller than the child node initial value, a new index array needs to be created, and the child node initial value is updated to the value of the first character of the inserted node.
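Maintaining the index array on insertion might look like the following sketch. The names `Node` and `add_child` are hypothetical, and growing the array when the new first character lies beyond its current end is an assumption, since the text only describes the case where the new first character is smaller than the child node initial value.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.child_base = None    # smallest first-character code among the children
        self.child_index = []     # slot i -> child whose first code is child_base + i


def add_child(parent, child):
    """Store `child` in the slot given by its first character."""
    code = ord(child.key[0])
    if parent.child_base is None:
        parent.child_base, parent.child_index = code, [child]
        return
    if code < parent.child_base:
        # New minimum: build a new array shifted right, then lower the base.
        shift = parent.child_base - code
        parent.child_index = [None] * shift + parent.child_index
        parent.child_base = code
    slot = code - parent.child_base
    if slot >= len(parent.child_index):
        # Assumption: grow the array when the slot lies beyond its end.
        parent.child_index.extend([None] * (slot - len(parent.child_index) + 1))
    parent.child_index[slot] = child


parent = Node("201")
for key in ["4/07", "6/08", "5/08", "2/01"]:
    add_child(parent, Node(key))
print(chr(parent.child_base),
      [c.key if c else None for c in parent.child_index])
```

After 4/07, 6/08 and 5/08 are added the base is the code of 4 and the slots hold 4, 5 and 6; adding 2/01 lowers the base to the code of 2 and leaves the slot for 3 empty. The key 2/01 is an invented example.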
The method of the embodiment can accelerate the matching speed of the URL or the domain name in the URL tree and further improve the efficiency, thereby realizing the refreshing of the file simply, conveniently and efficiently.
The present invention also provides a caching apparatus, which is described below with reference to fig. 6.
FIG. 6 is a block diagram of an embodiment of a cache apparatus of the present invention. As shown in fig. 6, the apparatus 60 includes:
the URL conversion module 602 is configured to perform reverse order arrangement on each field of the domain name portion of the uniform resource locator URL corresponding to the cache file, where the sequence of the fields other than the domain name portion and each character in each field is not changed, so as to obtain the URL to be inserted.
A URL insertion module 604, configured to perform matching starting from a first character of a URL to be inserted and a root node of a URL tree:
if the current character is the same as the first character in the unmatched character corresponding to the current node, updating the next character to be the current character, and if the characters corresponding to the current node are completely matched, selecting the child node of the current node to be updated to be the current node;
if the current character is different from the first character in the unmatched characters corresponding to the current node, and the character that fails to match the current character is the first character corresponding to the current node, inserting the current character and all characters after the current character as a child node of the parent node of the current node;
if the current character is different from the first character in the unmatched characters corresponding to the current node, and the character that fails to match the current character is not the first character corresponding to the current node, extracting the characters before the mismatching character in the current node as a new node to replace the current node, inserting the mismatching character and the characters after it as a first child node of the new node, inserting the current character and all characters after the current character as a second child node of the new node, and attaching the paths from each child node of the current node to each leaf node after the first child node.
In one embodiment, each node in the URL tree stores an initial value of a child node of the node and a child node index array of the node, the initial value of the child node is a minimum value of corresponding first characters of each child node, and the child node index array stores addresses of each child node in the memory.
The URL inserting module 604 is configured to subtract the initial value of the child node corresponding to the current node from the current character to obtain a child node index array value of the current node, find the corresponding child node index array according to the child node index array value, find the corresponding child node according to the memory address in the child node index array, and update the child node as the current node.
A cache information storage module 606, configured to store the cache information of the cache file corresponding to the URL to be inserted into a node corresponding to the last character of the URL to be inserted.
In one embodiment, the caching apparatus 60 may further include:
the URL refreshing module 608 is configured to arrange the fields of the domain name portion of the URL corresponding to the file to be refreshed in reverse order, with the order of the fields other than the domain name portion and of the characters in each field unchanged, to obtain the URL to be refreshed; match the characters of the URL to be refreshed with the characters corresponding to each node in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed; and search for the corresponding file according to the cache information of the file stored in that node, and refresh the file.
The URL refresh module 608 is configured to match the first character of the URL to be refreshed with the root node of the URL tree: if the current character of the URL to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character is updated to be the current character, if the characters corresponding to the current node are completely matched, the child node of the current node is selected to be updated to be the current node, and until all the characters of the URL to be refreshed are completely matched.
In one embodiment, the caching apparatus 60 may further include:
the directory refreshing module 610 is configured to perform reverse order arrangement on each field of the domain name corresponding to the file to be refreshed, obtain the domain name to be refreshed, match the characters of the domain name to be refreshed with the characters corresponding to each node in the URL tree, obtain a node corresponding to the last character of the domain name to be refreshed, find each corresponding file according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the file stored in each child node of the node, and refresh each file.
The directory refresh module 610 is configured to match the first character of the domain name to be refreshed with the root node of the URL tree: if the current character of the domain name to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character is updated to be the current character, and if the characters corresponding to the current node are completely matched, the child node of the current node is selected to be updated to be the current node until all the characters of the domain name to be refreshed are completely matched.
In the above embodiments, characters other than separators are represented by their ASCII codes, and the two separators, the dot and the slash, are each represented by a different preset value.
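A conversion of this kind could be sketched as below. The text does not state the preset values, so `DOT_VALUE` and `SLASH_VALUE` are purely hypothetical placeholders, as is the function name `encode`; the sketch also assumes the input contains only ASCII characters.

```python
DOT_VALUE = 1     # hypothetical preset value for the "." separator
SLASH_VALUE = 2   # hypothetical preset value for the "/" separator


def encode(key):
    """Map each character to a number: presets for separators, ASCII otherwise."""
    values = []
    for ch in key:
        if ch == ".":
            values.append(DOT_VALUE)
        elif ch == "/":
            values.append(SLASH_VALUE)
        else:
            values.append(ord(ch))  # ASCII code for ordinary characters
    return values


print(encode("cn./2"))
```

With these placeholder values, the two separators never collide with the code of an ordinary printable character.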
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the preceding embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A caching method, comprising:
carrying out reverse order arrangement on all fields of a domain name part of a Uniform Resource Locator (URL) corresponding to the cache file, wherein the sequence of the fields except the domain name part and all characters in all the fields is unchanged, and obtaining the URL to be inserted;
matching is carried out from the first character of the URL to be inserted and the root node of the URL tree:
if the current character of the URL to be inserted is the same as the first character in the unmatched character corresponding to the current node, updating the next character of the URL to be inserted into the current character of the URL to be inserted, and if all the characters corresponding to the current node are matched, selecting the child node of the current node to be updated into the current node;
if the current character of the URL to be inserted is different from the first character in the unmatched characters corresponding to the current node, and the character that fails to match the current character of the URL to be inserted is the first character corresponding to the current node, inserting the current character of the URL to be inserted and all characters after the current character as a child node of the parent node of the current node;
if the current character of the URL to be inserted is different from the first character in the unmatched character corresponding to the current node, and the character which is matched with the current character of the URL to be inserted differently is not the first character corresponding to the current node, extracting the character before the character which is matched with the current character of the URL to be inserted differently in the current node as a new node to replace the current node, extracting the character which is matched with the current character of the URL to be inserted differently and the character after the character as a first child node of the new node to insert, inserting the current character of the URL to be inserted and all the characters after the current character of the URL to be inserted as a second child node of the new node, and inserting each path from each child node of the current node to each leaf node after the first child node;
and storing the cache information of the cache file corresponding to the URL to be inserted into the node corresponding to the last character of the URL to be inserted.
2. The method of claim 1, further comprising:
carrying out reverse order arrangement on all fields of a domain name part of a URL (Uniform resource locator) corresponding to a file to be refreshed, wherein the fields except the domain name part and the sequence of all characters in all the fields are unchanged, and obtaining the URL to be refreshed;
matching the characters of the URL to be refreshed with the characters corresponding to each node in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed;
and searching the corresponding file according to the cache information of the file stored in the node corresponding to the last character of the URL to be refreshed, and refreshing the file.
3. The method of claim 1, further comprising:
carrying out reverse order arrangement on all fields of the domain name corresponding to the file to be refreshed, wherein the order of all characters in all the fields is unchanged, and obtaining the domain name to be refreshed;
matching the characters of the domain name to be refreshed with the characters corresponding to each node in the URL tree to obtain a node corresponding to the last character of the domain name to be refreshed;
and searching corresponding files according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the files stored in the sub-nodes of the node, and refreshing the files.
4. The method of claim 2,
the matching of the characters of the URL to be refreshed and the characters corresponding to each node in the URL tree comprises the following steps:
matching is carried out from the first character of the URL to be refreshed and the root node of the URL tree:
if the current character of the URL to be refreshed is the same as the initial character in the unmatched character corresponding to the current node, updating the next character of the URL to be refreshed to the current character of the URL to be refreshed, and if all the characters corresponding to the current node are matched, selecting the child node of the current node to be updated to the current node until all the characters of the URL to be refreshed are matched.
5. The method of claim 3,
the matching of the characters of the domain name to be refreshed and the characters corresponding to each node in the URL tree comprises:
matching is carried out from the first character of the domain name to be refreshed and the root node of the URL tree:
if the current character of the domain name to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character of the domain name to be refreshed is updated to the current character of the domain name to be refreshed, if all the characters corresponding to the current node are matched, the child node of the current node is selected to be updated to the current node until all the characters of the domain name to be refreshed are matched.
6. The method of claim 1,
storing an initial value of a child node of the node and a child node index array of the node in each node in the URL tree, wherein the initial value of the child node is the minimum value of the corresponding first character of each child node, and the child node index array respectively stores the addresses of each child node in the memory;
the selecting the child node of the current node to update to the current node comprises:
subtracting the initial value of the child node corresponding to the current node from the current character of the URL to be inserted to obtain a child node index array value of the current node;
searching a corresponding child node index array according to the child node index array value;
and finding the corresponding child node according to the memory address in the child node index array, and updating the child node to be the current node.
7. The method according to any one of claims 1 to 6,
the characters other than the separators are represented by American Standard Code for Information Interchange (ASCII) codes, and the two separators, the dot and the slash, are each represented by a different preset value.
8. A cache apparatus, comprising:
the URL conversion module is used for carrying out reverse order arrangement on all fields of a domain name part of a uniform resource locator URL corresponding to the cache file, and obtaining a URL to be inserted, wherein the fields except the domain name part and the sequence of all characters in all the fields are unchanged;
a URL insertion module, configured to perform matching starting from a first character of the URL to be inserted and a root node of a URL tree:
if the current character of the URL to be inserted is the same as the first character in the unmatched character corresponding to the current node, updating the next character of the URL to be inserted into the current character of the URL to be inserted, and if all the characters corresponding to the current node are matched, selecting the child node of the current node to be updated into the current node;
if the current character of the URL to be inserted is different from the first character in the unmatched characters corresponding to the current node, and the character that fails to match the current character of the URL to be inserted is the first character corresponding to the current node, inserting the current character of the URL to be inserted and all characters after the current character as a child node of the parent node of the current node;
if the current character of the URL to be inserted is different from the first character in the unmatched character corresponding to the current node, and the character which is matched with the current character of the URL to be inserted differently is not the first character corresponding to the current node, extracting the character before the character which is matched with the current character of the URL to be inserted differently in the current node as a new node to replace the current node, extracting the character which is matched with the current character of the URL to be inserted differently and the character after the character as a first child node of the new node to insert, inserting the current character of the URL to be inserted and all the characters after the current character of the URL to be inserted as a second child node of the new node, and inserting each path from each child node of the current node to each leaf node after the first child node;
and the cache information storage module is used for storing the cache information of the cache file corresponding to the URL to be inserted into the node corresponding to the last character of the URL to be inserted.
9. The apparatus of claim 8, further comprising:
and the URL refreshing module is used for carrying out reverse order arrangement on all fields of a domain name part of the URL corresponding to the file to be refreshed, obtaining the URL to be refreshed, matching the characters of the URL to be refreshed with the characters corresponding to all nodes in the URL tree to obtain the node corresponding to the last character of the URL to be refreshed, searching the corresponding file according to the cache information of the file stored in the node corresponding to the last character of the URL to be refreshed, and refreshing the file.
10. The apparatus of claim 8, further comprising:
and the directory refreshing module is used for carrying out reverse order arrangement on all fields of the domain name corresponding to the file to be refreshed, obtaining the domain name to be refreshed, matching the characters of the domain name to be refreshed with the characters corresponding to all nodes in the URL tree, obtaining the node corresponding to the last character of the domain name to be refreshed, searching the corresponding files according to the node corresponding to the last character of the domain name to be refreshed and the cache information of the files stored in all sub-nodes of the node, and refreshing the files.
11. The apparatus of claim 9,
the URL refreshing module is used for matching the first character of the URL to be refreshed with the root node of the URL tree: if the current character of the URL to be refreshed is the same as the initial character in the unmatched character corresponding to the current node, updating the next character of the URL to be refreshed to the current character of the URL to be refreshed, and if all the characters corresponding to the current node are matched, selecting the child node of the current node to be updated to the current node until all the characters of the URL to be refreshed are matched.
12. The apparatus of claim 10,
the directory refreshing module is used for matching the first character of the domain name to be refreshed with the root node of the URL tree: if the current character of the domain name to be refreshed is the same as the first character in the unmatched character corresponding to the current node, the next character of the domain name to be refreshed is updated to the current character of the domain name to be refreshed, if all the characters corresponding to the current node are matched, the child node of the current node is selected to be updated to the current node until all the characters of the domain name to be refreshed are matched.
13. The apparatus of claim 8,
storing an initial value of a child node of the node and a child node index array of the node in each node in the URL tree, wherein the initial value of the child node is the minimum value of the corresponding first character of each child node, and the child node index array respectively stores the addresses of each child node in the memory;
the URL inserting module is used for subtracting the initial value of the child node corresponding to the current node from the current character of the URL to be inserted to obtain the child node index array value of the current node, finding the corresponding child node index array according to the child node index array value, finding the corresponding child node according to the memory address in the child node index array, and updating the child node to be the current node.
14. The apparatus according to any one of claims 8 to 13,
the characters other than the separators are represented by American Standard Code for Information Interchange (ASCII) codes, and the two separators, the dot and the slash, are each represented by a different preset value.
15. A cache apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the caching method of any one of claims 1-7 based on instructions stored in the memory.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201710689714.1A 2017-08-14 2017-08-14 Caching method and device and computer storage medium Active CN110020272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710689714.1A CN110020272B (en) 2017-08-14 2017-08-14 Caching method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710689714.1A CN110020272B (en) 2017-08-14 2017-08-14 Caching method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN110020272A CN110020272A (en) 2019-07-16
CN110020272B true CN110020272B (en) 2021-11-05

Family

ID=67186064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710689714.1A Active CN110020272B (en) 2017-08-14 2017-08-14 Caching method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110020272B (en)


Also Published As

Publication number Publication date
CN110020272A (en) 2019-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220127

Address after: 100007 Room 205-32, Floor 2, Building 2, No. 1 and No. 3, Qinglong Hutong A, Dongcheng District, Beijing

Patentee after: Tianyiyun Technology Co.,Ltd.

Address before: No.31, Financial Street, Xicheng District, Beijing, 100033

Patentee before: CHINA TELECOM Corp.,Ltd.