CN110597855A

CN110597855A - Data storage method, terminal equipment and computer readable storage medium

Info

Publication number: CN110597855A
Application number: CN201910749968.7A
Authority: CN
Inventors: 彭炯瑜; 解静仪; 农革
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2019-08-14
Filing date: 2019-08-14
Publication date: 2019-12-20
Anticipated expiration: 2039-08-14
Also published as: CN110597855B

Abstract

The application belongs to the technical field of data processing and provides a data storage method, a terminal device, and a computer-readable storage medium. The method comprises the following steps: acquiring a search keyword, and searching a cache for the target linked list corresponding to the first character of the search keyword; detecting whether a target index matching the search keyword exists in the target linked list; if no such target index exists in the target linked list, searching the disk for, and acquiring, the target index matching the search keyword, adding the target index to the target linked list in the cache, and outputting the suffix character string corresponding to the target index; and if the target linked list contains a target index matching the search keyword, acquiring and outputting the suffix character string corresponding to the target index. This improves data query efficiency and shortens data query time.

Description

Data storage method, terminal equipment and computer readable storage medium

Technical Field

The present application belongs to the technical field of data processing, and in particular, to a data storage method, a terminal device, and a computer-readable storage medium.

Background

A suffix-index-based search engine is a search engine that uses the suffix character strings of a character string as its index. Existing suffix-index-based search engines usually store the index on a disk, so each time the terminal device processes a query request, all indexes must be loaded from the disk into memory before the indexes meeting the requirements can be searched among them. However, since most of the indexes stored on the disk are rarely retrieved, loading all of them into memory for every query request lowers data query efficiency and lengthens data query time.

Disclosure of Invention

In view of this, embodiments of the present application provide a data storage method, a terminal device, and a computer-readable storage medium, so as to solve the problems of low data query efficiency and long data query time caused by the existing data storage method based on suffix index.

A first aspect of an embodiment of the present application provides a data storage method, including:

obtaining a search keyword, and searching a cache for the target linked list corresponding to the first character of the search keyword, where the target linked list stores the information of the indexes corresponding to suffix character strings whose first character is the first character of the search keyword;

detecting whether a target index matched with the search keyword exists in the target linked list;

if the target linked list does not have a target index matched with the search keyword, searching and acquiring the target index matched with the search keyword from a disk;

adding the target index into the target linked list in the cache, and outputting a suffix character string corresponding to the target index;

and if the target linked list has a target index matched with the search keyword, acquiring and outputting a suffix character string corresponding to the target index.
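The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: a plain dictionary of sorted Python lists stands in for the array of linked lists, the disk is modelled as a sorted list of full suffix strings, "matching" is taken to mean that a suffix string starts with the search keyword, and the function name `search` is hypothetical.

```python
from bisect import insort


def search(keyword, cache, disk_suffixes):
    """Sketch of the lookup flow (hypothetical helper).

    Assumptions: `cache` maps a first character to a lexicographically
    sorted Python list standing in for the linked list, and
    `disk_suffixes` is the sorted list of every suffix string on disk.
    """
    # Find (or create) the target list for the keyword's first character.
    target_list = cache.setdefault(keyword[0], [])
    # Look for cached suffix strings that start with the keyword.
    hits = [s for s in target_list if s.startswith(keyword)]
    if hits:
        # Cache hit: answer without touching the disk.
        return hits
    # Cache miss: search the on-disk suffixes. A linear scan keeps the
    # sketch short; the description uses binary search for this step.
    found = [s for s in disk_suffixes if s.startswith(keyword)]
    # Add each result to the cache, keeping dictionary order.
    for s in found:
        insort(target_list, s)
    return found
```

Calling `search("ssi", cache, disk)` twice on the sorted suffixes of mississipp$ reads from `disk` the first time and answers the second call from `cache["s"]`.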

Further, the information of an index includes a longest common prefix length, a source location identifier, a non-common character string, a retrieval count, and the address of the subsequent index. The longest common prefix length describes the length of the longest common prefix shared by the suffix character string corresponding to the index and its preceding suffix character string; the source location identifier describes the position of the suffix character string corresponding to the index in the source file; the non-common character string holds, up to a preset length, the characters of the suffix character string corresponding to the index that follow its common prefix with the preceding suffix character string; the retrieval count describes the number of times the index has been retrieved; and the address of the subsequent index describes the address, in the target linked list, of the index that follows this index;

detecting whether a target index matching the search keyword exists in the target linked list comprises the following steps:

detecting whether a first index in the target linked list is empty;

if the first index in the target linked list is not empty, acquiring the non-common character string of the first index, combining a predefined character string variable whose initial length is 0 with the non-common character string of the first index to obtain the suffix character string corresponding to the first index, and comparing the suffix character string corresponding to the first index with the search keyword;

if the suffix character string corresponding to the first index is equal to the search keyword, determining the first index as a target index matched with the search keyword;

acquiring the longest common prefix length of each subsequent index of the first index;

and if the longest common prefix length of the subsequent index of the first index is greater than or equal to the length of the search keyword, determining the subsequent index of the first index as a target index matched with the search keyword.
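A sketch of the matching rule above, under a hypothetical representation: the first index is given by its reconstructed suffix string and each subsequent index only by its longest common prefix length. Because the list is kept in dictionary order, all suffix strings that share the keyword as a prefix are contiguous, so the sketch stops at the first subsequent index whose longest common prefix length is shorter than the keyword (that early stop is an inference from the sorted order, not stated explicitly in the text).

```python
def match_from_first(first_suffix, successor_lcps, keyword):
    """Return list positions (0 = first index) of indexes matching `keyword`.

    Hypothetical helper. `successor_lcps` holds the longest common prefix
    lengths of the subsequent indexes, in linked-list order.
    """
    # The first index matches only if its suffix string equals the keyword.
    if first_suffix != keyword:
        return []
    matches = [0]
    for pos, lcp in enumerate(successor_lcps, start=1):
        # Sharing at least len(keyword) characters with the previous matching
        # suffix means this suffix also begins with the keyword.
        if lcp < len(keyword):
            break  # sorted order: no later index can match
        matches.append(pos)
    return matches
```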

Further, after comparing the suffix character string corresponding to the first index with the search keyword, the method further includes:

if the suffix character string corresponding to the first index is smaller than the search keyword, acquiring the longest common prefix length and the non-common character string of each subsequent index of the first index; extracting the first n characters of the suffix character string corresponding to the preceding index of each subsequent index to obtain the characters to be combined for that subsequent index; and combining these characters with the non-common character string of the subsequent index to obtain the suffix character string corresponding to that subsequent index, where n is the longest common prefix length of the subsequent index;

and comparing, in sequence, the suffix character strings corresponding to the subsequent indexes of the first index with the search keyword, and determining, based on the comparison results, whether each subsequent index of the first index is a target index matching the search keyword.
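The reconstruction described above is a form of front coding. The sketch below assumes each node is reduced to a hypothetical `(lcp, non_common)` pair and that no longest common prefix length exceeds the length of the preceding reconstructed string; when the non-common strings are truncated to the preset length, the rebuilt strings are correspondingly truncated.

```python
def decode_suffixes(entries):
    """Rebuild suffix strings from (lcp, non_common) pairs in list order.

    Hypothetical helper: `pre_string` starts empty (initial length 0), so the
    first entry decodes to its own non-common string.
    """
    pre_string = ""
    result = []
    for lcp, non_common in entries:
        # Take the first `lcp` characters of the preceding suffix string,
        # then append this index's non-common character string.
        pre_string = pre_string[:lcp] + non_common
        result.append(pre_string)
    return result
```

For example, `decode_suffixes([(0, "ssipp$"), (3, "ssipp$")])` rebuilds `["ssipp$", "ssissipp$"]`.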

Further, if no target index matching the search keyword exists in the target linked list, searching the disk for the target index matching the search keyword includes:

if no target index matching the search keyword exists in the target linked list, determining an index to be searched from the index file stored on the disk by binary search, and acquiring the source location identifier of the index to be searched;

acquiring, based on the source location identifier of the index to be searched, the suffix character string corresponding to the index to be searched from the source file stored on the disk;

and if the suffix character string corresponding to the index to be searched contains the search keyword, determining the index to be searched as a target index matched with the search keyword.

Further, adding the target index to the target linked list in the cache includes:

determining the remaining storage capacity of the target linked list;

if the total data volume of the target index is less than or equal to the remaining storage capacity of the target linked list, adding the target index to the target linked list in the cache;

if the total data volume of the target index is greater than the remaining storage capacity of the target linked list, reducing the storage space of the linked lists in the cache other than the target linked list and expanding the storage space of the target linked list until its remaining storage capacity is greater than or equal to the total data volume of the target index, and then adding the target index to the expanded target linked list.

Further, reducing the storage space of the linked lists in the cache other than the target linked list and expanding the storage space of the target linked list includes:

determining, among the linked lists other than the target linked list, the linked lists to be reduced, namely those whose stored data amount is less than half of their storable data amount;

and halving the storage space of each linked list to be reduced, and expanding the storage space of the target linked list with the storage space thus released, until the remaining storage capacity of the target linked list is greater than or equal to the total data volume of the target index.
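The rebalancing rule can be sketched as below. The dictionary layout (`capacity` for the storable data amount, `stored` for the stored data amount), the function name `expand_target`, and the single pass over the other lists are assumptions; the text does not say what happens if halving every eligible list still frees too little space, so the sketch simply reports failure by returning False.

```python
def expand_target(lists, target, needed):
    """Grow lists[target] by halving under-used lists (hypothetical helper).

    `lists` maps a list identifier to {"capacity": int, "stored": int}.
    Returns True once the target's remaining capacity covers `needed`.
    """
    t = lists[target]
    for key, lst in lists.items():
        if t["capacity"] - t["stored"] >= needed:
            break  # enough room already
        # Only other lists that use less than half of their capacity shrink.
        if key == target or lst["stored"] >= lst["capacity"] / 2:
            continue
        freed = lst["capacity"] - lst["capacity"] // 2
        lst["capacity"] //= 2  # halve; stored data still fits
        t["capacity"] += freed  # hand the freed space to the target list
    return t["capacity"] - t["stored"] >= needed
```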

Further, adding the target index to the target linked list in the cache includes:

determining, based on the target index and the longest common prefix length and non-common character string of each index in the target linked list, the target node in which the target index is to be stored;

and storing the target index into the target node in the target linked list.

A second aspect of an embodiment of the present application provides a terminal device, including:

the first searching unit, configured to acquire a search keyword and search a cache for the target linked list corresponding to the first character of the search keyword, where the target linked list stores the information of the indexes corresponding to suffix character strings whose first character is the first character of the search keyword;

the first detection unit, configured to detect whether a target index matching the search keyword exists in the target linked list;

the second searching unit, configured to search the disk for, and acquire, the target index matching the search keyword if no such target index exists in the target linked list;

the data adding unit, configured to add the target index to the target linked list in the cache and output the suffix character string corresponding to the target index;

and the data output unit, configured to acquire and output the suffix character string corresponding to the target index if a target index matching the search keyword exists in the target linked list.

A third aspect of embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the first aspect when executing the computer program.

A fourth aspect of embodiments of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect.

A fifth aspect of embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the data storage method according to any implementation of the first aspect.

The data storage method and the terminal device provided by the embodiments of the present application have the following beneficial effects:

in the data storage method provided by the embodiments of the present application, after the search keyword is obtained, the cache is searched first for an index matching the search keyword. When a matching target index exists in the cache, the suffix character string corresponding to the target index is obtained from the cache and output directly, with no disk access, which shortens data query time and improves data query efficiency. When no matching target index exists in the cache, the target index is obtained from the disk and added to the cache, so that subsequent queries for it can be served directly from the cache.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an implementation of a data storage method according to a first embodiment of the present application;

fig. 2 is a flowchart illustrating a specific implementation of S3 in a data storage method according to a second embodiment of the present application;

fig. 3 is a flowchart illustrating a specific implementation of S2 in a data storage method according to a third embodiment of the present application;

fig. 4 is a flowchart illustrating a specific implementation of S4 in a data storage method according to a fourth embodiment of the present application;

fig. 5 is a block diagram of a terminal device according to an embodiment of the present application;

fig. 6 is a schematic diagram of a terminal device according to another embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to Fig. 1, Fig. 1 is a flowchart of an implementation of a data storage method according to a first embodiment of the present application. In this embodiment, the process is executed by a terminal device. Terminal devices include, but are not limited to, servers, computers, smartphones, tablet computers, and other devices capable of performing data storage operations. The data storage method shown in Fig. 1 includes the following steps:

S1: obtaining a search keyword, and searching a cache for the target linked list corresponding to the first character of the search keyword, where the target linked list stores the information of the indexes corresponding to suffix character strings whose first character is the first character of the search keyword.

When a user needs to search for information related to a keyword through a search engine, the user can enter the search keyword in the search box of the search engine; the terminal device obtains the search keyword entered in the search box and searches its cache for the target linked list corresponding to the first character of the search keyword.

It should be noted that the search engine in the embodiments of the present application is a suffix-index-based search engine, that is, a search engine that uses the suffix character strings of a character string as its index. A suffix character string of a character string is the substring that starts at some position i of the string and runs to its end position j, where i is an integer less than or equal to j, and j is the total number of characters in the string. The array of all suffix character strings of a character string, sorted in dictionary (lexicographic) order, is the suffix array of that character string. For example, the character string mississipp$ has 11 suffix character strings: mississipp$, ississipp$, ssissipp$, sissipp$, issipp$, ssipp$, sipp$, ipp$, pp$, p$ and $. Taking $ to sort before every letter, its suffix array is: $, ipp$, issipp$, ississipp$, mississipp$, p$, pp$, sipp$, sissipp$, ssipp$, ssissipp$.
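The suffix array in this example can be checked with a naive construction that sorts suffix start positions by the suffixes they denote. Python compares strings by code point, and "$" sorts before every lowercase letter, which matches the ordering used in this example. The helper name `suffix_array` is illustrative, and the quadratic construction is only suitable for short examples.

```python
def suffix_array(text):
    """Start positions (0-based) of all suffixes of `text`, in dictionary order.

    Naive construction: sorts positions by the suffix each one starts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "mississipp$"
for pos in suffix_array(text):
    print(pos, text[pos:])
```

For mississipp$ it returns [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]; the first entry, position 10, is the suffix $, and position 5 is ssipp$.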

In the embodiments of the present application, the suffix-index-based search engine is used to search the source file for character strings containing the search keyword. The source file includes at least one character string and is stored on the disk of the terminal device. In practice, to facilitate data search, an index file of the source file is usually created on the disk; the index file stores the information of the indexes corresponding to the suffix character strings of all character strings in the source file. The indexes in the index file are sorted in the dictionary order of their corresponding suffix character strings, and each index in the index file includes a longest common prefix length, a source location identifier, and a non-common character string.

The longest common prefix length describes the length of the longest common prefix shared by the suffix character string corresponding to an index and its preceding suffix character string, where the preceding suffix character string of a suffix character string is the suffix character string adjacent to it and immediately before it in dictionary order. For example, if two adjacent suffix character strings in dictionary order are ssip and ssippp, the preceding suffix character string of ssippp is ssip.

The source location identifier describes the position, in the source file, of the suffix character string corresponding to the index; specifically, it gives the position of the first character of that suffix character string in the source file and can be represented by a numerical sequence number. For example, if a character string in the source file is mississipp$, the first character of its suffix character string ssipp$ is at position 5 (counting from 0), so the source location identifier of ssipp$ is 5.

The non-common character string describes, up to a preset length, the characters by which the suffix character string corresponding to an index differs from its preceding suffix character string. The non-common characters of a suffix character string relative to its preceding suffix character string are the characters that remain after removing their common prefix. The preset length can be set according to actual requirements and is not limited here. For example, if two adjacent suffix character strings in dictionary order are ssipp$ and ssissipp$, the common prefix of ssissipp$ and its preceding suffix character string ssipp$ is ssi, the non-common character string of ssissipp$ relative to ssipp$ is ssipp$, and, assuming a preset length of 4, the non-common character string of preset length is ssip.
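To make the three index fields concrete, the sketch below derives, for every suffix of a string, its longest common prefix length with the preceding suffix, its source location identifier (a 0-based start position), and its non-common character string cut to a preset length. The preset length of 4 follows the example above; the function name and the tuple layout are hypothetical, and the output is not presented as a reproduction of Table 1.

```python
def build_index_file(text, preset_len=4):
    """Return (lcp, source_pos, non_common) rows in dictionary order.

    Hypothetical helper; preset_len=4 mirrors the worked example.
    """
    order = sorted(range(len(text)), key=lambda i: text[i:])
    rows = []
    prev = ""  # the first suffix has no predecessor
    for pos in order:
        suffix = text[pos:]
        # Longest common prefix with the preceding suffix string.
        lcp = 0
        while lcp < min(len(prev), len(suffix)) and prev[lcp] == suffix[lcp]:
            lcp += 1
        # Non-common characters, cut to the preset length.
        rows.append((lcp, pos, suffix[lcp:lcp + preset_len]))
        prev = suffix
    return rows
```

For mississipp$, the last row is (3, 2, "ssip"): the suffix ssissipp$ shares ssi with its preceding suffix ssipp$, starts at position 2, and its non-common string ssipp$ cut to four characters is ssip.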

For example, assuming that the source file contains only the single character string mississipp$, the index file of the source file is as shown in Table 1 below, where the ordinal numbers in Table 1 give the dictionary-order rank of each suffix character string of that character string.

TABLE 1

In the embodiments of the present application, the cache consists of an array of length N and N linked lists. Specifically, the character corresponding to each element of the array is the identifier of the linked list corresponding to that element, that is, the first character of the suffix character strings corresponding to all indexes stored in that linked list. For example, if the predefined character set is the English alphabet, which contains the 26 characters a to z, the cache consists of an array of length 26 and 26 linked lists. The 26 elements of the array correspond to the characters a to z respectively: for the element corresponding to character a, the suffix character strings corresponding to all indexes stored in its linked list start with a, and so on, up to the element corresponding to character z, whose linked list stores only indexes whose suffix character strings start with z.

Each element of the array stores the storage address, in the cache, of the head node of its corresponding linked list; the storable data amount of that linked list; the stored data amount of that linked list; and the identifier of that linked list. The storable data amount of the linked list corresponding to an element describes the number of indexes the linked list can store; the stored data amount describes the number of indexes currently stored in it; and the identifier of the linked list is the character corresponding to the element. For example, if the character corresponding to an element of the array is a, the identifier of its linked list is a.

Each linked list stores the information of the indexes corresponding to suffix character strings whose first character is the identifier of the linked list, with one node storing the information of one index. In the embodiments of the present application, the information of an index stored in a linked list includes: the longest common prefix length, the source location identifier, the non-common character string, the retrieval count, and the address of the subsequent index. The retrieval count records the number of times the index has been retrieved. The address of the subsequent index describes the address of the subsequent index in the target linked list, where the subsequent index of an index is the index stored in the node following the node of that index.

In the embodiments of the present application, after acquiring the search keyword, the terminal device determines the first character of the search keyword, locates the target element corresponding to that character in the array in the cache, reads from the target element the storage address in the cache of the head node of the corresponding target linked list, and uses that address to locate the target linked list corresponding to the first character of the search keyword. This target linked list stores the information of the indexes corresponding to suffix character strings whose first character is the first character of the search keyword.
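The array-plus-linked-lists layout of the cache and the first-character lookup can be sketched with dataclasses. The class and field names are hypothetical, a sentinel head node stands in for the stored head-node address, and the 26-element array assumes the English lowercase character set from the example; keywords starting with any other character map to no list.

```python
import string
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexNode:
    """One linked-list node holding the information of one index."""
    lcp: int                 # longest common prefix length
    source_pos: int          # source location identifier
    non_common: str          # non-common character string (preset length)
    retrieved: int = 0       # retrieval count
    next: Optional["IndexNode"] = None  # stands in for the subsequent index's address


@dataclass
class ListEntry:
    """One array element describing one linked list."""
    identifier: str          # first character shared by the list's suffixes
    capacity: int            # storable data amount (number of indexes)
    stored: int = 0          # stored data amount
    # Sentinel head node; the first index is head.next.
    head: IndexNode = field(default_factory=lambda: IndexNode(0, -1, ""))


def make_cache(capacity_per_list: int = 16) -> List[ListEntry]:
    """Array of 26 elements for the characters a-z (assumed character set)."""
    return [ListEntry(c, capacity_per_list) for c in string.ascii_lowercase]


def find_target_list(cache: List[ListEntry], keyword: str) -> Optional[ListEntry]:
    """Map the keyword's first character to its array element, if any."""
    idx = ord(keyword[0]) - ord("a")
    return cache[idx] if 0 <= idx < len(cache) else None
```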

S2: detecting whether a target index matching the search keyword exists in the target linked list.

In the embodiment of the application, the terminal device detects whether the target index matched with the search keyword exists in the target linked list by detecting whether the suffix character string corresponding to each index in the target linked list contains the search keyword.

Specifically, if the terminal device detects that the suffix character string corresponding to an index in the target linked list contains the search keyword, it determines that the index matches the search keyword, that is, a target index matching the search keyword exists in the target linked list; the terminal device then takes the matching index as the target index and executes S5. If the terminal device detects that none of the suffix character strings corresponding to the indexes in the target linked list contains the search keyword, it determines that no index in the target linked list matches the search keyword, that is, the target linked list contains no target index matching the search keyword, and the terminal device executes S3 and S4.

S3: if no target index matching the search keyword exists in the target linked list, searching the disk for, and acquiring, the target index matching the search keyword.

In the embodiments of the present application, when the terminal device detects that no target index matching the search keyword exists in the target linked list, this indicates that the target index matching the search keyword has not been retrieved before, and the terminal device therefore searches the disk for it.

As an embodiment of the present application, S3 can be implemented by S31 to S33 shown in Fig. 2, detailed as follows:

S31: if no target index matching the search keyword exists in the target linked list, determining an index to be searched from the index file stored on the disk by binary search, and acquiring the source location identifier of the index to be searched.

In this embodiment, if it is detected that no target index matching the search keyword exists in the target linked list, the terminal device determines an index to be searched from the index file on the disk by binary search and obtains the source location identifier of the index to be searched from the index file. The index to be searched is the index at the middle of the current search interval in each round of the binary search, where the search interval is the range of the index file still being searched.

S32: acquiring, based on the source location identifier of the index to be searched, the suffix character string corresponding to the index to be searched from the source file stored on the disk.

And after the terminal equipment acquires the suffix character string corresponding to the index to be searched, comparing the suffix character string corresponding to the index to be searched with the search keyword. If the terminal device detects that the suffix character string corresponding to the index to be searched includes the search keyword, S33 is executed.

S33: if the suffix character string corresponding to the index to be searched contains the search keyword, determining the index to be searched as the target index matching the search keyword.

In this embodiment, if the terminal device detects that the suffix character string corresponding to the index to be searched includes the search keyword, the index to be searched is determined as the target index matched with the search keyword, and S4 is executed, that is, the target index is added to the target linked list in the cache.
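S31 to S33 can be sketched as a binary search over the on-disk index. Assumptions: the index file is reduced to its source location identifiers in dictionary order, reading a suffix from the source file is a string slice, and "contains the search keyword" is taken to mean that the suffix starts with it. The sketch searches for the lower bound (the first suffix not smaller than the keyword) rather than testing each middle element for a match, so that it can then collect every matching index, since matches are contiguous in dictionary order. The function name `disk_lookup` is hypothetical.

```python
def disk_lookup(index_file, source, keyword):
    """Return the rows of `index_file` whose suffix starts with `keyword`.

    Hypothetical helper: `index_file` lists source positions of all suffixes
    in dictionary order; `source` is the text of the source file.
    """
    lo, hi = 0, len(index_file)
    # Binary search for the first suffix that is not smaller than the keyword.
    while lo < hi:
        mid = (lo + hi) // 2
        # Read the suffix from the source file via its source position.
        if source[index_file[mid]:] < keyword:
            lo = mid + 1
        else:
            hi = mid
    # Collect the contiguous run of suffixes that begin with the keyword.
    matches = []
    while lo < len(index_file) and source[index_file[lo]:].startswith(keyword):
        matches.append(lo)
        lo += 1
    return matches
```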

S4: adding the target index to the target linked list in the cache, and outputting the suffix character string corresponding to the target index.

In the embodiments of the present application, after acquiring the target index matching the search keyword from the disk, the terminal device inserts it into the appropriate node of the target linked list so that, after the insertion, the indexes in the target linked list remain sorted in the dictionary order of their corresponding suffix character strings; the terminal device then outputs the suffix character string corresponding to the target index.
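Insertion that preserves dictionary order can be sketched as follows. For readability each node keeps its full suffix string, whereas the cache described here stores only the non-common character string; what the sketch does show is that inserting a node changes two longest common prefix lengths, its own and its successor's. The names `Node`, `insert_sorted` and `to_pairs` are hypothetical.

```python
class Node:
    """Linked-list node; keeps the full suffix string for brevity."""

    def __init__(self, suffix, lcp=0, next=None):
        self.suffix = suffix
        self.lcp = lcp    # longest common prefix length with the preceding node
        self.next = next


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def insert_sorted(head, suffix):
    """Insert after sentinel `head`, keeping dictionary order and lcp values."""
    prev = head
    while prev.next is not None and prev.next.suffix < suffix:
        prev = prev.next
    node = Node(suffix, common_prefix_len(prev.suffix, suffix), prev.next)
    prev.next = node
    # The successor now follows a different node, so its lcp must be refreshed.
    if node.next is not None:
        node.next.lcp = common_prefix_len(suffix, node.next.suffix)


def to_pairs(head):
    out, node = [], head.next
    while node is not None:
        out.append((node.suffix, node.lcp))
        node = node.next
    return out
```

Starting from an empty sentinel `Node("")` and inserting ssissipp$, sipp$ and ssipp$ yields the list sipp$, ssipp$, ssissipp$ with longest common prefix lengths 0, 1 and 3.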

S5: if the target linked list contains a target index matching the search keyword, acquiring and outputting the suffix character string corresponding to the target index.

In the embodiments of the present application, if the terminal device detects that a target index matching the search keyword exists in the target linked list, it acquires and outputs the suffix character string corresponding to the target index. Specifically, the terminal device obtains the longest common prefix length and the non-common character string of the target index, determines the characters to be combined for the target index from the suffix character string corresponding to the preceding index of the target index (its first n characters, where n is the longest common prefix length of the target index), and concatenates these characters with the non-common character string of the target index to obtain the suffix character string of the target index. The preceding index of a target index is the index adjacent to the target index that precedes it in dictionary order.

As can be seen from the above, in the data storage method provided by this embodiment, a query whose target index is already in the cache is answered directly from the cache without accessing the disk, and a target index that must be fetched from the disk is added to the cache so that later queries for it can also be answered from the cache; this shortens data query time and improves data query efficiency.

Referring to fig. 3, fig. 3 is a flowchart illustrating an implementation of S2 in a data storage method according to a second embodiment of the present application. As shown in fig. 3, with respect to the embodiment described in fig. 1, S2 in the data storage method provided in this embodiment specifically includes S21 to S27, which are detailed as follows:

s21: and detecting whether the first index in the target linked list is empty.

In this embodiment, the first index of the target linked list is an index stored in the first node of the target linked list, and the first node of the target linked list is a node subsequent to the head node of the target linked list. The terminal equipment obtains a storage address of a head node of the target linked list in the cache from an element which is contained in an array in the cache and corresponds to the target linked list, searches the head node of the target linked list based on the storage address of the head node of the target linked list in the cache, obtains an address of a first index of the target linked list from the head node of the target linked list, positions a first index in the target linked list based on the address of the first index of the target linked list, and detects whether the first index in the target linked list is empty or not.

S22: if the first index in the target linked list is not empty, acquiring the non-public character string of the first index, combining a predefined character string variable with the initial length of 0 and the non-public character string of the first index to obtain a suffix character string corresponding to the first index, and comparing the suffix character string corresponding to the first index with the search keyword.

In this embodiment, the terminal device may predefine a string variable preString whose initial length is 0.

When the terminal device detects that the first index in the target linked list is not empty, it obtains the longest common prefix length and the non-common character string of the first index, combines the predefined string variable preString (initial length 0) with the non-common character string of the first index to obtain the suffix character string corresponding to the first index, and compares that suffix character string with the search keyword. Here, combining preString with the non-common character string of the first index means concatenating the two strings, with preString first and the non-common character string of the first index second. For example, assume that the search keyword is ssi and that the information of the indexes stored in the target linked list is as shown in Table 2 below:

TABLE 2

Since the non-common character string of the first index in Table 2 is sip, the terminal device combines preString (initial length 0) with sip to obtain sip as the suffix character string corresponding to the first index, and compares sip with the search keyword ssi. Because sip is smaller than ssi in dictionary order, S26 and S27 are executed.

If the terminal device detects that the suffix character string corresponding to the first index is equal to the search keyword, it executes S23 to S25; if that suffix character string is smaller than the search keyword, it executes S26 and S27; if that suffix character string is larger than the search keyword, it determines that no target index matching the search keyword exists in the target linked list and executes S3.

It should be noted that two character strings are equal when they are identical; a first character string is larger than a second character string when the first is arranged after the second in dictionary (lexicographic) order; and a first character string is smaller than a second character string when the first is arranged before the second in dictionary order.

S23: if the suffix character string corresponding to the first index is equal to the search keyword, determining the first index as a target index matching the search keyword.

In this embodiment, when detecting that the suffix character string corresponding to the first index is equal to the search keyword, the terminal device determines the first index as a target index matching the search keyword, that is, determines that a target index matching the search keyword exists in the target linked list. The terminal device then performs S5, obtaining and outputting the suffix character string corresponding to the target index, and at the same time continues with S24.

S24: acquiring the longest common prefix length of each subsequent index of the first index.

Since the indexes in the target linked list are ordered by the dictionary order of their corresponding suffix character strings, the suffix character string of an index arranged later is larger than that of an index arranged earlier. Consequently, when the suffix character string corresponding to an index in the target linked list contains the search keyword as a prefix and the longest common prefix length of the successor index of that index is greater than or equal to the length of the search keyword, the suffix character string corresponding to the successor index also contains the search keyword.

Based on this, in this embodiment, after determining that the first index is a target index matching the search keyword, the terminal device sequentially obtains the longest common prefix length of each subsequent index of the first index. The subsequent indexes of the first index are the indexes arranged after the first index in dictionary order; there may be one or more of them, depending on the actual situation. The terminal device compares the longest common prefix length of each subsequent index with the length of the search keyword. When the longest common prefix length of a subsequent index is smaller than the length of the search keyword, the suffix character string corresponding to that subsequent index does not contain the search keyword; when it is greater than or equal to the length of the search keyword, the suffix character string corresponding to that subsequent index contains the search keyword, and the terminal device executes S25.

S25: if the longest common prefix length of a subsequent index of the first index is greater than or equal to the length of the search keyword, determining that subsequent index as a target index matching the search keyword.

In this embodiment, when detecting that the longest common prefix length of a subsequent index of the first index is greater than or equal to the length of the search keyword, the terminal device determines that subsequent index as a target index matching the search keyword and also executes S5, obtaining and outputting the suffix character string corresponding to that target index.

As another embodiment of the present application, when the terminal device detects that the longest common prefix length of the successor index of the first index is smaller than the length of the search keyword, this indicates that the successor index does not match the search keyword; in this case the target index includes only the first index, and the terminal device outputs the suffix character string corresponding to the first index.
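The longest-common-prefix shortcut of S23 to S25 can be sketched as below. The helper is hypothetical, not taken from the source: it assumes the linked list is represented by a Python list of longest common prefix lengths in dictionary order, and that the entry at position first_match is already known to match. Because suffixes that begin with the keyword are contiguous in dictionary order, the walk stops at the first successor whose longest common prefix length is below the keyword length.

```python
def collect_matches(lcps, first_match, keyword_len):
    """Extend a known match through its successors using LCP lengths only.

    lcps[i] is the longest common prefix length of entry i with entry i-1.
    A successor still matches while its LCP >= keyword_len; the first
    successor below that bound ends the run of matching entries.
    """
    matches = [first_match]
    i = first_match + 1
    while i < len(lcps) and lcps[i] >= keyword_len:
        matches.append(i)
        i += 1
    return matches
```

For the suffixes of "mississippi" that start with s (sippi, sissippi, ssippi, ssissippi), the longest common prefix lengths are 0, 2, 1 and 3, and collect_matches([0, 2, 1, 3], 2, 3) returns [2, 3], that is, ssippi and ssissippi for the keyword ssi, without comparing either suffix character by character.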

S26: if the suffix character string corresponding to the first index is smaller than the search keyword, acquiring the longest common prefix length and the non-common character string of each subsequent index of the first index, extracting the first n characters from the suffix character string corresponding to the preceding index of each subsequent index to obtain the characters to be combined of that subsequent index, and combining the characters to be combined with the non-common character string of that subsequent index to obtain the suffix character string corresponding to that subsequent index, where n is the longest common prefix length of that subsequent index.

In this embodiment, when the terminal device detects that the suffix character string corresponding to the first index is smaller than the search keyword, this indicates that the first index does not match the search keyword, and the terminal device continues to detect whether the subsequent indexes of the first index match the search keyword. It should be noted that, in the embodiments of the present application, the subsequent indexes of an index are all the indexes arranged after it in dictionary order; there may be one or more of them, depending on the actual situation, which is not limited here.

Specifically, the terminal device obtains the longest common prefix length and the non-common character string of each subsequent index of the first index, extracts the first n characters from the suffix character string corresponding to the preceding index of each subsequent index to obtain the characters to be combined of that subsequent index, and combines them with the non-common character string of that subsequent index to obtain its suffix character string, where n is the longest common prefix length of that subsequent index. It should be noted that, in the embodiments of the present application, the preceding index of an index is the index adjacent to it and arranged before it in dictionary order; each index has at most one preceding index.

For example, in Table 2, the suffix character string corresponding to the first index is sip and the longest common prefix length of the second index is 2, so the first 2 characters, si, are extracted from sip as the characters to be combined of the second index and are combined with the non-common character string of the second index to obtain the suffix character string corresponding to the second index, which therefore begins with si. The longest common prefix length of the third index is 1, so the first character, s, is extracted from the suffix character string corresponding to the second index as the character to be combined of the third index and is combined with the non-common character string sip of the third index, giving ssip as the suffix character string corresponding to the third index. Proceeding in the same way, the suffix character strings of all subsequent indexes of the first index can be obtained.
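The reconstruction rule of S26 (take the first n characters of the preceding index's suffix character string, then append the entry's own non-common character string) can be sketched in Python as follows. Table 2 is not reproduced in this text, so the sample entries are hypothetical: they are the suffixes of "mississippi" that begin with s, encoded with an assumed preset length of 3.

```python
def decode_suffixes(entries):
    """Rebuild each entry's (possibly truncated) suffix string.

    entries: (lcp, non_common) pairs in dictionary order. The first entry is
    an empty preString plus its non-common string; every later entry takes
    the first lcp characters of the previously rebuilt string and appends
    its own non-common string. The results are prefixes of the full
    suffixes only while lcp never exceeds the previous rebuilt length;
    this sketch does not check that.
    """
    decoded = []
    prev = ""                                  # preString, initial length 0
    for lcp, non_common in entries:
        current = prev[:lcp] + non_common      # characters to be combined + non-common
        decoded.append(current)
        prev = current
    return decoded
```

With entries [(0, "sip"), (2, "ssi"), (1, "sip"), (3, "ssi")], the function returns ["sip", "sissi", "ssip", "ssissi"], which are prefixes of sippi, sissippi, ssippi and ssissippi.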

S27: sequentially comparing the suffix character strings corresponding to the subsequent indexes of the first index with the search keyword, and determining, based on the comparison results, whether each subsequent index of the first index is a target index matching the search keyword.

In this embodiment, after obtaining the suffix character strings of the subsequent indexes of the first index, the terminal device compares them with the search keyword one by one.

If the terminal device detects that the suffix character string corresponding to a subsequent index is equal to the search keyword, it determines that subsequent index to be a target index matching the search keyword and obtains and outputs its suffix character string. At the same time, the terminal device compares the longest common prefix length of each index following that subsequent index with the length of the search keyword; if the longest common prefix length of such an index is greater than or equal to the length of the search keyword, its suffix character string also contains the search keyword, and it is likewise determined to be a target index matching the search keyword.

If the terminal device detects that the suffix character string corresponding to a subsequent index is larger than the search keyword, it determines that no target index matching the search keyword exists among the subsequent indexes of the first index.

If the terminal device detects that the suffix character string corresponding to a subsequent index is smaller than the search keyword, it continues by comparing the next subsequent index with the search keyword and determines from the comparison result whether that index matches. Repeating these steps finds all target indexes in the target linked list that match the search keyword.
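Putting S21 to S27 together, a scan of one front-coded list might look like the sketch below. It is an illustration under stated assumptions rather than the source's code: entries are hypothetical (lcp, non_common) pairs in dictionary order, and "equal to the search keyword" is read as "the first len(keyword) characters of the rebuilt suffix equal the keyword", since non-common character strings are truncated to a preset length.

```python
def find_targets(entries, keyword):
    """Return the positions of entries that match keyword (S21-S27 sketch).

    entries: hypothetical (lcp, non_common) pairs in dictionary order.
    A rebuilt suffix "equals" the keyword here when its first len(keyword)
    characters equal the keyword (an assumption about truncated strings).
    """
    k = len(keyword)
    prev = ""                                   # preString, initial length 0
    for i, (lcp, non_common) in enumerate(entries):
        suffix = prev[:lcp] + non_common        # S22 / S26: rebuild the suffix
        head = suffix[:k]
        if head == keyword:                     # S23 / S27: first match found
            matches = [i]
            for j in range(i + 1, len(entries)):    # S24-S25: LCP shortcut
                if entries[j][0] < k:
                    break
                matches.append(j)
            return matches
        if head > keyword:                      # larger: no match, go to S3
            return []
        prev = suffix                           # smaller: continue with S26-S27
    return []                                   # empty list (S21) or exhausted
```

Once the first match is found, the remaining matches are collected from the longest common prefix lengths alone, which is the saving described in the following paragraph.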

As can be seen from the above, in the data storage method provided by this embodiment, once an index is determined to match the search keyword, whether each of its subsequent indexes matches is determined by comparing that subsequent index's longest common prefix length with the length of the search keyword. The suffix character string of every index therefore need not be compared with the search keyword, which saves data retrieval time and improves data retrieval efficiency.

Referring to fig. 4, fig. 4 is a flowchart illustrating an implementation of S4 in a data storage method according to a fourth embodiment of the present application. As shown in fig. 4, with respect to the embodiment described in fig. 1, S4 in the data storage method provided in this embodiment specifically includes S41 to S43, which are detailed as follows:

S41: determining the residual storage capacity of the target linked list.

In this embodiment of the application, after the terminal device acquires the target index from the disk and before it adds the target index to the target linked list in the cache, the terminal device obtains the storable data amount and the stored data amount of the target linked list from the element of the cache array that corresponds to the target linked list, and determines the residual storage capacity of the target linked list based on these two values.

After determining the residual storage capacity of the target linked list, the terminal device compares the total data amount of the target index acquired from the disk with that residual storage capacity. If the total data amount of the target index is less than or equal to the residual storage capacity of the target linked list, S42 is executed; if it is larger, S43 is executed.

S42: if the total data amount of the target index is less than or equal to the residual storage capacity of the target linked list, adding the target index to the target linked list in the cache.

In this embodiment, if the terminal device detects that the total data amount of the target index is less than or equal to the residual storage capacity of the target linked list, the remaining space of the target linked list is sufficient to store the target index, and the terminal device directly adds the target index to the target linked list in the cache.

S43: if the total data amount of the target index is larger than the residual storage capacity of the target linked list, reducing the storage space of the linked lists in the cache other than the target linked list and expanding the storage space of the target linked list until its residual storage capacity is greater than or equal to the total data amount of the target index, and adding the target index to the expanded target linked list.

In this embodiment, if the terminal device detects that the total data amount of the target index is greater than the residual storage capacity of the target linked list, the remaining space of the target linked list is insufficient to store the target index. The terminal device then reduces the storage space of the linked lists in the cache other than the target linked list and expands the storage space of the target linked list until its residual storage capacity is greater than or equal to the total data amount of the target index, and adds the target index to the expanded target linked list.

As an embodiment of the present application, reducing the storage space of the linked lists in the cache other than the target linked list and expanding the storage space of the target linked list may specifically proceed as follows:

In this embodiment, when detecting that the total data amount of the target index is greater than the residual storage capacity of the target linked list, the terminal device obtains the storable data amount and the stored data amount of each linked list other than the target linked list, determines as to-be-reduced linked lists those whose stored data amount is less than half of their storable data amount, halves the storage space of each to-be-reduced linked list, and expands the storage space of the target linked list by the space released. In other words, the target linked list gains exactly as much storage space as the to-be-reduced linked lists give up, until the residual storage capacity of the expanded target linked list is greater than or equal to the total data amount of the target index.
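The halving rule above can be sketched as follows. The ListBudget class, the function name make_room, and the use of plain integers for the storable and stored data amounts are assumptions for illustration; the sketch only adjusts the budgets and returns False when halving is not enough, leaving eviction to the caller.

```python
from dataclasses import dataclass


@dataclass
class ListBudget:
    capacity: int   # storable data amount of one linked list
    used: int       # stored data amount of that linked list


def make_room(budgets, target, needed):
    """Grow budgets[target] by halving under-used lists (S41-S43 sketch).

    Returns True once the target list's remaining space covers `needed`,
    or False if halving every eligible list is still not enough.
    """
    t = budgets[target]
    if t.capacity - t.used >= needed:          # S42: enough space already
        return True
    for key, b in budgets.items():             # S43: shrink other lists
        if key == target or b.used * 2 >= b.capacity:
            continue                           # only lists under half full
        freed = b.capacity // 2
        b.capacity -= freed                    # halve the to-be-reduced list
        t.capacity += freed                    # target grows by what was freed
        if t.capacity - t.used >= needed:
            return True
    return False                               # caller falls back to eviction
```

Only lists whose stored amount is below half their capacity are halved, so the reduced capacity still holds everything they currently store.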

As an embodiment of the present application, if, after the storage space of every to-be-reduced linked list has been reduced and the storage space of the target linked list has been expanded correspondingly, the residual storage capacity of the target linked list is still less than the total data amount of the target index, the terminal device deletes some indexes from each linked list, reduces the storage space of each linked list from which indexes were deleted by the total amount of the deleted indexes, and expands the storage space of the target linked list accordingly, until the sum of the total amount of the deleted indexes and the residual storage capacity of the expanded target linked list is greater than or equal to the total data amount of the target index.

As a possible implementation of this embodiment, the terminal device may delete from each linked list the indexes whose number of retrievals is smaller than the retrieval-count threshold of that linked list. The threshold of each linked list may be determined from the maximum and minimum numbers of retrievals stored in it; specifically, the threshold of a linked list may be (max + min)/2, where max and min are respectively the maximum and the minimum number of retrievals stored in that linked list.

As another possible implementation of this embodiment, the information of each index stored in a linked list further includes the retrieval time of the index. The terminal device may then delete from the linked list the indexes for which the interval between the retrieval time and the current time is greater than a preset duration, which can be set according to actual requirements.
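The two deletion criteria above can be sketched as filters over one linked list. The dictionary keys hits and last_hit and the function names are hypothetical; each function returns the entries that are kept, so the deleted indexes are the ones filtered out.

```python
import time


def hit_threshold(hit_counts):
    """Retrieval-count threshold (max + min) / 2 for one linked list."""
    return (max(hit_counts) + min(hit_counts)) / 2


def evict_by_hits(entries):
    """Keep entries retrieved at least as often as the list's threshold."""
    if not entries:
        return []
    threshold = hit_threshold([e["hits"] for e in entries])
    return [e for e in entries if e["hits"] >= threshold]


def evict_by_age(entries, max_age_s, now=None):
    """Keep entries whose retrieval time lies within max_age_s of now."""
    now = time.time() if now is None else now
    return [e for e in entries if now - e["last_hit"] <= max_age_s]
```

For a list whose retrieval counts are 1, 5 and 9, the threshold is (9 + 1)/2 = 5, so only the entry retrieved once is deleted.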

It should be noted that deleting indexes from a linked list changes the preceding index of some of the remaining indexes. Because the longest common prefix length and the non-common character string of an index are defined relative to its preceding index, the longest common prefix lengths and non-common character strings of the remaining indexes in the linked list are updated after the deletion.
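One way to perform this update is sketched below, assuming each rebuilt suffix is an exact prefix of the full suffix (each lcp never exceeds the length of the previously rebuilt string). It relies on a standard property of sorted strings: for a ≤ b ≤ c in dictionary order, LCP(a, c) = min(LCP(a, b), LCP(b, c)), so removing one index only changes the longest common prefix length and non-common character string of its successor. The function name delete_entry and the (lcp, non_common) tuple layout are hypothetical.

```python
def delete_entry(entries, i, preset_len):
    """Remove entries[i] and repair the front coding of its successor.

    entries: (lcp, non_common) pairs in dictionary order (modified in place).
    """
    # Rebuild the stored suffixes so the successor's text is available.
    decoded, prev = [], ""
    for lcp, nc in entries:
        prev = prev[:lcp] + nc
        decoded.append(prev)
    if i + 1 < len(entries):
        # The successor's new preceding index is entry i-1 (or preString).
        new_lcp = min(entries[i][0], entries[i + 1][0])
        succ = decoded[i + 1]
        entries[i + 1] = (new_lcp, succ[new_lcp:new_lcp + preset_len])
    del entries[i]
    return entries
```

Deleting sissippi from the hypothetical list [(0, "sip"), (2, "ssi"), (1, "sip"), (3, "ssi")] re-codes ssippi against sippi with lcp min(2, 1) = 1, which matches encoding the three remaining suffixes from scratch.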

As an embodiment of the present application, adding the target index to the expanded target linked list may specifically include the following steps:

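determining, based on the target index and on the longest common prefix length and non-common character string of each index in the target linked list, the target node in which the target index is to be stored; and storing the target index into the target node in the target linked list.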

In this embodiment, when inserting the target index into the target linked list, the terminal device first determines the suffix character string corresponding to each index in the target linked list from the longest common prefix length and the non-common character string of each index. Specifically, the terminal device may predefine a string variable with an initial length of 0 and combine it with the non-common character string of the first index to obtain the suffix character string corresponding to the first index in the target linked list. For each subsequent index of the first index, the terminal device extracts the first n characters from the suffix character string corresponding to the preceding index of that subsequent index to obtain its characters to be combined, and combines them with the non-common character string of that subsequent index to obtain its suffix character string, where n is the longest common prefix length of that subsequent index.

After obtaining the suffix character string corresponding to each index in the target linked list, the terminal device compares the suffix character string corresponding to the target index with these suffix character strings in order. When the suffix character string of the target index is larger than that of an index in the target linked list and smaller than that of the successor index of that index, the terminal device takes the successor node of the node storing that index as the target node, moves the index originally stored in the target node back by one node, and stores the target index in the target node.
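The insertion can be sketched as follows. The function names are hypothetical and the preset length is assumed long enough that the rebuilt strings equal the full suffixes. As an assumption beyond the text, the sketch also re-encodes the index moved back by one node, because its preceding index is now the target index.

```python
def lcp_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def insert_entry(entries, suffix, preset_len):
    """Insert suffix into a front-coded list of (lcp, non_common) pairs."""
    decoded, prev = [], ""
    for lcp, nc in entries:                  # rebuild each stored suffix
        prev = prev[:lcp] + nc
        decoded.append(prev)
    pos = 0                                  # target node: first larger suffix
    while pos < len(decoded) and decoded[pos] < suffix:
        pos += 1
    before = decoded[pos - 1] if pos > 0 else ""
    lcp = lcp_len(before, suffix)
    entries.insert(pos, (lcp, suffix[lcp:lcp + preset_len]))
    if pos + 1 < len(entries):               # moved-back index: new predecessor
        succ = decoded[pos]
        succ_lcp = lcp_len(suffix, succ)
        entries[pos + 1] = (succ_lcp, succ[succ_lcp:succ_lcp + preset_len])
    return entries
```

Inserting sissippi into the encoding of sippi, ssippi and ssissippi (preset length 16) places it in the second node and re-codes ssippi with a longest common prefix length of 1 relative to sissippi.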

It should be noted that, in the embodiments of the present application, the successor index of an index is the index adjacent to it and arranged after it in dictionary order; each index has at most one successor index.

As can be seen from the above, according to the data storage method provided by this embodiment, when there is not enough storage space in the target linked list to store the target index acquired from the disk, the storage space of other linked lists in the cache is reduced, and the storage space of the target linked list is expanded, so that all the target indexes can be stored in the target linked list in the cache, and the target index can be directly searched from the cache in the subsequent retrieval, thereby improving the data query efficiency and shortening the data query time.

Referring to fig. 5, fig. 5 is a block diagram of a terminal device according to an embodiment of the present application. The terminal device in this embodiment may be a server, a computer, a smart phone, a tablet computer, or another device capable of executing data storage operations. The terminal device includes units for executing the steps in the embodiments corresponding to fig. 1 to 4; please refer to fig. 1 to 4 and the related descriptions of the corresponding embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 5, the terminal apparatus 500 includes: a first search unit 51, a first detection unit 52, a second search unit 53, a data addition unit 54, and a data output unit 55. Wherein:

the first searching unit 51 is configured to obtain a search keyword and search the cache for the target linked list corresponding to the first character of the search keyword; the target linked list stores the information of the indexes whose suffix character strings begin with the first character of the search keyword.

The first detecting unit 52 is configured to detect whether a target index matching the search key exists in the target linked list.

The second searching unit 53 is configured to search and acquire a target index matching the search keyword from a disk if the target index matching the search keyword does not exist in the target linked list.

The data adding unit 54 is configured to add the target index to the target linked list in the cache, and output a suffix character string corresponding to the target index.

The data output unit 55 is configured to, if a target index matching the search keyword exists in the target linked list, obtain and output a suffix character string corresponding to the target index.

As an embodiment of the present application, the information of an index includes a longest common prefix length, a source position identifier, a non-common character string, a number of retrievals, and the address of the successor index. The longest common prefix length is the length of the longest common prefix between the suffix character string corresponding to the index and the preceding suffix character string; the source position identifier is the position of the suffix character string corresponding to the index in the source file; the non-common character string is the part of the suffix character string corresponding to the index that follows its longest common prefix with the preceding suffix character string, truncated to a preset length; the number of retrievals is the number of times the index has been retrieved; and the address of the successor index is the address, in the target linked list, of the index that follows this index.
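The index information listed above can be modelled as a singly linked node, with one header per first character in the cache array. This is a hypothetical Python layout: the class, field and function names are illustrative, the header's head field points directly at the first index rather than at a separate head node, and the stored data amount is counted in nodes purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexNode:
    lcp: int                  # longest common prefix length with the preceding suffix
    source_pos: int           # position of the suffix in the source file
    non_common: str           # part after the common prefix, cut to a preset length
    hits: int = 0             # number of times the index has been retrieved
    successor: Optional["IndexNode"] = None   # address of the successor index


@dataclass
class ListHeader:
    """One element of the cache array (one per first character, assumed)."""
    head: Optional[IndexNode] = None   # first index of the linked list
    capacity: int = 0                  # storable data amount
    used: int = 0                      # stored data amount (counted in nodes here)


def build_list(text, first_char, preset_len):
    """Front-code the suffixes of text that start with first_char into a list."""
    positions = sorted((i for i in range(len(text)) if text[i] == first_char),
                       key=lambda i: text[i:])
    header = ListHeader()
    prev, tail = "", None
    for pos in positions:
        suffix = text[pos:]
        lcp = 0
        while lcp < min(len(prev), len(suffix)) and prev[lcp] == suffix[lcp]:
            lcp += 1
        node = IndexNode(lcp, pos, suffix[lcp:lcp + preset_len])
        if tail is None:
            header.head = node
        else:
            tail.successor = node
        tail, prev = node, suffix
        header.used += 1
    return header
```

For build_list("mississippi", "s", 3), walking the nodes yields (lcp, non_common, source_pos) values (0, "sip", 6), (2, "ssi", 3), (1, "sip", 5) and (3, "ssi", 2).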

The first detection unit 52 specifically includes an index detection unit, a first comparison unit, a first determination unit, a first acquisition unit, and a second determination unit. Wherein:

the index detection unit is used for detecting whether a first index in the target linked list is empty or not.

The first comparing unit is configured to, if a first index in the target linked list is not empty, obtain a non-common character string of the first index, combine a predefined character string variable with an initial length of 0 with the non-common character string of the first index to obtain a suffix character string corresponding to the first index, and compare the suffix character string corresponding to the first index with the search keyword.

The first determining unit is used for determining the first index as a target index matched with the search keyword if the suffix character string corresponding to the first index is equal to the search keyword.

The first obtaining unit is configured to obtain a longest common prefix length of each subsequent index of the first index.

The second determining unit is configured to determine the subsequent index of the first index as the target index matching the search keyword if the longest common prefix length of the subsequent index of the first index is greater than or equal to the length of the search keyword.

As an embodiment of the present application, the first detecting unit 52 further includes: a character string combination unit and a third determination unit. Wherein:

the character string combination unit is configured to, if the suffix character string corresponding to the first index is smaller than the search keyword, acquire the longest common prefix length and the non-common character string of each subsequent index of the first index, extract the first n characters from the suffix character string corresponding to the preceding index of each subsequent index to obtain the characters to be combined of that subsequent index, and combine them with the non-common character string of that subsequent index to obtain the suffix character string corresponding to that subsequent index, where n is the longest common prefix length of that subsequent index.

The third determining unit is configured to sequentially compare the suffix character strings corresponding to the subsequent indexes of the first index with the search keyword, and to determine, based on the comparison results, whether each subsequent index of the first index is a target index matching the search keyword.

As an embodiment of the present application, the second searching unit 53 specifically includes an index searching unit, a suffix character string searching unit, and a target index determining unit. Wherein:

the index searching unit is configured to, if no target index matching the search keyword exists in the target linked list, determine an index to be searched from the index file stored on the disk by binary search and acquire the source position identifier of the index to be searched.

The suffix character string searching unit is configured to acquire, from the source file stored on the disk, the suffix character string corresponding to the index to be searched, based on the source position identifier of the index to be searched.

The target index determining unit is configured to determine the index to be searched as a target index matching the search keyword if the suffix character string corresponding to the index to be searched contains the search keyword.
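The binary search performed by these units can be sketched over in-memory stand-ins for the disk: a suffix array (source positions sorted by the suffix that starts at each position) plays the role of the index file, and a string plays the role of the source file. Both stand-ins and the function name disk_search are assumptions for illustration.

```python
def disk_search(index_file, source, keyword):
    """Binary-search a sorted suffix index for suffixes that begin with keyword.

    index_file: source positions sorted by the suffix starting there.
    source: the source text, read through those positions.
    Returns the source positions of all matching suffixes, in index order.
    """
    k = len(keyword)
    lo, hi = 0, len(index_file)
    while lo < hi:                    # leftmost suffix whose first k chars >= keyword
        mid = (lo + hi) // 2
        if source[index_file[mid]:index_file[mid] + k] < keyword:
            lo = mid + 1
        else:
            hi = mid
    hits = []                         # matches are contiguous from lo onward
    while lo < len(index_file) and source.startswith(keyword, index_file[lo]):
        hits.append(index_file[lo])
        lo += 1
    return hits
```

Truncating sorted suffixes to their first k characters keeps them in non-decreasing order, which is what makes the binary search on prefixes valid.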

As an embodiment of the present application, the data adding unit 54 specifically includes a residual storage capacity determining unit, an index adding unit, and a storage space adjusting unit. Wherein:

the residual storage capacity determining unit is used for determining the residual storage capacity of the target linked list.

The index adding unit is configured to add the target index to the target linked list in the cache if a total data size of the target index is less than or equal to a remaining storage size of the target linked list.

The storage space adjusting unit is configured to, if the total data amount of the target index is greater than the residual storage capacity of the target linked list, reduce the storage space of the linked lists in the cache other than the target linked list and expand the storage space of the target linked list until its residual storage capacity is greater than or equal to the total data amount of the target index, and add the target index to the expanded target linked list.

As another embodiment of the present application, the data adding unit 54 further includes: a fourth determining unit and a reduction and expansion unit. Wherein:

the fourth determining unit is configured to determine the to-be-reduced linked list with the stored data amount smaller than half of the storable data amount if the total data amount of the target index is greater than the remaining storage amount of the target linked list.

The reduction and expansion unit is configured to halve the storable data amount of each to-be-reduced linked list and expand the storable data amount of the target linked list by the amount released, until the residual storage capacity of the target linked list is greater than or equal to the total data amount of the target index, and to add the target index to the expanded target linked list.

As still another embodiment of the present application, the data adding unit 54 further includes: a fifth determining unit and an index adding unit. Wherein:

the fifth determining unit is configured to determine, based on the target index and on the longest common prefix length and non-common character string of each index in the target linked list, the target node in which the target index is to be stored.

The index adding unit is used for storing the target index into the target node in the target linked list.

As can be seen from the above, after obtaining the search keyword, the terminal device provided in this embodiment of the present application searches the cache for an index matching the search keyword. When a target index matching the search keyword exists in the cache, the suffix character string corresponding to the target index is obtained directly from the cache and output; in this case no data needs to be read from the disk, which shortens the data query time and improves the data query efficiency. When no target index matching the search keyword exists in the cache, the target index is obtained from the disk and added to the cache, so that subsequent queries for it can be served directly from the cache.

Fig. 6 is a schematic diagram of a terminal device according to another embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62 (for example, a program that responds to task requests) stored in the memory 61 and executable on the processor 60. When executing the computer program 62, the processor 60 implements the steps in the data storage method embodiments described above, such as S1 to S5 shown in fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the units in the device embodiments described above, such as the functions of units 51 to 55 shown in fig. 5.

Illustratively, the computer program 62 may be divided into one or more units, which are stored in the memory 61 and executed by the processor 60 to accomplish the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 62 in the terminal device 6. For example, the computer program 62 may be divided into a first search unit, a first detection unit, a second search unit, a data addition unit, and a data output unit, each of which functions as described above.

The terminal device 6 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 6 is merely an example of the terminal device 6 and does not limit it; the terminal device 6 may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, buses, and the like.

The processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or the main memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 stores the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A method of storing data, comprising:

2. The data storage method according to claim 1, wherein the information of each index comprises a longest common prefix length, a source position identifier, a non-common character string, a retrieval count, and an address of the succeeding index; the longest common prefix length describes the length of the longest common prefix between the suffix string corresponding to the index and its preceding suffix string; the source position identifier describes the position, in the source file, of the suffix string corresponding to the index; the non-common character string describes a string of preset length taken from the part of the suffix string corresponding to the index that is not shared with its preceding suffix string; the retrieval count describes the number of times the index has been retrieved; and the address of the succeeding index describes the address of the index that follows this index in the target linked list;

detecting whether a first index in the target linked list is empty;
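To make the index fields listed in claim 2 concrete, the sketch below builds one target linked list from the sorted suffixes of a source string that begin with a given first character. It assumes that the first index's longest common prefix length is 0 (it is compared with the empty string), uses a Python object reference in place of the "address of the succeeding index", and lets the caller choose `preset_len`; `IndexEntry` and `build_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    lcp: int              # longest common prefix length with the preceding suffix
    source_pos: int       # source position identifier: start of the suffix in the text
    non_common: str       # preset-length string that follows the common prefix
    retrieval_count: int = 0                 # number of times the index was retrieved
    next: Optional["IndexEntry"] = None      # stands in for the succeeding index's address


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build_list(text: str, first_char: str, preset_len: int) -> Optional[IndexEntry]:
    """Build the sorted linked list of suffixes of `text` that start with `first_char`."""
    positions = sorted(
        (i for i in range(len(text)) if text[i] == first_char),
        key=lambda i: text[i:],
    )
    head: Optional[IndexEntry] = None
    tail: Optional[IndexEntry] = None
    prev_suffix = ""  # assumption: the first index is compared with the empty string
    for pos in positions:
        suffix = text[pos:]
        common = lcp_len(prev_suffix, suffix)
        entry = IndexEntry(common, pos, suffix[common:common + preset_len])
        if tail is None:
            head = entry
        else:
            tail.next = entry
        tail = entry
        prev_suffix = suffix
    return head
```

For `build_list("banana", "a", 2)` the list holds `a`, `ana`, `anana` as the entries `(lcp=0, pos=5, "a")`, `(1, 3, "na")`, `(3, 1, "na")`. Storing only an LCP length and a short non-common string per index, rather than each full suffix, keeps every entry small, while the full suffix remains recoverable from the source file through the source position identifier.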

3. The data storage method according to claim 2, wherein after comparing the suffix string corresponding to the first index with the search keyword, the method further comprises:

4. The data storage method of claim 1, wherein, if no target index matching the search keyword exists in the target linked list, searching the disk for the target index matching the search keyword comprises:

5. The data storage method of claim 1, wherein the adding the target index to the target linked list in the cache comprises:

determining the residual storage capacity of the target linked list;

6. The data storage method according to claim 5, wherein said reducing the storage space of the linked lists in the cache except the target linked list and expanding the storage space of the target linked list comprises:

7. The data storage method of any of claims 2-6, wherein the adding the target index to the target linked list in the cache comprises:

and storing the target index into the target node in the target linked list.

8. A terminal device, comprising:

9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.