CN111353012A - Spatial text data caching method and device, electronic equipment and storage medium - Google Patents

Spatial text data caching method and device, electronic equipment and storage medium

Info

Publication number
CN111353012A
Authority
CN
China
Prior art keywords
spatial
space
text
text data
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010158290.8A
Other languages
Chinese (zh)
Other versions
CN111353012B (en)
Inventor
李宗祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202010158290.8A
Publication of CN111353012A
Application granted
Publication of CN111353012B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/322Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a spatial text data caching method and device, an electronic device, and a storage medium. A new index structure and storage scheme for spatial text data is designed, in which the spatial index and the text index are associated through the codes of spatial regions. This provides efficient joint spatial and text pruning and improves query efficiency. Because spatial regions are encoded and both indexes are expressed in terms of these spatial codes, Redis storage space is saved and Redis storage efficiency is maintained. In addition, the spatial region is divided into sub-regions, and by combining this division with the region codes, data of adjacent spatial regions can be stored on different Redis nodes, which achieves anti-aggregation storage of spatial data.

Description

Spatial text data caching method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a spatial text data caching method and device, electronic equipment and a storage medium.
Background
With the development of the mobile internet, internet companies have generated large amounts of text data carrying spatial position information, i.e. spatial text data, such as microblog posts and WeChat messages with location information. To retrieve such information, a user usually issues a spatial text query, that is, a query whose conditions contain both a spatial position coordinate and text keyword information. For example, a user may search for bookstores within 1 km of the user's current location: the user's location is the spatial position information, and "bookstore" is the text query condition.
How to access spatial text data quickly and effectively has become an urgent problem for internet enterprises. Building a cache for spatial text data can effectively improve data access efficiency, and among the many available solutions Redis is a popular choice for this purpose. Redis is an in-memory database based on a Key-Value storage structure; all data is kept in memory, which makes data access efficient. At present, two approaches are mainly used when caching spatial text data in Redis:
Method one: storage based on a spatial grid.
This method divides the spatial region into a number of grid cells. The region coordinates of each cell serve as the Key, and the Value stores the detailed information of all spatial text data located in that cell. The system finds the cell corresponding to the queried spatial position, retrieves all spatial text data stored in that cell through the Key, and then filters this data with the query keywords to obtain the final result.
Method two: storage by text keyword.
This method uses the GeoHash method provided by Redis to map a spatial coordinate to an integer and stores the detailed information in a Redis Zset. The keyword of the spatial text data serves as the Key, which supports text filtering for user queries, and the score of each Zset member stores the integer mapped from the spatial coordinate, which supports spatial filtering. With this method the system builds one Key-Value structure per keyword, where the Key is the keyword and the Value holds the detailed information of the spatial text data containing that keyword. To answer a spatial text query from the cache, the system first finds the Redis Key-Value pairs that satisfy the text filter by Key, then uses the scores within each pair to find the entries that satisfy the spatial condition, and finally reads the specific data stored in those entries.
However, in practice both methods run into the following problems when faced with very large volumes of spatial text data:
First, data skew in Redis. Both methods suffer from unbalanced storage. With method one, a particular spatial region may contain a large amount of data, so the Key-Value pair storing that region becomes very large. With method two, a keyword may appear in a large amount of spatial text data, so the Value stored under that keyword holds a large amount of detailed information. Moreover, Redis serves data with a single-threaded model, so the time spent reading such a large Key-Value pair blocks other data access tasks and reduces system throughput. If the accessed spatial region or keyword becomes a hot spot, a large number of requests concentrate on a few nodes, access becomes skewed, and data access efficiency drops.
Second, the trade-off between storage efficiency and access efficiency in Redis. Redis keeps all data in memory, so memory is a scarce resource. Method one uses spatial coordinates as Keys and stores detailed data in the Value; this is relatively storage-efficient, but queries are slow because all spatial text data in the Value must be filtered. The storage structure of method two supports both spatial filtering and text filtering, so queries are efficient; however, when a piece of spatial text data contains several keywords, several copies of it are kept in memory, which wastes Redis storage space and lowers storage efficiency.
Disclosure of Invention
In view of the above problems with existing methods, embodiments of the present invention provide a spatial text data caching method and device, an electronic device, and a storage medium that address at least one of these technical problems.
Specifically, embodiments of the invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a spatial text data caching method, including:
extracting a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
querying a spatial index according to the first spatial position coordinate, and determining a first spatial code corresponding to the first spatial position coordinate, wherein the spatial index stores the correspondence between each spatial position coordinate and its spatial code, and a spatial code is a code obtained by compressing spatial position coordinates according to a preset coding rule;
querying a text index according to the first text keyword, and determining a first spatial code set corresponding to the first text keyword, wherein the text index stores the correspondence between each text keyword and the spatial codes spatially associated with that keyword;
determining a first target spatial code according to the first spatial code and the first spatial code set;
determining the storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword; and
storing the first spatial text data in Redis at the storage location.
Further, querying the spatial index according to the first spatial position coordinate and determining the first spatial code corresponding to the first spatial position coordinate specifically includes:
querying the spatial index according to the first spatial position coordinate and, if a matching entry is found, determining the first spatial code corresponding to the first spatial position coordinate;
if no matching entry is found, creating the first spatial code corresponding to the first spatial position coordinate based on a quadtree rule and a Zorder curve.
Further, determining the first target spatial code according to the first spatial code and the first spatial code set specifically includes:
if the first spatial code and the first spatial code set intersect, judging whether the access heat value of the first spatial code is greater than a preset threshold; if so, performing refined coding on the spatial position coordinate according to the preset coding rule to obtain an optimized spatial code, taking the optimized spatial code as the first target spatial code, and establishing the correspondence between the first text keyword and the optimized spatial code in the text index; if the access heat value of the first spatial code is less than or equal to the preset threshold, taking the first spatial code as the first target spatial code;
if the first spatial code and the first spatial code set do not intersect, establishing the correspondence between the first text keyword and the first spatial code in the text index, and taking the first spatial code as the first target spatial code.
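The branching just described can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the text index and heat table are plain in-memory dictionaries, the function name `select_target_code`, the `child_quadrant` parameter, and the default threshold are hypothetical, and "refined coding" is assumed to mean descending one quadtree level by appending two bits to the code.

```python
def select_target_code(code, keyword, text_index, heat, child_quadrant, threshold=100):
    """Pick the target spatial code for a record that is being cached.

    code           -- spatial code found (or created) for the record's coordinate
    text_index     -- dict: keyword -> set of spatial codes (in-memory stand-in)
    heat           -- dict: spatial code -> access heat value
    child_quadrant -- 0..3, quadrant of `code` containing the coordinate (assumption)
    threshold      -- hypothetical preset heat threshold
    """
    codes = text_index.setdefault(keyword, set())
    if code in codes:                      # the code intersects the keyword's code set
        if heat.get(code, 0) > threshold:  # hot region: refine the code
            # Assumed refinement rule: go one quadtree level deeper (two more bits).
            refined = (code << 2) | child_quadrant
            codes.add(refined)             # associate keyword with the refined code
            return refined
        return code                        # not hot: keep the existing code
    codes.add(code)                        # no intersection: register the association
    return code
```

Under these assumptions, a keyword already linked to code 8 keeps code 8 while the region is cool, and switches to a finer code once the heat value exceeds the threshold.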
Further, determining the storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword specifically includes:
determining the key corresponding to the first spatial text data in Redis according to the first target spatial code and the first text keyword;
determining a base slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots;
determining an offset of the first spatial text data according to the set of all keywords corresponding to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots;
determining the storage slot number of the first spatial text data in Redis according to the base slot number and the offset;
and determining the storage location of the first spatial text data in Redis according to the key and the storage slot number.
Further, determining the key corresponding to the first spatial text data in Redis according to the first target spatial code and the first text keyword specifically includes:
determining the key according to a first relation model, wherein the first relation model is key = (Zorder) + crc64(keys);
where Zorder denotes the first target spatial code, keys denotes the set of all keywords corresponding to the first target spatial code, and crc64(keys) denotes converting all keywords corresponding to the first target spatial code into an integer.
Further, determining the base slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots specifically includes:
determining the base slot number according to a second relation model, wherein the second relation model is L_area = [(Zorder) * (16384 / num)] % 16384;
where L_area denotes the base slot number, Zorder denotes the first target spatial code, num denotes the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % denotes the modulo operation.
Further, determining the offset of the first spatial text data according to the set of all keywords corresponding to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots specifically includes:
determining the offset according to a third relation model, wherein the third relation model is L_offset = [crc64(keys) * (16384 / num)] % 16384;
where L_offset denotes the offset, crc64(keys) denotes converting all keywords corresponding to the first target spatial code into an integer, num denotes the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % denotes the modulo operation.
Further, determining the storage slot number of the first spatial text data in Redis according to the base slot number and the offset specifically includes:
determining the storage slot number according to a fourth relation model, wherein the fourth relation model is Loc = L_area + L_offset;
where Loc denotes the storage slot number, L_area denotes the base slot number, and L_offset denotes the offset.
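The four relation models can be tried out with the following Python sketch. Several points are assumptions rather than statements of the source: the text does not name a CRC-64 variant, so a bitwise CRC-64 with the ECMA-182 polynomial stands in for crc64; the keyword set is serialized by sorting and joining with commas; 16384 / num is read as integer division; and the "+" in the key model is read as string concatenation. All function names are hypothetical.

```python
TOTAL_SLOTS = 16384                   # total number of Redis slots, as stated in the text
_CRC64_POLY = 0x42F0E1EBA9EA3693      # ECMA-182 polynomial (assumed CRC-64 variant)
_MASK64 = (1 << 64) - 1


def crc64(data: bytes) -> int:
    """Bitwise, MSB-first CRC-64 with zero initial value (illustrative stand-in)."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ _CRC64_POLY) & _MASK64
            else:
                crc = (crc << 1) & _MASK64
    return crc


def keywords_hash(keywords) -> int:
    """crc64(keys): turn a keyword set into one integer (serialization is assumed)."""
    return crc64(",".join(sorted(keywords)).encode("utf-8"))


def redis_key(zorder: int, keywords) -> str:
    """First model, key = (Zorder) + crc64(keys), with '+' read as concatenation."""
    return f"{zorder}:{keywords_hash(keywords)}"


def base_slot(zorder: int, num_nodes: int) -> int:
    """Second model: L_area = [Zorder * (16384 / num)] % 16384."""
    return (zorder * (TOTAL_SLOTS // num_nodes)) % TOTAL_SLOTS


def slot_offset(keywords, num_nodes: int) -> int:
    """Third model: L_offset = [crc64(keys) * (16384 / num)] % 16384."""
    return (keywords_hash(keywords) * (TOTAL_SLOTS // num_nodes)) % TOTAL_SLOTS


def storage_slot(zorder: int, keywords, num_nodes: int) -> int:
    """Fourth model: Loc = L_area + L_offset, returned exactly as written."""
    return base_slot(zorder, num_nodes) + slot_offset(keywords, num_nodes)
```

With three nodes, adjacent codes 8 and 9 get base slots 10920 and 16381, about one third of the slot space apart, which is the spreading effect the text aims for. Note that the raw sum Loc can exceed 16383; the source does not say how that case is handled, so the sketch leaves the value unwrapped.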
Further, the spatial text data caching method further includes:
acquiring second spatial text data to be queried;
extracting a second spatial position coordinate and a second text keyword from the second spatial text data;
querying the spatial index according to the second spatial position coordinate, and determining a second spatial code corresponding to the second spatial position coordinate;
querying the text index according to the second text keyword, and determining a second spatial code set corresponding to the second text keyword;
determining a second target spatial code according to the intersection of the second spatial code and the second spatial code set;
determining the storage location of the second spatial text data in Redis according to the second target spatial code;
and querying the second spatial text data in Redis according to the storage location, and returning the query result.
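The query steps can be illustrated with a small in-memory sketch. Everything here is a stand-in chosen for illustration: the spatial index is a dictionary from a grid cell to its code, the text index maps a keyword to a set of codes, and `store` replaces Redis with a dictionary keyed by (code, keyword). The function name `query_cache` is hypothetical.

```python
def query_cache(cell, keyword, spatial_index, text_index, store):
    """Return cached records for a (location, keyword) query, or [] if pruned.

    cell          -- grid cell of the query coordinate (hypothetical representation)
    spatial_index -- dict: cell -> spatial code
    text_index    -- dict: keyword -> set of spatial codes
    store         -- dict: (spatial code, keyword) -> list of records (Redis stand-in)
    """
    code = spatial_index.get(cell)
    if code is None:
        return []                      # spatial pruning: region is not indexed
    if code not in text_index.get(keyword, set()):
        return []                      # text pruning: keyword absent from this region
    # The intersection is non-empty, so `code` is the target spatial code.
    return store.get((code, keyword), [])
```

For example, with `text_index = {"bookstore": {4, 7, 10, 11}}`, a query for "bookstore" in a cell coded 4 reaches the store, while a query for a keyword not linked to code 4 is rejected before any data is read.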
Further, the spatial text data caching method further includes:
acquiring the access heat value of each spatial code in the spatial index, comparing it with a preset first data access heat threshold, and deleting the corresponding spatial text data if the access heat value is smaller than the first threshold;
or, alternatively,
acquiring all spatial codes corresponding to each keyword in the text index, summing the access heat values of those spatial codes, comparing the sum with a preset second data access heat threshold, and deleting the corresponding spatial text data if the sum is smaller than the second threshold.
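Both replacement rules reduce to a threshold comparison, as the following sketch shows. The dictionaries are in-memory stand-ins for the two indexes, and the function names and threshold values are hypothetical; the sketch only selects what would be deleted and leaves the actual deletion to the caller.

```python
def cold_codes(heat, first_threshold):
    """Spatial codes whose own access heat is below the first threshold."""
    return {code for code, value in heat.items() if value < first_threshold}


def cold_keywords(text_index, heat, second_threshold):
    """Keywords whose summed heat over all associated codes is below the second threshold."""
    return {
        keyword
        for keyword, codes in text_index.items()
        if sum(heat.get(code, 0) for code in codes) < second_threshold
    }
```

The first rule evicts by region, the second by keyword: a keyword survives if its regions are collectively popular, even when some of them are individually cold.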
In a second aspect, an embodiment of the present invention further provides a spatial text data caching device, including:
an extraction module, configured to extract a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
a first determining module, configured to query a spatial index according to the first spatial position coordinate and determine a first spatial code corresponding to the first spatial position coordinate, wherein the spatial index stores the correspondence between each spatial position coordinate and its spatial code, and a spatial code is a code obtained by compressing spatial position coordinates according to a preset coding rule;
a second determining module, configured to query a text index according to the first text keyword and determine a first spatial code set corresponding to the first text keyword, wherein the text index stores the correspondence between each text keyword and the spatial codes spatially associated with that keyword;
a third determining module, configured to determine a first target spatial code according to the first spatial code and the first spatial code set;
a fourth determining module, configured to determine the storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword;
and a storage module, configured to store the first spatial text data in Redis at the storage location.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the spatial text data caching method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the spatial text data caching processing method according to the first aspect.
As the foregoing technical solutions show, the spatial text data caching method and device, electronic device, and storage medium provided by the embodiments design a new index structure and storage scheme for spatial text data and associate the spatial index with the text index through the codes of spatial regions, which provides efficient spatial and text pruning and improves query efficiency. In addition, the embodiments divide the spatial region into sub-regions and combine this division with the region codes so that data of adjacent spatial regions can be stored on different Redis nodes, achieving anti-aggregation storage of spatial data.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a spatial text data caching processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall design structure of a spatial text data caching method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of spatial coding based on quadtree partitioning according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating a spatial index storage structure according to an embodiment of the present invention;
Fig. 5 is a diagram illustrating a storage structure of a text index according to an embodiment of the present invention;
Fig. 6 is a diagram of the codes converted from the spatial region partition diagram according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the processing procedure of a spatial text data cache construction method according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the processing procedure of a spatial text data query method according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the processing procedure of a spatial text data cache replacement method according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a spatial text data caching device according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Fig. 1 is a flowchart illustrating a spatial text data caching processing method according to an embodiment of the present invention, and as shown in fig. 1, the spatial text data caching processing method according to the embodiment of the present invention specifically includes the following steps:
Step 101: extracting a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
Step 102: querying a spatial index according to the first spatial position coordinate, and determining a first spatial code corresponding to the first spatial position coordinate, wherein the spatial index stores the correspondence between each spatial position coordinate and its spatial code, and a spatial code is a code obtained by compressing spatial position coordinates according to a preset coding rule;
Step 103: querying a text index according to the first text keyword, and determining a first spatial code set corresponding to the first text keyword, wherein the text index stores the correspondence between each text keyword and the spatial codes spatially associated with that keyword;
Step 104: determining a first target spatial code according to the first spatial code and the first spatial code set;
Step 105: determining the storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword;
Step 106: storing the first spatial text data in Redis at the storage location.
Before the embodiments are described in detail, the underlying concept is briefly introduced. The Redis-based cache structure designed in the embodiments consists of two parts.
The first part is the index of the spatial text data, which combines a spatial index with a text index. First, the spatial attributes of the spatial text data are indexed, each node of the index is encoded, and a new index storage scheme is built on these codes. Second, an index structure similar to an inverted index is built for the text attributes of the data. Unlike a conventional inverted index, each keyword in this text index maps to leaf-node codes of the spatial index; combined with the spatial index, these codes allow space and text to be pruned at the same time, which improves data access efficiency.
The second part is the actual storage of the spatial text data. The data is stored in the Key-Value format of Redis and distributed according to the data distribution method designed in this embodiment, so that data containing the same keyword in adjacent spatial regions is stored on different machines. This achieves anti-aggregation of data storage and avoids data skew and access hot spots.
The overall design structure is shown in Fig. 2. Key-Value pairs in Redis fall into two types: one type stores the data indexes, including the spatial index and the keyword indexes (text indexes); the other type stores the actual spatial text data. Because the spatial index designed in the embodiments is small, a copy of it may be stored on each Redis node to improve query efficiency. In Fig. 2, "key 1-index" denotes the inverted index built for key 1, whose index entries are leaf-node codes of the spatial index. Each leaf node of the spatial index represents a spatial region, and the spatial index and the text index are associated through the leaf-node codes. When a leaf node of the spatial index is reached during a query, the keyword index can be used to judge whether that sub-region contains the queried keyword, so spatial filtering and text filtering are performed at the same time.
In the embodiments, the spatial index for spatial text data is built according to the quadtree rule, i.e. the spatial region is partitioned by a quadtree, and the index is stored in a single Key-Value pair. The index structure comprises leaf nodes and intermediate nodes. An intermediate (non-leaf) node stores only one integer, which is the code of the node's spatial region obtained from a Zorder curve over the quadtree partition, as shown in Fig. 3.
When the spatial region is divided into four blocks, the corresponding codes are 0, 1, 2, 3; when it is further divided into 16 blocks, the corresponding codes are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. Each spatial region produced by the quadtree partition has its own code. Storing this code in the intermediate node both identifies the spatial region and keeps the node small. A leaf node stores, in addition to the region code, a value recording access heat, which is incremented by 1 each time the leaf node is accessed. The resulting spatial index storage structure is shown in Fig. 4.
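A Z-order (Morton) code for a quadtree cell can be computed by interleaving the bits of the cell's column and row, as in the sketch below. The function name `zorder_code` is hypothetical, and the bit order (column in the even bits, row in the odd bits) is an assumption, since the exact numbering of Fig. 3 is not reproduced in the text; a different orientation only permutes the codes.

```python
def zorder_code(col: int, row: int, depth: int) -> int:
    """Interleave the low `depth` bits of col and row into a Z-order code.

    depth is the number of quadtree subdivisions: depth 1 gives 4 cells
    (codes 0..3), depth 2 gives 16 cells (codes 0..15).
    """
    code = 0
    for bit in range(depth):
        code |= ((col >> bit) & 1) << (2 * bit)        # column bit -> even position
        code |= ((row >> bit) & 1) << (2 * bit + 1)    # row bit -> odd position
    return code
```

Dropping the two lowest bits of a code yields the code of the enclosing cell one level up, which is the prefix property the index relies on.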
The head node (which has the same structure as an intermediate node), the intermediate nodes, and the leaf nodes of the index are stored in order in the Value for querying. As shown in Fig. 4, the head node and each intermediate node store a single int value, while each leaf node stores two int values because of the access heat. Each int occupies only 4 bytes, so a Value of just 1 MB can hold information for about 2^18 nodes. In practice an index rarely has that many nodes, so the index designed in this embodiment stays small and does not occupy much Redis storage. Because the index also supports spatial pruning, data in irrelevant regions is filtered out during a query, which improves query efficiency; data access efficiency and Redis storage efficiency are thus both taken into account.
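The size estimate follows from 1 MB = 2^20 bytes and 4 bytes per int, giving 2^20 / 4 = 2^18 int values. The sketch below packs an index into a byte string in the order described and checks that arithmetic. The little-endian layout and the function names are assumptions made for illustration; the source only states that nodes are stored in sequence as int values.

```python
import struct


def pack_index(internal_codes, leaves):
    """Serialize head/intermediate node codes, then (code, heat) pairs for leaves.

    internal_codes -- list of int codes for the head node and intermediate nodes
    leaves         -- list of (code, access_heat) tuples
    Each int is written as a 4-byte little-endian value (assumed layout).
    """
    buf = struct.pack(f"<{len(internal_codes)}i", *internal_codes)
    for code, heat in leaves:
        buf += struct.pack("<2i", code, heat)
    return buf


def ints_per_value(size_bytes: int) -> int:
    """How many 4-byte ints fit into a Value of the given size."""
    return size_bytes // 4
```

Since a leaf node takes two ints, a Value holding only leaves would fit half as many nodes as one holding only intermediate nodes.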
The text index is designed like an inverted index: one index is built per indexed keyword, and its index entries are leaf-node codes of the spatial index. Each keyword index is also stored as a Key-Value pair, where the Key is the keyword and the Value holds the index entries, and the keyword indexes are distributed evenly over the cluster nodes by Redis's own distribution. As shown in Fig. 5, the Key is the indexed keyword and the Value stores leaf-node numbers of the spatial index; the values 4, 7, 10, 11 in Fig. 5 correspond to sub-region numbers in the preceding space partition figure, each number identifying one spatial sub-region. This associates keywords with spatial regions, makes text filtering convenient, and reduces index size, since storing node codes takes far less space than storing the actual data. In addition, when the index is built with a quadtree, only leaf nodes need to be stored, which further reduces the number of node codes and the storage footprint. Only leaf nodes are stored because, in a quadtree spatial index, a node's parent is easy to infer: there is exactly one path from the root to any leaf, and that path can be derived backwards from the leaf's code, so it is known whether each intermediate node between the root and the leaf contains the keyword.
Fig. 6 shows the aforementioned space partition converted into coded form. When the spatial index has only two layers, the space is divided into 4 sub-regions and two-bit codes are used; for example, code 2 is 10. When the spatial index has three layers, the third layer is divided into 16 sub-regions and 4-bit codes are required; for example, code 8 is 1000. As the figure shows, the code of each third-layer sub-region has the code of the corresponding second-layer node as its prefix. For example, sub-region 8 in Fig. 6 is 1000, and its first two bits, 10, are exactly the code of its parent node in the layer above. Given the code of a leaf node and the depth of the index, the parent's code can therefore be computed easily, and by repeating this step the codes of all nodes from the leaf up to the root can be obtained. Since a leaf node containing a keyword implies that all of its ancestors contain that keyword, the keyword information of every node on the path from the root to the leaf is known, and spatial filtering and text filtering can be performed together during a query. In this way, the embodiment associates the spatial index with the text index through the codes of spatial regions, which provides efficient spatial and text pruning and improves query efficiency; and because both indexes are expressed in spatial codes, Redis storage space is saved and storage efficiency is maintained.
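The prefix property means a parent code is obtained by dropping the last two bits of a child code. The sketch below uses that to expand a keyword's leaf codes into every node on the paths toward the root. The function names are hypothetical; layers are counted as in the text (the root is layer 1, a three-layer index has 4-bit leaf codes), and nodes are reported as (layer, code) pairs because the same number can occur at different layers.

```python
def parent_code(code: int) -> int:
    """Drop the last two bits: 0b1000 (8) -> 0b10 (2)."""
    return code >> 2


def nodes_with_keyword(leaf_codes, layers: int):
    """All (layer, code) nodes below the root that contain a keyword.

    leaf_codes -- leaf-node codes stored for the keyword in the text index
    layers     -- number of index layers, root included
    """
    nodes = set()
    for leaf in leaf_codes:
        code = leaf
        for layer in range(layers, 1, -1):   # walk from the leaf layer up to layer 2
            nodes.add((layer, code))
            code = parent_code(code)
    return nodes
```

For the leaf codes 4, 7, 10, 11 in a three-layer index, the second-layer nodes 1 and 2 are found to contain the keyword, so a query can skip second-layer nodes 0 and 3 without visiting their children.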
In this embodiment, when the spatial region is divided, R-trees may be used in addition to the quadtrees. In addition, when spatial coding is performed, in addition to the Zorder curve, a Hilbert curve may be used.
In this embodiment, it should be noted that the actual spatial text data is stored in Redis in Key-Value form, where the Key is calculated from the node number of the spatial index and the keywords contained in the node, and the Value stores the detailed data. The spatial text data distribution method designed in this embodiment is based on the above index division: a spatial region can be divided into different sub-regions through the spatial index, and, in combination with the coding of the spatial region, data of adjacent spatial regions can be stored on different Redis nodes, so that anti-aggregation storage of spatial data is achieved.
It can be known from the foregoing technical solutions that, in the spatial text data caching method provided in the embodiments of the present invention, a new index structure and storage mode oriented to spatial text data are designed, and a correlation between a spatial index and a text index is established by coding a spatial region, so that an efficient spatial and text pruning mode is provided, and query efficiency is improved. In addition, the embodiment of the invention divides the space region into different sub-regions, and combines with the coding of the space region, so that the data of the adjacent space region can be stored on different Redis nodes, thereby achieving the anti-aggregation storage of the space data.
Based on the content of the foregoing embodiment, in this embodiment, querying a spatial index according to the first spatial position coordinate, and determining a first spatial code corresponding to the first spatial position coordinate specifically includes:
inquiring a space index according to the first space position coordinate, and if the space index can be inquired, determining a first space code corresponding to the first space position coordinate;
and if the query is not successful, creating a first space code corresponding to the first space position coordinate based on a quadtree rule and a Zorder curve according to the first space position coordinate.
In this embodiment, when a spatial code corresponding to the first spatial position coordinate already exists in Redis, it can be obtained directly; if no such spatial code exists, a first spatial code corresponding to the first spatial position coordinate needs to be created based on a quadtree rule and a Zorder curve. Specifically, the space divided by the quadtree is encoded using the Zorder curve. For details, reference may be made to the description of the above embodiment, which is not repeated here.
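A minimal sketch of the lookup-or-create step is given below in Python. It assumes a fixed quadtree depth, a rectangular indexed region, and a particular quadrant numbering (the y bit is the higher bit, the x bit the lower bit); none of these details are specified by the embodiment, and the function names and the dictionary used as the spatial index are hypothetical.

```python
def zorder_code(x, y, level, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Encode a point by descending a quadtree and appending 2 bits per layer."""
    min_x, min_y, max_x, max_y = bounds
    if not (min_x <= x < max_x and min_y <= y < max_y):
        raise ValueError("point lies outside the indexed region")
    code = 0
    for _ in range(level):
        mid_x = (min_x + max_x) / 2
        mid_y = (min_y + max_y) / 2
        x_bit = 1 if x >= mid_x else 0
        y_bit = 1 if y >= mid_y else 0
        # Assumed quadrant numbering: 00, 01, 10, 11 = (y_bit, x_bit).
        code = (code << 2) | (y_bit << 1) | x_bit
        # Shrink the bounding box to the chosen quadrant.
        if x_bit:
            min_x = mid_x
        else:
            max_x = mid_x
        if y_bit:
            min_y = mid_y
        else:
            max_y = mid_y
    return code


def lookup_or_create(spatial_index, x, y, level):
    """Return the stored code for (x, y); create and store it if absent."""
    key = (x, y)
    if key not in spatial_index:
        spatial_index[key] = zorder_code(x, y, level)
    return spatial_index[key]
```

With these assumptions, the point (0.1, 0.6) at level 2 is encoded as 1000, whose two-bit prefix 10 is the code of its parent quadrant.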
Based on the content of the foregoing embodiment, in this embodiment, the determining a first target spatial encoding according to the first spatial encoding and the first spatial encoding set specifically includes:
if the first spatial code and the first spatial code set have an intersection, judging whether the access heat value of the first spatial code is larger than a preset threshold value; if so, performing fission coding on the first spatial code according to a preset coding rule to obtain an optimized spatial code, taking the optimized spatial code as the first target spatial code, and establishing a corresponding relation between the first text keyword and the optimized spatial code in the text index; if the access heat value of the first spatial code is less than or equal to the preset threshold value, taking the first spatial code as the first target spatial code;
if the first spatial code and the first spatial code set do not have an intersection, establishing a corresponding relation between the first text keyword and the first spatial code in the text index, and taking the first spatial code as the first target spatial code.
In this embodiment, in order to ensure the index partitioning effect, the data capacity that each leaf node can store is limited to a certain range (e.g. 64 KB); that is, the access heat of the spatial code corresponding to the leaf node is limited to a certain range. When this range is exceeded, the node is split, which increases the differentiation of data storage and facilitates network transmission. Therefore, in this embodiment, performing fission coding on the first spatial code according to a preset coding rule to obtain an optimized spatial code means: when the access heat value of the first spatial code is larger than the preset threshold value, splitting the node corresponding to the first spatial code based on the quadtree node splitting principle, and taking the spatial code corresponding to a new node generated by the split as the optimized spatial code.
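The decision logic above can be sketched in Python as follows. The text index and the heat table are modeled as plain dictionaries, and the callback `child_for_point`, which selects among the four new child codes, is a hypothetical hook: the embodiment does not state how the child is chosen, so this is an assumption made only to keep the sketch complete.

```python
def split_code(code):
    """Quadtree fission: the parent code followed by 00, 01, 10 or 11."""
    return [(code << 2) | quadrant for quadrant in range(4)]


def determine_target_code(code, keyword, text_index, heat, threshold,
                          child_for_point):
    """Return the target spatial code for a record about to be cached.

    text_index: dict mapping keyword -> set of spatial codes
    heat:       dict mapping spatial code -> access heat value
    """
    codes = text_index.setdefault(keyword, set())
    if code in codes:
        if heat.get(code, 0) > threshold:
            # Too hot: split the node and use one of the new child codes.
            optimized = child_for_point(split_code(code))
            codes.add(optimized)
            return optimized
        return code
    # No intersection: register the code under the keyword and use it as is.
    codes.add(code)
    return code
```

For example, with a threshold of 64 and a heat value of 100 for code 10 (binary), the node is split into 1000, 1001, 1010 and 1011, and the callback picks the child that covers the data point.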
Based on the content of the foregoing embodiment, in this embodiment, determining a storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword specifically includes:
determining a key value corresponding to the first spatial text data in Redis according to the first target spatial coding and the first text keyword;
determining a basic slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes and the total number of Redis slots;
determining the offset of the first spatial text data according to all keyword sets, the number of Redis cluster nodes and the total number of Redis slots corresponding to the first target spatial code;
determining a storage slot number corresponding to the first space text data in Redis according to a basic slot number of the first space text data and an offset of the first space text data;
and determining the corresponding storage position of the first space text data in Redis according to the key value and the storage slot number.
In this embodiment, determining a key value corresponding to the first spatial text data in Redis according to the first target spatial code and the first text keyword specifically includes:
determining a key value corresponding to the first spatial text data in Redis according to a first relation model, wherein the first relation model is Key = (Zorder) + crc64(keys);
wherein, Zorder represents the first target space code, keys represents all the keyword sets corresponding to the first target space code, and crc64(keys) represents converting all the keywords corresponding to the first target space code into an integer.
In this embodiment, determining a basic slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots specifically includes:
determining a basic slot number of the first spatial text data according to a second relation model, wherein the second relation model is Larea = [(Zorder) * (16384/num)] % 16384;
wherein Larea represents the basic slot number, Zorder represents the first target spatial code, num represents the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % represents the modulo operation.
In this embodiment, determining an offset of the first spatial text data according to all keyword sets, the number of Redis cluster nodes, and the total number of Redis slots corresponding to the first target spatial code specifically includes:
determining the offset of the first spatial text data according to a third relation model, wherein the third relation model is Loffset = [crc64(keys) * (16384/num)] % 16384;
wherein Loffset represents the offset, crc64(keys) represents converting all keywords corresponding to the first target spatial code into an integer, num represents the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % represents the modulo operation.
In this embodiment, determining, according to the base slot number of the first spatial text data and the offset of the first spatial text data, a corresponding storage slot number of the first spatial text data in Redis specifically includes:
determining the storage slot number corresponding to the first spatial text data in Redis according to a fourth relation model, wherein the fourth relation model is Loc = Larea + Loffset;
wherein Loc represents the storage slot number, Larea represents the basic slot number, and Loffset represents the offset.
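The four relation models can be put together in a short Python sketch. Several points are assumptions rather than statements of the embodiment: the "+" in the first relation model is read as integer addition, 16384/num is read as integer division, the keyword set is sorted and joined with "|" before hashing, and `crc64` is a plain bitwise CRC-64 with the ECMA-182 polynomial, which may differ from the CRC-64 variant intended by the embodiment. The sum Loc is returned exactly as given by the fourth relation model; the text does not say whether it is wrapped back into the range 0 to 16383.

```python
TOTAL_SLOTS = 16384
_POLY = 0x42F0E1EBA9EA3693  # CRC-64/ECMA-182 polynomial (assumed variant)
_MASK = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes) -> int:
    """Bitwise CRC-64, MSB first, zero initial value, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ _POLY) & _MASK
            else:
                crc = (crc << 1) & _MASK
    return crc


def keys_digest(keywords) -> int:
    # Assumption: keywords are sorted and joined so the digest does not
    # depend on the order in which they were inserted.
    return crc64("|".join(sorted(keywords)).encode("utf-8"))


def storage_location(zorder: int, keywords, num_nodes: int):
    """Return (key, l_area, l_offset, loc) per the four relation models."""
    slots_per_node = TOTAL_SLOTS // num_nodes  # 16384 / num
    digest = keys_digest(keywords)
    key = zorder + digest                                # first model
    l_area = (zorder * slots_per_node) % TOTAL_SLOTS     # second model
    l_offset = (digest * slots_per_node) % TOTAL_SLOTS   # third model
    loc = l_area + l_offset                              # fourth model
    return key, l_area, l_offset, loc
```

Because both Larea and Loffset are multiples of the number of slots per node, a change in either the spatial code or the keyword digest moves the data by whole nodes rather than within a node.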
As is apparent from the above description, in the present embodiment the actual spatial text data is stored in Redis in Key-Value form. The Key is calculated from the node number of the spatial index and the keywords contained in the node, and the Value stores the detailed data. The Key is calculated as follows:
Key=(Zorder)+crc64(keys)
where Zorder is the encoding of the index leaf node, and the crc64 algorithm converts all keys contained in this leaf node into an integer.
The spatial text data distribution method designed in this embodiment is based on the above-mentioned index division. The spatial region can be divided into different sub-regions through the quadtree index, and, in combination with the coding of the spatial region, data of adjacent spatial regions can be stored on different Redis nodes, so that anti-aggregation storage of spatial data is achieved. In addition, in this embodiment each keyword is encoded, and by combining this encoding with the spatial region encoding, data containing the same keyword in adjacent spatial regions can be dispersed across different Redis nodes. Therefore, whether a certain spatial region becomes a hot spot region or a certain keyword becomes a hot word, the hot spot queries are dispersed, and the hot spot problems caused by spatial query aggregation and by keyword query aggregation are solved at the same time.
In combination with the data distribution characteristics of Redis, the method for calculating the number of the storage slot of the spatial text data is designed as follows:
Larea=[(Zorder)*(16384/num)]%16384
First, the basic slot number Larea of a divided region of the index is obtained. As shown in the above formula, Zorder is the code of the index region, num is the number of Redis cluster nodes, and 16384 is the total number of Redis slots. The code of the spatial region yields a fixed value; this value is multiplied by the number of slots per node (16384/num), and the result is taken modulo the total number of slots to obtain the slot number corresponding to that region of the index, to which all spatial data of the index within the region are mapped. On this basis, an offset is calculated for each piece of spatial text data; because the offsets of different pieces of spatial text data differ, the pieces of spatial text data can be stored on different Redis nodes. The offset is calculated as follows:
Loffset=[crc64(keys)*(16384/num)]%16384
As shown in the above formula, keys represents the set of keywords contained in the leaf node. All contained keywords are combined, an integer is obtained from them using the crc64 algorithm, and this integer is then multiplied by the number of slots per node, which ensures that the offset moves in steps of whole Redis nodes: each increment of the offset places the data on another Redis node. Therefore, the slot number of the spatial text data is calculated as follows:
Loc=Larea+Loffset
As shown in the above equation, the slot number of the spatial text data is equal to the basic slot number Larea of the spatial region plus the offset Loffset. Since each region of the index corresponds to one code and each leaf node of the index corresponds to a number of different keywords, different Larea values are calculated for adjacent spatial regions of the index, and, since the spatial text contained in each leaf node differs, the offsets also differ. Finally, the Redis storage slot number is calculated from the different spatial offsets and the different text offsets, so that data that are spatially adjacent or contain the same text keywords are stored in an anti-aggregated manner. When a certain spatial region becomes a hot spot region, the data of adjacent regions are dispersed across different Redis nodes, so the data access requests are also dispersed across different Redis nodes, which eliminates the hot spot access problem of the spatial region. When a certain keyword becomes a hot word, the spatial text data containing the keyword are stored on different machines, and the data access requests are dispersed to different Redis nodes, which eliminates the hot word access problem.
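The anti-aggregation effect on adjacent regions can be illustrated with a small Python example. It assumes a cluster of 4 nodes in which node i owns the contiguous slot range starting at i * (16384/num); this slot-to-node layout and the function names are assumptions made for illustration and are not taken from the embodiment.

```python
TOTAL_SLOTS = 16384


def base_slot(zorder, num_nodes):
    """Larea = [Zorder * (16384 / num)] % 16384, with integer division."""
    return (zorder * (TOTAL_SLOTS // num_nodes)) % TOTAL_SLOTS


def node_for_slot(slot, num_nodes):
    """Assumed layout: each node owns one contiguous block of slots."""
    slots_per_node = TOTAL_SLOTS // num_nodes
    return min((slot % TOTAL_SLOTS) // slots_per_node, num_nodes - 1)


# Four sibling sub-regions (codes 1000, 1001, 1010, 1011) on a 4-node cluster.
nodes = [node_for_slot(base_slot(code, 4), 4) for code in (8, 9, 10, 11)]
```

Under these assumptions the four neighboring sub-regions receive the basic slot numbers 0, 4096, 8192 and 12288 and therefore land on four different nodes, so a query burst on this area is spread across the cluster.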
The cache construction method provided in this embodiment is described with reference to the cache construction flowchart shown in fig. 7. As shown in fig. 7, this embodiment designs a cache construction method for the above-mentioned index structure. When new spatial text data are to be placed into the cache, the spatial position coordinates and text keywords of the data are extracted, the corresponding index is found according to the keywords, the insertion position of the data in the index is found according to the spatial coordinate information, the Redis slot number in which the data are to be stored is calculated from the spatial code and the keyword information, and finally the data are stored into the slot of the corresponding Redis node. The main process comprises the following steps:
step 1, when new spatial text data are to be put into Redis, extracting spatial position coordinate information and text keyword information of the spatial text data for use in subsequent index insertion and storage slot number calculation;
step 2, obtaining a spatial index and a text index, wherein the spatial index can be directly read from Redis, and each Redis node has a copy of the spatial index; and the text index can be obtained by querying the keyword to Redis.
step 3, judging whether the index exists or not when the index is obtained, if not, turning to step 4, and if so, turning to step 5;
step 4, if the index corresponding to the keyword does not exist or the spatial index does not exist, the system establishes an index and a spatial index corresponding to the keyword, and then the step 5 is switched;
step 5, the system locates, in the spatial index, the leaf node corresponding to the spatial position coordinates of the data and obtains the code of the corresponding index sub-region; if the capacity of the leaf node exceeds 64 KB, the leaf node needs to be split; at the same time, the code of the leaf node is inserted into the text index;
step 6, the system calculates the number of a storage slot of the data according to the codes of the spatial sub-regions and the keyword information, and then calculates the Key value of the spatial text data;
step 7, storing the spatial text data into Redis according to the calculated slot number and the Key value.
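Steps 5 to 7 above can be sketched with an in-memory stand-in for the Redis cluster. This Python fragment is an illustration under several assumptions: the leaf code is taken as already determined, node splitting at 64 KB is omitted, `zlib.crc32` is used as a 32-bit stand-in for the crc64 named in the text, and records already stored under a leaf are moved when the keyword set of that leaf changes (the embodiment does not describe this case). The class `CacheSketch` and its attributes are hypothetical.

```python
import zlib

TOTAL_SLOTS = 16384


class CacheSketch:
    """Dictionary-based stand-in for the Redis cluster (hypothetical)."""

    def __init__(self, num_nodes):
        self.slots_per_node = TOTAL_SLOTS // num_nodes
        self.text_index = {}     # keyword -> set of leaf codes
        self.leaf_keywords = {}  # leaf code -> set of keywords
        self.slots = {}          # slot number -> {key: list of records}

    def _digest(self, keywords):
        # Stand-in: zlib.crc32 replaces the crc64 of the text.
        return zlib.crc32("|".join(sorted(keywords)).encode("utf-8"))

    def locate(self, leaf_code):
        """Step 6: slot number and Key from the leaf code and its keywords."""
        digest = self._digest(self.leaf_keywords[leaf_code])
        key = leaf_code + digest
        loc = ((leaf_code * self.slots_per_node) % TOTAL_SLOTS
               + (digest * self.slots_per_node) % TOTAL_SLOTS)
        return loc, key

    def put(self, leaf_code, keywords, record):
        # Step 5: insert the leaf code into the text index of each keyword.
        for keyword in keywords:
            self.text_index.setdefault(keyword, set()).add(leaf_code)
        # Assumption: existing records follow the leaf when its Key changes.
        records = []
        if leaf_code in self.leaf_keywords:
            old_slot, old_key = self.locate(leaf_code)
            records = self.slots[old_slot].pop(old_key)
        self.leaf_keywords.setdefault(leaf_code, set()).update(keywords)
        # Steps 6 and 7: compute the location and store the record there.
        slot, key = self.locate(leaf_code)
        records.append(record)
        self.slots.setdefault(slot, {})[key] = records
        return slot, key
```

A call such as `CacheSketch(4).put(8, {"food"}, "record A")` returns the slot number and Key under which the record was stored.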
Based on the content of the foregoing embodiment, in this embodiment, the method for spatial text data caching further includes:
acquiring second spatial text data to be inquired;
extracting a second spatial position coordinate and a second text keyword in the second spatial text data;
inquiring a space index according to the second space position coordinate, and determining a second space code corresponding to the second space position coordinate;
querying a text index according to the second text keyword, and determining a second spatial coding set corresponding to the second text keyword;
determining a second target spatial code according to the intersection of the second spatial code and the second spatial code set;
determining a corresponding storage position of the spatial text data in Redis according to the second target spatial code;
and querying the second space text data in Redis according to the storage position, and returning a query result.
In this embodiment, a cache data query method is designed based on the cache data storage structure provided in the above embodiments, and can effectively deal with the problem of data access tilt generated by hot spot region query and hot word query, as shown in fig. 8, the main flow of the data query method provided in this embodiment is as follows:
step 1, the system first acquires a query condition for spatial text data, wherein the query condition comprises spatial coordinate information and text keyword information; for example, a user may issue a query for the food city closest to the user, in which case the coordinate of the user is the spatial position information and "food city" is the keyword information;
step 2, the system needs to extract the space coordinates and keyword information of the space text data query for subsequent query;
step 3, the system loads indexes according to the conditions inquired by the user, and the system loads spatial indexes and loads corresponding keyword indexes according to the inquired keywords;
step 4, judging whether index information can be obtained according to the keywords, if not, turning to step 5, and if yes, turning to step 6;
step 5, because no index can be obtained for the keywords, the spatial text data to be queried by the user is not in the cache; at this time, a result is returned to inform the system to query the related data on the disk, and the query processing is finished;
step 6, inquiring according to the acquired index, filtering according to the space position information and the text information, removing the subareas with inconsistent space positions and the subareas without inquiry keywords to obtain candidate leaf nodes, and calculating the storage slot numbers of the leaf nodes;
step 7, finding a corresponding Redis slot according to the number of the storage slot of the inquired leaf node, calculating a Key corresponding to the leaf node, searching in the slot according to the Key, and inquiring the space text data meeting the conditions;
step 8, returning the queried spatial text data to the user.
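The filtering in step 6 can be sketched as a combined spatial and text pruning over the leaf codes of a keyword. In this Python illustration each leaf is represented as a (code, level) pair, where `level` is the number of quadtree layers below the root; this representation, the function names, and the use of `None` to signal a cache miss (step 5) are assumptions and not part of the embodiment.

```python
def overlaps(code_a, level_a, code_b, level_b):
    """True if one region equals or is an ancestor of the other."""
    if level_a > level_b:
        code_a, level_a, code_b, level_b = code_b, level_b, code_a, level_a
    # Drop the extra low-order bit pairs of the deeper code and compare.
    return code_b >> (2 * (level_b - level_a)) == code_a


def candidate_leaves(query_code, query_level, keyword, text_index):
    """Leaf nodes that match both the query region and the keyword.

    text_index: dict mapping keyword -> set of (leaf_code, leaf_level)
    Returns None when the keyword has no index entry, in which case the
    caller is expected to fall back to the disk.
    """
    postings = text_index.get(keyword)
    if not postings:
        return None
    return sorted(
        (leaf_code, leaf_level)
        for leaf_code, leaf_level in postings
        if overlaps(query_code, query_level, leaf_code, leaf_level)
    )
```

For a query region coded 10 at level 1 and a keyword whose leaves are 0100, 1000 and 1011, only 1000 and 1011 survive, because 0100 has the prefix 01 and is pruned by the spatial filter.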
Based on the content of the foregoing embodiment, in this embodiment, the method for spatial text data caching further includes:
acquiring an access heat value of each spatial code in a spatial index, comparing the access heat value with a preset first data access heat threshold, and deleting corresponding spatial text data if the access heat value is smaller than the preset first data access heat threshold;
or, alternatively,
and acquiring all spatial codes corresponding to each keyword in the text index, summing the access heat values of all spatial codes corresponding to each keyword, comparing the summation result with a preset second data access heat threshold, and deleting the corresponding spatial text data if the summation result is smaller than the preset second data access heat threshold.
In this embodiment, an index-based cache replacement method is designed, which can replace cached data according to the access heat of the spatial text combination and also supports replacing cached data according to the access heat of keywords. In this embodiment, a data access heat threshold for the spatial text dimension is preset. The access heat of the sub-region represented by a leaf node of the spatial index is obtained simply by reading the access heat of that leaf node, and it is compared with the preset threshold: if the access heat is lower than the threshold, the leaf node and the spatial text data corresponding to it are deleted; otherwise, no operation is performed. Cache replacement by keyword access heat is similar to the spatial dimension. In this embodiment, a threshold is set for the keyword access heat; all leaf nodes containing a keyword are obtained from the text index, and their heat values are summed to obtain the heat value of the keyword. The heat values of all keywords of a piece of spatial text data are then summed to obtain a total heat value, which is compared with the set threshold; if the total heat value is smaller than the threshold, the corresponding spatial text data is removed. As shown in fig. 9, the main flow of the cache replacement method provided in this embodiment is as follows:
step 1, initializing a cache replacement method by a system, reading corresponding spatial index and text index, setting polling time for cache replacement by the system, and setting a threshold value of data access heat;
step 2, the system periodically checks the access heat of the space text data according to the set polling time;
step 3, the system determines which method is used to calculate the data access heat; if the access heat is calculated according to the spatial text data, turning to step 4, and if it is calculated according to the keywords, turning to step 6;
step 4, judging whether the system needs to replace data, acquiring the access heat of each leaf node of the spatial index by the system, comparing the access heat of each leaf node with a threshold, turning to step 8 if the access heat is greater than or equal to the threshold, and turning to step 5 if the access heat is smaller than the threshold;
step 5, deleting the space text data corresponding to the leaf nodes with the access heat smaller than the threshold from Redis, loading new space text data, and inserting the new data according to the cache construction method designed by the embodiment;
step 6, judging whether the system needs to replace data or not, obtaining the text index by the system, obtaining all leaf nodes of the keywords so as to obtain the access heat of each keyword, then summing the access heat of all the keywords of one space data to obtain the total heat to be compared with a threshold, if the total heat is greater than or equal to the threshold, turning to step 8, and if the total heat is smaller than the threshold, turning to step 7;
step 7, deleting the spatial text data with the keyword access heat degree smaller than the threshold from Redis, simultaneously loading new spatial text data, and inserting the new data according to the cache construction method designed by the embodiment;
step 8, the access heat of the data is greater than a threshold value, and the system does not need to replace the cached data;
step 9, one round of polling is finished, and the system waits for the next polling.
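The two eviction criteria of the replacement flow can be sketched as pure functions over the index data. This Python fragment only selects what would be evicted in one polling round; deletion from Redis, loading of new data, and the polling timer are left out. The data layout (dictionaries for heat values, the text index, and the keywords of each record) and the function names are hypothetical.

```python
def evict_by_leaf_heat(leaf_heat, threshold):
    """Steps 4 and 5: leaf codes whose access heat is below the threshold."""
    return sorted(code for code, heat in leaf_heat.items() if heat < threshold)


def keyword_heat(keyword, text_index, leaf_heat):
    """Heat of a keyword: sum of the heat of all leaves that contain it."""
    return sum(leaf_heat.get(code, 0) for code in text_index.get(keyword, ()))


def evict_by_keyword_heat(records, text_index, leaf_heat, threshold):
    """Steps 6 and 7: records whose summed keyword heat is below the threshold.

    records: dict mapping record id -> set of keywords of that record
    """
    victims = []
    for record_id, keywords in records.items():
        total = sum(keyword_heat(k, text_index, leaf_heat) for k in keywords)
        if total < threshold:
            victims.append(record_id)
    return sorted(victims)
```

Both functions treat a heat value equal to the threshold as "keep", which matches the comparison in steps 4 and 6, where only values smaller than the threshold lead to replacement.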
As can be seen from the above description, the present embodiment designs a Redis data storage structure based on a spatial text index, and by using the index, both the storage efficiency of the Redis is maintained and the access efficiency of the data is improved. The embodiment designs a new index structure and a storage mode facing to spatial text data, establishes the association between a spatial index and a text index through the coding of a spatial region, provides an efficient spatial and text pruning mode, and simultaneously reduces the index size and saves the storage space of Redis; on the basis of indexing, the embodiment designs a distribution method of spatial text data, and performs inverse aggregation storage on spatially adjacent data containing the same keyword, so that the problem of data access hot spots in Redis is solved; next, the construction method and the query method of the index-based spatial text data cache are designed in the embodiment, so that the data access efficiency is improved; finally, the embodiment designs a cache replacement method, which not only supports the cache replacement method with the access heat of the spatial text data as the standard, but also supports the cache replacement method with the access heat of the keyword as the standard. 
Compared with existing methods, the method designed in this embodiment copes better with the problem of data skew: because the data distribution method designed in this embodiment disperses data containing the same keyword in adjacent spaces onto different Redis nodes, storage skew is avoided, and at the same time data accesses to the same area are dispersed, so that access hot spots are avoided. The method designed in this embodiment also takes both Redis storage efficiency and data access efficiency into account: the index designed in this embodiment can filter out a large amount of irrelevant data, which improves data access efficiency, and because only node codes are stored in the index, the index is small and does not occupy a large amount of Redis storage space, so that the storage efficiency of Redis is maintained. The method designed in this embodiment is suitable not only for one-dimensional data but also for data of higher dimensionality, and practice has shown that it can effectively improve data access efficiency.
Fig. 10 is a schematic structural diagram illustrating a spatial text data caching apparatus according to an embodiment of the present invention. As shown in fig. 10, the spatial text data caching apparatus according to the embodiment of the present invention includes: an extraction module 21, a first determination module 22, a second determination module 23, a third determination module 24, a fourth determination module 25, and a storage module 26, wherein:
the extraction module 21 is configured to extract a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
a first determining module 22, configured to query a spatial index according to the first spatial position coordinate, and determine a first spatial code corresponding to the first spatial position coordinate; wherein, the spatial index stores the corresponding relation between each spatial position coordinate and the corresponding spatial code; the space coding refers to coding obtained by performing data compression on space position coordinates according to a preset coding rule;
a second determining module 23, configured to query a text index according to the first text keyword, and determine a first spatial coding set corresponding to the first text keyword; the text index stores the corresponding relation between each text keyword and a space code which has a space incidence relation with the text keyword;
a third determining module 24, configured to determine a first target spatial coding according to the first spatial coding and the first spatial coding set;
a fourth determining module 25, configured to determine, according to the first target spatial code and the first text keyword, a corresponding storage location of the first spatial text data in Redis;
and a storage module 26, configured to store the first spatial text data into a Redis according to the storage location.
The spatial text data caching device provided by the embodiment can be used for executing the spatial text data caching method provided by the above embodiment, and the working principle and the beneficial effect are similar, so that detailed description is omitted here.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 11: a processor 301, a memory 302, a communication interface 303, and a communication bus 304;
the processor 301, the memory 302 and the communication interface 303 complete mutual communication through the communication bus 304; the communication interface 303 is used for realizing information transmission between the devices;
the processor 301 is configured to call a computer program in the memory 302, and when the processor executes the computer program, the processor implements all the steps of the spatial text data caching processing method, for example, when the processor executes the computer program, the processor implements the following steps: extracting a first space position coordinate and a first text keyword from first space text data to be cached; inquiring a space index according to the first space position coordinate, and determining a first space code corresponding to the first space position coordinate; wherein, the spatial index stores the corresponding relation between each spatial position coordinate and the corresponding spatial code; the space coding refers to coding obtained by performing data compression on space position coordinates according to a preset coding rule; querying a text index according to the first text keyword, and determining a first spatial coding set corresponding to the first text keyword; the text index stores the corresponding relation between each text keyword and a space code which has a space incidence relation with the text keyword; determining a first target spatial encoding from the first spatial encoding and the first set of spatial encodings; determining a corresponding storage position of the first spatial text data in Redis according to the first target spatial code and a first text keyword; and storing the first space text data into Redis according to the storage position.
Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements all the steps of the spatial text data caching processing method, for example, when the processor executes the computer program, the processor implements the following steps: extracting a first space position coordinate and a first text keyword from first space text data to be cached; inquiring a space index according to the first space position coordinate, and determining a first space code corresponding to the first space position coordinate; wherein, the spatial index stores the corresponding relation between each spatial position coordinate and the corresponding spatial code; the space coding refers to coding obtained by performing data compression on space position coordinates according to a preset coding rule; querying a text index according to the first text keyword, and determining a first spatial coding set corresponding to the first text keyword; the text index stores the corresponding relation between each text keyword and a space code which has a space incidence relation with the text keyword; determining a first target spatial encoding from the first spatial encoding and the first set of spatial encodings; determining a corresponding storage position of the first spatial text data in Redis according to the first target spatial code and a first text keyword; and storing the first space text data into Redis according to the storage position.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the spatial text data caching method according to the embodiments or some parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. The particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art provided that they do not contradict one another.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A spatial text data caching processing method is characterized by comprising the following steps:
extracting a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
querying a spatial index according to the first spatial position coordinate, and determining a first spatial code corresponding to the first spatial position coordinate; wherein the spatial index stores the correspondence between each spatial position coordinate and its spatial code, and a spatial code is a code obtained by compressing a spatial position coordinate according to a preset coding rule;
querying a text index according to the first text keyword, and determining a first spatial code set corresponding to the first text keyword; wherein the text index stores the correspondence between each text keyword and the spatial codes that are spatially associated with that keyword;
determining a first target spatial code from the first spatial code and the first spatial code set;
determining a storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword;
and storing the first spatial text data in Redis according to the storage location.
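By way of non-limiting illustration, the steps of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: plain dictionaries stand in for the spatial index, the text index and Redis, and the names `encode`, `cache_record`, `spatial_index`, `text_index` and `cache` are hypothetical. The grid-cell coding rule and the `code:keyword` location format are placeholders chosen only to keep the example short.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)

# In-memory stand-ins (assumptions); the claimed method targets Redis.
spatial_index: Dict[Coord, int] = {}   # coordinate -> spatial code
text_index: Dict[str, Set[int]] = {}   # keyword -> spatially associated codes
cache: Dict[str, dict] = {}            # storage location -> cached record


def encode(coord: Coord) -> int:
    """Hypothetical coding rule: number of the 1-degree grid cell."""
    lon, lat = coord
    return int(lon + 180) * 181 + int(lat + 90)


def cache_record(record: dict) -> str:
    """Cache one spatial text record and return its storage location."""
    coord, keyword = record["coord"], record["keyword"]

    # Query the spatial index; create the code if it is missing.
    if coord not in spatial_index:
        spatial_index[coord] = encode(coord)
    code = spatial_index[coord]

    # Query the text index for the codes already linked to this keyword.
    codes = text_index.setdefault(keyword, set())

    # Simplest possible target choice (assumption): reuse the code itself
    # and make sure the keyword is linked to it.
    codes.add(code)
    target = code

    # Derive a storage location from the target code and the keyword.
    location = f"{target}:{keyword}"
    cache[location] = record
    return location
```

Calling `cache_record({"coord": (116.4, 39.9), "keyword": "cafe", "name": "A"})` stores the record under a location built from the cell code and the keyword, and links the keyword to that cell in `text_index`.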
2. The method for spatial text data caching processing according to claim 1, wherein querying a spatial index according to the first spatial location coordinate and determining a first spatial code corresponding to the first spatial location coordinate specifically includes:
querying the spatial index according to the first spatial position coordinate, and if the query succeeds, determining the first spatial code corresponding to the first spatial position coordinate;
and if the query does not succeed, creating the first spatial code corresponding to the first spatial position coordinate from the first spatial position coordinate, based on a quadtree rule and a Z-order curve.
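By way of non-limiting illustration, a quadtree subdivision combined with a Z-order curve can be sketched as follows. The sketch assumes longitude/latitude input, a fixed number of subdivision levels, and a particular bit order within each quadrant; the claim fixes none of these. The function names `zorder_code` and `lookup_or_create` are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def zorder_code(lon: float, lat: float, levels: int = 8) -> int:
    """Descend a quadtree `levels` times, appending two bits per level.

    Each level halves the current bounding box in both dimensions and
    records which quadrant contains the point, which yields the cell's
    position along a Z-order curve.
    """
    min_lon, max_lon = -180.0, 180.0
    min_lat, max_lat = -90.0, 90.0
    code = 0
    for _ in range(levels):
        mid_lon = (min_lon + max_lon) / 2
        mid_lat = (min_lat + max_lat) / 2
        x_bit = 1 if lon >= mid_lon else 0
        y_bit = 1 if lat >= mid_lat else 0
        code = (code << 2) | (y_bit << 1) | x_bit
        # Shrink the box to the chosen quadrant.
        if x_bit:
            min_lon = mid_lon
        else:
            max_lon = mid_lon
        if y_bit:
            min_lat = mid_lat
        else:
            max_lat = mid_lat
    return code


def lookup_or_create(index: Dict[Coord, int], lon: float, lat: float,
                     levels: int = 8) -> int:
    """Return the indexed code, creating it when the lookup misses."""
    coord = (lon, lat)
    if coord not in index:
        index[coord] = zorder_code(lon, lat, levels)
    return index[coord]
```

A useful property of this construction is that the code of a coarser cell is a prefix of the codes of the cells inside it, so dropping the last two bits moves one level up the quadtree.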
3. The method for spatial text data caching processing according to claim 2, wherein said determining a first target spatial encoding according to the first spatial encoding and the first spatial encoding set specifically includes:
if the first spatial code and the first spatial code set have an intersection, judging whether the access heat value of the first spatial code is greater than a preset threshold; if the access heat value is greater than the preset threshold, performing fission coding on the first spatial code according to a preset coding rule to obtain an optimized spatial code, taking the optimized spatial code as the first target spatial code, and establishing a correspondence between the first text keyword and the optimized spatial code in the text index; if the access heat value of the first spatial code is less than or equal to the preset threshold, taking the first spatial code as the first target spatial code;
if the first spatial code and the first spatial code set do not have an intersection, establishing a corresponding relation between the first text keyword and the first spatial code in the text index, and taking the first spatial code as the first target spatial code.
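By way of non-limiting illustration, the branching of claim 3 can be sketched as follows. The fission rule is not specified in this claim, so `split_cell` below is a hypothetical stand-in that descends one quadtree level; `choose_target`, `heat` and `threshold` are likewise illustrative names.

```python
from typing import Callable, Dict, Set


def split_cell(code: int, child: int = 0) -> int:
    """Hypothetical fission rule: move into child quadrant `child` (0-3)."""
    return (code << 2) | child


def choose_target(code: int, keyword: str,
                  text_index: Dict[str, Set[int]],
                  heat: Dict[int, int], threshold: int,
                  fission: Callable[[int], int] = split_cell) -> int:
    """Pick the target spatial code for a (code, keyword) pair."""
    codes = text_index.setdefault(keyword, set())
    if code in codes:
        # Intersection exists: split the cell only when it is hot.
        if heat.get(code, 0) > threshold:
            optimized = fission(code)
            codes.add(optimized)  # link the keyword to the new code
            return optimized
        return code
    # No intersection: link the keyword to the code and use it as is.
    codes.add(code)
    return code
```

The effect is that a heavily accessed cell is subdivided so that its records spread over finer codes, while cold cells keep their existing code.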
4. The spatial text data caching method according to claim 1, wherein determining a storage location of the first spatial text data in Redis according to the first target spatial code and a first text keyword specifically includes:
determining a key value corresponding to the first spatial text data in Redis according to the first target spatial code and the first text keyword;
determining a base slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes and the total number of Redis slots;
determining an offset of the first spatial text data according to the set of all keywords corresponding to the first target spatial code, the number of Redis cluster nodes and the total number of Redis slots;
determining a storage slot number corresponding to the first spatial text data in Redis according to the base slot number of the first spatial text data and the offset of the first spatial text data;
and determining the storage location of the first spatial text data in Redis according to the key value and the storage slot number.
5. The method for spatial text data caching processing according to claim 4, wherein determining a key value corresponding to the first spatial text data in Redis according to the first target spatial code and the first text keyword specifically includes:
determining a key value corresponding to the first spatial text data in Redis according to a first relation model, wherein the first relation model is $key = (\mathrm{Zorder}) + \mathrm{crc64}(keys)$;
wherein Zorder represents the first target spatial code, keys represents the set of all keywords corresponding to the first target spatial code, and crc64(keys) represents converting all the keywords corresponding to the first target spatial code into an integer.
6. The spatial text data caching processing method according to claim 4, wherein determining a base slot number of the first spatial text data according to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots specifically includes:
determining the base slot number of the first spatial text data according to a second relation model, wherein the second relation model is $L_{area} = [(\mathrm{Zorder}) \times (16384/num)] \,\%\, 16384$;
wherein $L_{area}$ represents the base slot number, Zorder represents the first target spatial code, num represents the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % represents the modulo operation.
7. The spatial text data caching processing method according to claim 4, wherein determining an offset of the first spatial text data according to the set of all keywords corresponding to the first target spatial code, the number of Redis cluster nodes, and the total number of Redis slots specifically includes:
determining the offset of the first spatial text data according to a third relation model, wherein the third relation model is $L_{offset} = [\mathrm{crc64}(keys) \times (16384/num)] \,\%\, 16384$;
wherein $L_{offset}$ represents the offset, crc64(keys) represents converting all the keywords corresponding to the first target spatial code into an integer, num represents the number of Redis cluster nodes, 16384 is the total number of Redis slots, and % represents the modulo operation.
8. The method for spatial text data caching processing according to claim 4, wherein determining, according to a base slot number of the first spatial text data and an offset of the first spatial text data, a storage slot number corresponding to the first spatial text data in Redis specifically comprises:
determining the storage slot number corresponding to the first spatial text data in Redis according to a fourth relation model, wherein the fourth relation model is $Loc = L_{area} + L_{offset}$;
wherein Loc represents the storage slot number, $L_{area}$ represents the base slot number, and $L_{offset}$ represents the offset.
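By way of non-limiting illustration, the four relation models of claims 5 to 8 can be evaluated as follows. Several points are assumptions, because the claims leave them open: the CRC-64 variant (the ECMA-182 polynomial with a zero initial value is used here), the way the keyword set is serialized before hashing (sorted and comma-joined), the reading of 16384/num as integer division, and the reading of "+" in the key model as concatenation with a ":" separator. All function names are hypothetical.

```python
from typing import Iterable

TOTAL_SLOTS = 16384            # total number of Redis slots (from the claims)
POLY = 0x42F0E1EBA9EA3693      # CRC-64/ECMA-182 polynomial (assumed variant)
MASK = 0xFFFFFFFFFFFFFFFF      # keep the register at 64 bits


def crc64(data: bytes) -> int:
    """Bitwise, MSB-first CRC-64 with a zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def keys_to_int(keys: Iterable[str]) -> int:
    """crc64(keys): hash the sorted, comma-joined keywords (assumption)."""
    return crc64(",".join(sorted(keys)).encode("utf-8"))


def redis_key(zorder: int, keys: Iterable[str]) -> str:
    """First relation model, reading '+' as concatenation (assumption)."""
    return f"{zorder}:{keys_to_int(keys)}"


def base_slot(zorder: int, num_nodes: int) -> int:
    """Second relation model: L_area."""
    return (zorder * (TOTAL_SLOTS // num_nodes)) % TOTAL_SLOTS


def offset(keys: Iterable[str], num_nodes: int) -> int:
    """Third relation model: L_offset."""
    return (keys_to_int(keys) * (TOTAL_SLOTS // num_nodes)) % TOTAL_SLOTS


def storage_slot(zorder: int, keys: Iterable[str], num_nodes: int) -> int:
    """Fourth relation model: Loc = L_area + L_offset, as written."""
    return base_slot(zorder, num_nodes) + offset(keys, num_nodes)
```

For example, with a target code of 15 and three cluster nodes, 16384 // 3 is 5461, so the base slot is (15 × 5461) % 16384 = 16379. Because both terms are each below 16384, their sum can exceed 16383; the claims as written do not say how that case is handled, so the sketch returns the raw sum rather than guessing at a wrap-around.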
9. The spatial text data caching method according to any one of claims 1 to 8, further comprising:
acquiring second spatial text data to be queried;
extracting a second spatial position coordinate and a second text keyword from the second spatial text data;
querying the spatial index according to the second spatial position coordinate, and determining a second spatial code corresponding to the second spatial position coordinate;
querying the text index according to the second text keyword, and determining a second spatial code set corresponding to the second text keyword;
determining a second target spatial code according to the intersection of the second spatial code and the second spatial code set;
determining a storage location of the second spatial text data in Redis according to the second target spatial code;
and querying the second spatial text data in Redis according to the storage location, and returning a query result.
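By way of non-limiting illustration, the query path of claim 9 can be sketched as follows. Dictionaries again stand in for the two indexes and for Redis, and the cache is assumed to be keyed by a (spatial code, keyword) pair; the function name `query` and that keying scheme are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float]


def query(coord: Coord, keyword: str,
          spatial_index: Dict[Coord, int],
          text_index: Dict[str, Set[int]],
          cache: Dict[Tuple[int, str], List[dict]]) -> List[dict]:
    """Return cached records matching both the location and the keyword."""
    code = spatial_index.get(coord)
    if code is None:
        return []  # the coordinate has never been indexed

    # The target code is the intersection of the coordinate's code with
    # the codes linked to the keyword.
    targets = {code} & text_index.get(keyword, set())

    results: List[dict] = []
    for target in targets:
        results.extend(cache.get((target, keyword), []))
    return results
```

An empty intersection means no cached record matches both the location and the keyword, so the lookup returns an empty list without touching the cache.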
10. The spatial text data caching method according to any one of claims 1 to 8, further comprising:
acquiring an access heat value of each spatial code in a spatial index, comparing the access heat value with a preset first data access heat threshold, and deleting corresponding spatial text data if the access heat value is smaller than the preset first data access heat threshold;
or,
acquiring all spatial codes corresponding to each keyword in the text index, summing the access heat values of all spatial codes corresponding to each keyword, comparing the summation result with a preset second data access heat threshold, and deleting the corresponding spatial text data if the summation result is smaller than the preset second data access heat threshold.
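By way of non-limiting illustration, the two eviction alternatives of claim 10 can be sketched as follows. The heat counters, the thresholds and the cache keyed by (spatial code, keyword) are assumptions made for the example, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cold_codes(heat: Dict[int, int], threshold: int) -> Set[int]:
    """First alternative: codes whose own heat is below the threshold."""
    return {code for code, value in heat.items() if value < threshold}


def cold_keywords(text_index: Dict[str, Set[int]],
                  heat: Dict[int, int], threshold: int) -> Set[str]:
    """Second alternative: keywords whose summed heat is below the threshold."""
    return {
        keyword
        for keyword, codes in text_index.items()
        if sum(heat.get(code, 0) for code in codes) < threshold
    }


def evict_cold_codes(cache: Dict[Tuple[int, str], List[dict]],
                     heat: Dict[int, int], threshold: int) -> None:
    """Delete cached records stored under any cold spatial code."""
    cold = cold_codes(heat, threshold)
    for key in [k for k in cache if k[0] in cold]:
        del cache[key]
```

The first alternative evicts by region regardless of keyword, whereas the second evicts by keyword across all of its regions, so a keyword that is popular in one hot cell can keep its data in otherwise quiet cells.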
11. A spatial text data caching apparatus, comprising:
an extraction module, configured to extract a first spatial position coordinate and a first text keyword from first spatial text data to be cached;
a first determining module, configured to query a spatial index according to the first spatial position coordinate and determine a first spatial code corresponding to the first spatial position coordinate; wherein the spatial index stores the correspondence between each spatial position coordinate and its spatial code, and a spatial code is a code obtained by compressing a spatial position coordinate according to a preset coding rule;
a second determining module, configured to query a text index according to the first text keyword and determine a first spatial code set corresponding to the first text keyword; wherein the text index stores the correspondence between each text keyword and the spatial codes that are spatially associated with that keyword;
a third determining module, configured to determine a first target spatial code from the first spatial code and the first spatial code set;
a fourth determining module, configured to determine a storage location of the first spatial text data in Redis according to the first target spatial code and the first text keyword;
and a storage module, configured to store the first spatial text data in Redis according to the storage location.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the spatial text data caching method according to any one of claims 1 to 10 when executing the computer program.
13. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the spatial text data caching method according to any one of claims 1 to 10.
CN202010158290.8A 2020-03-09 2020-03-09 Space text data caching processing method and device, electronic equipment and storage medium Active CN111353012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010158290.8A CN111353012B (en) 2020-03-09 2020-03-09 Space text data caching processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111353012A true CN111353012A (en) 2020-06-30
CN111353012B CN111353012B (en) 2023-10-17

Family

ID=71192623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010158290.8A Active CN111353012B (en) 2020-03-09 2020-03-09 Space text data caching processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111353012B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115227A (en) * 2020-08-14 2020-12-22 咪咕文化科技有限公司 Data query method and device, electronic equipment and storage medium
CN112269947A (en) * 2020-09-23 2021-01-26 咪咕文化科技有限公司 Spatial text data caching method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001013069A1 (en) * 1999-08-12 2001-02-22 Kivera, Inc. Method and apparatus for providing location-dependent services to mobile users
US20090037403A1 (en) * 2007-07-31 2009-02-05 Microsoft Corporation Generalized location identification
US20120173540A1 (en) * 2010-12-29 2012-07-05 Sybase, Inc. Accelerating Database Queries Comprising Positional Text Conditions Plus Bitmap-Based Conditions
CN104376112A (en) * 2014-11-27 2015-02-25 苏州大学 Road network space keyword search method
CN104794123A (en) * 2014-01-20 2015-07-22 阿里巴巴集团控股有限公司 Method and device for establishing NoSQL database index for semi-structured data
CN107391636A (en) * 2017-07-10 2017-11-24 江苏省现代企业信息化应用支撑软件工程技术研发中心 The anti-neighbour's spatial key querying methods of top m
CN108052514A (en) * 2017-10-12 2018-05-18 南京航空航天大学 A kind of blending space Indexing Mechanism for handling geographical text Skyline inquiries
CN108846013A (en) * 2018-05-04 2018-11-20 昆明理工大学 A kind of spatial key word querying method and device based on geohash Yu Patricia Trie

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Jinliang: "A hybrid spatial-text indexing method based on the R-tree" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115227A (en) * 2020-08-14 2020-12-22 咪咕文化科技有限公司 Data query method and device, electronic equipment and storage medium
CN112115227B (en) * 2020-08-14 2024-05-24 咪咕文化科技有限公司 Data query method and device, electronic equipment and storage medium
CN112269947A (en) * 2020-09-23 2021-01-26 咪咕文化科技有限公司 Spatial text data caching method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111353012B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
US8396883B2 (en) Spatial querying in a data warehouse
CN112115227B (en) Data query method and device, electronic equipment and storage medium
WO2017161540A1 (en) Data query method, data object storage method and data system
CN112800287B (en) Full-text indexing method and system based on graph database
CN110888837B (en) Object storage small file merging method and device
CN111353012A (en) Spatial text data caching method and device, electronic equipment and storage medium
US11995059B2 (en) Database index and database query processing method, apparatus, and device
CN112579602A (en) Multi-version data storage method and device, computer equipment and storage medium
CN112395288B (en) R-tree index merging and updating method, device and medium based on Hilbert curve
US20230385353A1 (en) Spatial search using key-value store
Song et al. Spatial join processing using corner transformation
CN112416880A (en) Method and device for optimizing storage performance of mass small files based on real-time merging
CN114691721A (en) Graph data query method and device, electronic equipment and storage medium
CN113704248B (en) Block chain query optimization method based on external index
Rammer et al. Atlas: A distributed file system for spatiotemporal data
Zhong et al. Elastic and effective spatio-temporal query processing scheme on hadoop
Wang et al. HBase storage schemas for massive spatial vector data
JP3938815B2 (en) Node creation method, image search method, and recording medium
Wang et al. Efficient spatial big data storage and query in HBase
An et al. Toward an accurate analysis of range queries on spatial data
Wang et al. Subspace k-anonymity algorithm for location-privacy preservation based on locality-sensitive hashing
US20240232157A1 (en) Database index and database query processing method, apparatus, and device
CN110489515B (en) Address book retrieval method, server and storage medium
Sun et al. An index with caching mechanism for real-time processing system
CN117762945A (en) Data storage method, data query method and related devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant