CN113763099A - Data searching method, device, equipment and storage medium - Google Patents

Info

Publication number
CN113763099A
Authority
CN
China
Prior art keywords
service data
data
target
geographic space
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011598769.XA
Other languages
Chinese (zh)
Inventor
隋远
李瑞远
鲍捷
胡建
谭楚婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong City Beijing Digital Technology Co Ltd
Original Assignee
Jingdong City Beijing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong City Beijing Digital Technology Co Ltd filed Critical Jingdong City Beijing Digital Technology Co Ltd
Priority to CN202011598769.XA priority Critical patent/CN113763099A/en
Publication of CN113763099A publication Critical patent/CN113763099A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0639 Item locations

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data searching method and device, electronic equipment and a storage medium, and relates to the technical field of computers. The method comprises the following steps: receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request; acquiring index information pre-established for the target time period, wherein the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition; and determining, from the plurality of geographic space ranges, a target geographic space range in which the target position point is located, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.

Description

Data searching method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to computer technology, in particular to a data searching method and device, electronic equipment and a storage medium.
Background
In recent years, online shopping has become an important lifestyle. The large e-commerce system provides convenient service for users and collects massive order data. How to mine useful information from such a huge amount of order data, and how to provide better services for users after reprocessing the order data has become a new challenge for the e-commerce industry.
At present, through an APP installed on a terminal, a user can quickly check the commodity purchase rankings of people nearby and the user's own purchasing power ranking.
In the process of implementing the invention, at least the following problem is found in the prior art:
To realize the above application, how to quickly find a proper quantity of order data around a specified position point within a specified time period from hundreds of millions of order records is a problem to be solved at present.
Disclosure of Invention
The embodiment of the invention provides a data searching method, a data searching device, electronic equipment and a storage medium, which are used for quickly and accurately searching a proper quantity of service data for a target time period and around a target position point.
In a first aspect, an embodiment of the present invention provides a data searching method, where the method includes:
receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request;
acquiring index information pre-established for the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
In a second aspect, an embodiment of the present invention further provides a data searching apparatus, where the apparatus includes:
the target information acquisition module is used for receiving a data query request sent by a client and acquiring a target position point and a target time period in the data query request;
the index information acquisition module is used for acquiring index information which is pre-established aiming at the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and the first business data searching module is used for determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching business data from a preset database by taking the target time period and the target geographic space range as searching conditions.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data lookup method according to any one of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the data lookup method according to any one of the embodiments of the present invention.
According to the technical scheme, the target geographic space range in which the target position point in the data query request falls is determined by utilizing the pre-established index information. When the target geographic space range and the target time period in the data query request are then used as query conditions to search the service data in the preset database, only the service data whose spatio-temporal index falls within the target geographic space range and the target time period needs to be searched, and the service data whose spatio-temporal index falls outside them does not need to be searched, so the searching speed is high. In addition, because the quantity of the service data generated in the target geographic space range meets the preset condition, a proper quantity of service data for the target time period and around the target position point can be searched quickly and accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a KNN algorithm in the prior art;
FIG. 2 is a schematic diagram of the prior art GiST index;
FIG. 3(a) is a diagram of dimension information for three dimensions in the Z3 index in the prior art;
FIG. 3(b) is a diagram illustrating a prior art binary encoding of the time dimension corresponding to the Z3 index;
FIG. 3(c) is a diagram of cross-coding the time dimension code and the dimension code in the Z3 index in the prior art;
FIG. 4 is a schematic diagram of service data distribution in various regions in the prior art;
FIG. 5 is a flowchart of a data searching method according to an embodiment of the present invention;
FIG. 6 is a flowchart of a data query method according to a second embodiment of the present invention;
FIG. 7 is a schematic diagram of a data flow of a sub-architecture for implementing a data query method according to a second embodiment of the present invention;
FIG. 8 is a schematic view of an index information creating process according to a second embodiment of the present invention;
FIG. 9 is a diagram of a base index tree according to a second embodiment of the present invention;
FIG. 10 is a diagram illustrating the construction of index information according to the second embodiment of the present invention;
FIG. 11 is a diagram illustrating a data retrieval process according to a second embodiment of the present invention;
FIG. 12 is a schematic diagram of ranking sales obtained by performing statistical analysis on order data obtained by query according to a second embodiment of the present invention;
FIG. 13 is a flowchart of a data query method according to a third embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a data searching apparatus according to a fourth embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
Before introducing the technical solution of the embodiment of the present invention, the solutions adopted in the prior art for searching service data are first introduced:
In the prior art, there are two schemes for searching service data, which are specifically as follows:
(1) K-nearest-neighbor (KNN) query scheme based on a relational spatial database
As is well known to those skilled in the art, a spatial database is a database system that provides spatial data types in its internal data model and query language, and provides spatial indexing and spatial join (overlay analysis) methods.
At present, relational databases supporting spatial data storage and spatial query are very common, such as the commercial software Oracle Spatial and the open source software PostGIS. Both have a very comprehensive built-in spatial data model and rich spatial index and query methods. Research on the two shows that PostGIS performs better than Oracle Spatial in access and retrieval: across all compared queries, its efficiency differs from that of Oracle Spatial by at least 50%, and in some scenarios PostGIS executes even 450% faster. Although PostGIS may occupy more computing resources in some scenarios, this is negligible compared with the efficiency gain. To fulfill the data query requirement, two steps need to be performed using PostGIS:
(a) A sufficient amount of service data (for example, 5000 pieces of order data) around a given location is queried by the KNN algorithm. This process may be referred to as primary filtering.
As is well known to those skilled in the art, KNN is a nearest-neighbor algorithm that, given a spatial location, queries the K nearest geometries around it, as shown in FIG. 1.
In the prior art, the 5000 orders nearest to a given location point are found using KNN. By establishing a GiST index inside PostGIS, as shown in FIG. 2, efficient KNN retrieval can be realized.
(b) Time filtering is performed on the query result. The 5000 pieces of service data returned by KNN are filtered according to the time field, for example the last 7 days, so that the final service data of the last 7 days is obtained. This process may be referred to as secondary filtering. An index is established on the time field in PostGIS, so the final result can be screened out quickly.
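The two-step prior-art scheme above can be illustrated with a minimal in-memory sketch. It is not PostGIS code: the `Order` record, the brute-force `knn` helper, the planar distance on raw longitude and latitude, and the sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import heapq
import math


@dataclass(frozen=True)
class Order:
    # Hypothetical order record: an id, a position and a creation time.
    order_id: int
    lng: float
    lat: float
    created_at: datetime


def knn(orders, lng, lat, k):
    """Primary filtering: the k orders closest to (lng, lat), by brute force.

    Planar distance on raw degrees is a simplification for illustration.
    """
    return heapq.nsmallest(
        k, orders, key=lambda o: math.hypot(o.lng - lng, o.lat - lat)
    )


def recent(orders, now, days=7):
    """Secondary filtering: keep only orders created within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [o for o in orders if o.created_at >= cutoff]


now = datetime(2020, 12, 29)
orders = [
    Order(1, 116.30, 39.98, now - timedelta(days=30)),  # near, but old
    Order(2, 116.31, 39.98, now - timedelta(days=20)),  # near, but old
    Order(3, 116.32, 39.99, now - timedelta(days=1)),   # near and recent
    Order(4, 116.90, 40.50, now - timedelta(days=2)),   # recent, but far away
]

candidates = knn(orders, 116.30, 39.98, k=3)  # purely spatial
result = recent(candidates, now)              # then temporal
```

In this toy data set, primary filtering selects three nearby orders, but only one of them survives secondary filtering, which is the kind of shortfall that purely spatial primary filtering can cause.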
The above process is a scheme for querying nearby orders within the last 7 days by using PostGIS, but it has the following two defects:
I. Primary filtering is purely spatial, so the candidate service data obtained may well fall outside the target time. For example, a KNN query returns 5000 nearby orders, all or most of which may have occurred more than 7 days ago, leaving very few results after secondary filtering.
There are two possible improvements to this situation. First, the K value of primary filtering can be enlarged, for example to about 50000 orders, so that more data is likely to remain after secondary filtering; however, this still cannot guarantee that enough data is found, and increasing the K value also degrades performance. Second, time filtering can be performed before KNN. However, since there may be tens of millions of orders or more within 7 days, performing KNN in memory on the time-filtered result incurs significant performance overhead, so primary filtering becomes too inefficient.
II. As a relational database, PostGIS is inherently limited in horizontal scalability. Although external components such as pgpool and pg-xl support a cluster mode, writing tens of millions of pieces of service data per day can still overload PostGIS.
(2) Data query based on distributed database and spatio-temporal index
As is well known to those skilled in the art, GeoMesa is a massive spatio-temporal data retrieval and processing engine based on distributed databases. It does not store data itself; the data is stored in data sources adapted to GeoMesa, such as HBase. GeoMesa may build the spatio-temporal index based on the characteristics of the different data sources.
In this scheme, data query is carried out by using GeoMesa. Owing to the distributed nature of GeoMesa, performance problems caused by excessive data volume can be avoided. The scheme mainly uses the Z3 index of GeoMesa to carry out spatio-temporal filtering.
As is well known to those skilled in the art, the Z3 index is the three-dimensional form of the Z-order space-filling curve. It extends the spatial Z2 curve as follows: first, time is partitioned according to a certain granularity; then the time is binary-coded to obtain the time dimension code; finally, the time dimension code is cross-coded with the longitude and latitude codes to obtain the key of the final database index, thereby realizing a spatio-temporal index.
The Z3 index is briefly described as follows. As shown in FIG. 3(a), time is the time dimension, lng is the longitude dimension, and lat is the latitude dimension. First, the time dimension is partitioned into blocks according to a certain time period (Time Period). The time is then binary-coded to obtain the time dimension code. Specifically, as shown in FIG. 3(b), suppose the target time point is 10 o'clock, that is, the service data of 10 o'clock is to be queried. The 24 hours of a day are divided into two equal parts at 12 o'clock, where 0 o'clock to 12 o'clock is coded "0" and 12 o'clock to 24 o'clock is coded "1". Since the target time point falls between 0 o'clock and 12 o'clock, its code is "0". The interval from 0 o'clock to 12 o'clock is then divided into two equal parts at 6 o'clock, where 0 o'clock to 6 o'clock is coded "0" and 6 o'clock to 12 o'clock is coded "1". Since the target time point falls between 6 o'clock and 12 o'clock, its code is "1". The interval from 6 o'clock to 12 o'clock is then divided into two equal parts at 9 o'clock, where 6 o'clock to 9 o'clock is coded "0" and 9 o'clock to 12 o'clock is coded "1". Since the target time point falls between 9 o'clock and 12 o'clock, its code is "1". Division continues in this way until the code of the target time point can no longer be determined. In the above example, when the interval from 9 o'clock to 12 o'clock is divided again with 10 o'clock as the boundary, 9 o'clock to 10 o'clock is coded "0" and 10 o'clock to 12 o'clock is coded "1"; because 10 o'clock is itself the boundary point, its code cannot be determined, and time division stops at this point. The time (t) code made up of "0" and "1" bits, as shown in FIG. 3(b), is thereby obtained.
The longitude (lng) code and the latitude (lat) code can be determined in the same way as the time code. The time dimension code is then cross-coded with the longitude and latitude codes, as shown in FIG. 3(c), to obtain the key of the final database index; the service data corresponding to the cross code is obtained and used as the value of the final database index.
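The halving and cross-coding steps described above can be sketched as follows. This is an illustrative simplification, not GeoMesa's actual Z3 implementation: the fixed number of bits per dimension, the rule that a value on a boundary goes to the upper half, the interleaving order (time, longitude, latitude), and the function names are all assumptions.

```python
def binary_code(value, low, high, bits):
    """Encode `value` by repeatedly halving [low, high).

    Each step emits 0 if the value lies in the lower half and 1 otherwise.
    Assumption: a fixed bit count is used, and boundary values go to the
    upper half, instead of stopping when the value hits a boundary.
    """
    code = []
    for _ in range(bits):
        mid = (low + high) / 2
        if value < mid:
            code.append(0)
            high = mid
        else:
            code.append(1)
            low = mid
    return code


def interleave(*codes):
    """Cross-code equal-length bit lists: t0 x0 y0 t1 x1 y1 ..."""
    return [bit for group in zip(*codes) for bit in group]


# 10 o'clock within a 24-hour day: [0,12) -> 0, [6,12) -> 1, [9,12) -> 1
t = binary_code(10, 0, 24, bits=3)
lng = binary_code(116.3, -180, 180, bits=3)
lat = binary_code(39.98, -90, 90, bits=3)

# Hypothetical index key built from the interleaved bits.
key = "".join(str(bit) for bit in interleave(t, lng, lat))
```

With three bits per dimension, the time code for 10 o'clock is `[0, 1, 1]`, matching the first three halving steps in the text, and records that are close in time and space tend to share a key prefix, which is what makes a range scan over such keys useful for spatio-temporal filtering.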
After the order data is imported by using the GeoMesa engine, a spatio-temporal Z3 index is established in the manner shown in FIGS. 3(a) to 3(c), and spatio-temporal filtering can be realized through a one-step query.
However, scheme (2) has a significant drawback: the spatial range of the query cannot be determined. A GeoMesa spatio-temporal query must specify both the spatial range and the time range of the query. The time range can be fixed as the last 7 days, but the spatial range cannot, because the order distribution across regions is very uneven. As shown in FIG. 4 (a dot in a region indicates that orders exist in that region), orders in Beijing and the eastern coastal areas are very dense, orders in the central region are relatively few, and orders in the western region are very sparse. If a fixed spatial range is used, two situations are very likely to occur. In the first, some regions return a very large amount of data, such as 100,000 orders, which makes the amount of data transmitted back to the client very large, increases the load on the client, and ultimately reduces overall performance. In the second, some regions have so few orders that not enough orders can be retrieved. Therefore, a fixed spatial range is not feasible.
The technical scheme of the embodiment of the invention can quickly retrieve spatio-temporally adjacent service data for a given time range, spatial range and quantity, while ensuring that the returned service data is sufficient for statistical calculation yet not so large that query efficiency is reduced.
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
FIG. 5 is a flowchart of a data searching method according to an embodiment of the present invention, which is applicable to searching service data. In particular, the present embodiment may be used in a scenario where a proper quantity of service data around a specified location point within a specified time period is searched for among hundreds of millions of pieces of service data, and may also be used in other application scenarios where service data needs to be searched. The method can be executed by a data searching device, which can be realized by software and/or hardware and can be configured on an electronic device. The method specifically comprises the following steps:
S110, receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request.
The client may be an application installed on a mobile phone, a tablet computer or a PC for sending a data query request.
The data query request may be a request to query any business data.
In the embodiment of the present invention, the data query request may be triggered externally and then sent by the client. For example, the client may generate the data query request in response to a user's trigger operation, such as the user clicking a "data query request" button in the client.
The target location point may be the location point where the user is currently located when sending the data query request, or a location point selected by the user. For example, suppose a user who wants to query data is currently located in the Sunward residential community in Haidian District, Beijing, and wants to query his or her own ranking for a purchased item within the surrounding region (for example, the whole of Haidian District); the target location point may then be the Sunward residential community in Haidian District.
The target time period may be the time period to which the data to be queried belongs. For example, if the user wants to query the service data of Haidian District, Beijing for the last 7 days, the target time period may be the last 7 days.
In the embodiment of the invention, the data query request sent by the client carries the target location point and the target time period of the data to be queried.
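As a minimal sketch of step S110, the request could be modeled as a small record carrying the two pieces of information named above. The JSON layout, the field names (`lng`, `lat`, `start`, `end`) and the `parse_request` helper are hypothetical; the source does not specify a wire format.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataQueryRequest:
    # Target location point (hypothetical field names).
    lng: float
    lat: float
    # Target time period as a half-open interval [start, end).
    start: datetime
    end: datetime


def parse_request(payload: str) -> DataQueryRequest:
    """Extract the target location point and target time period from a JSON payload."""
    body = json.loads(payload)
    request = DataQueryRequest(
        lng=float(body["lng"]),
        lat=float(body["lat"]),
        start=datetime.fromisoformat(body["start"]),
        end=datetime.fromisoformat(body["end"]),
    )
    if request.start >= request.end:
        raise ValueError("target time period must have start < end")
    return request


payload = (
    '{"lng": 116.30, "lat": 39.98, '
    '"start": "2020-12-22T00:00:00", "end": "2020-12-29T00:00:00"}'
)
request = parse_request(payload)
```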
S120, acquiring index information pre-established for a target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the generated service data in each geographic space range in the target time period meets a preset condition.
For example, after the target time period is obtained, index information pre-established for the target time period may be obtained.
In the embodiment of the invention, the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets the preset condition.
The geospatial range may be a geospatial range set in advance. For example, each city in a province may be used as one geospatial range; taking Hebei Province as an example, each city in Hebei Province may be used as one geospatial range.
The preset condition is a condition set in advance. Specifically, it may be that the amount of the service data generated in each geographic space range in the target time period does not exceed a preset threshold, is equal to the preset threshold, or exceeds the preset threshold.
In the embodiment of the present invention, the business data may be order data.
Further, where the preset condition is that the amount of the service data generated in each geographic space range in the target time period does not exceed the preset threshold, if the amount of the service data generated in a city may exceed the preset threshold, the city may be subdivided; for example, each county in the city may be used as a geographic space range.
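The subdivision idea above can be sketched as a recursive walk over an administrative hierarchy. The hierarchy, the region names and the order counts below are invented for illustration, and `ranges_within_threshold` is a hypothetical helper, not a function named in the source.

```python
# Hypothetical hierarchy: a province, its cities, and the counties of one city.
CHILDREN = {
    "province": ["city_A", "city_B"],
    "city_A": ["county_A1", "county_A2"],
}

# Hypothetical number of orders generated in each region in the target time period.
COUNTS = {
    "province": 12000,
    "city_A": 9000,
    "city_B": 3000,
    "county_A1": 4800,
    "county_A2": 4200,
}


def ranges_within_threshold(region, threshold):
    """Return regions whose order count does not exceed `threshold`.

    A region over the threshold is replaced by its sub-regions. A region
    with no sub-regions is returned as is, even if it is over the threshold.
    """
    if COUNTS[region] <= threshold or region not in CHILDREN:
        return [region]
    result = []
    for child in CHILDREN[region]:
        result.extend(ranges_within_threshold(child, threshold))
    return result


ranges = ranges_within_threshold("province", threshold=5000)
```

Here `city_A` exceeds the threshold of 5000 and is replaced by its two counties, while `city_B` is kept whole.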
In the embodiment of the present invention, the pre-established index information may exist in the form of key-value pairs, where the key may be the encoding of a node and the value may be the corresponding geospatial range.
S130, determining a target geographic space range where the target position point is located from the multiple geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
For example, the target geospatial range is the geospatial range, determined from the target location point, in which the target location point is located. Suppose a user who wants to search for data is currently located in the Sunward residential community in Haidian District, Beijing, and wants to query the ranking of a purchased product within the surrounding region (for example, the whole of Haidian District); the target geospatial range may then be Haidian District.
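A minimal sketch of how index information in key-value form could be used to find the target geospatial range is given below. The bounding-box representation, the node codes, the period key `"last_7_days"` and the linear scan are assumptions; the embodiment may well use a tree structure and arbitrary region shapes instead.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class BBox(NamedTuple):
    """Axis-aligned geospatial range, half-open on the upper edges."""
    min_lng: float
    min_lat: float
    max_lng: float
    max_lat: float

    def contains(self, lng: float, lat: float) -> bool:
        return (self.min_lng <= lng < self.max_lng
                and self.min_lat <= lat < self.max_lat)


# Hypothetical index information, one key-value map per time period:
# key = node code, value = geospatial range.
INDEX_BY_PERIOD: Dict[str, Dict[str, BBox]] = {
    "last_7_days": {
        "00": BBox(116.0, 39.5, 116.5, 40.0),
        "01": BBox(116.5, 39.5, 117.0, 40.0),
    }
}


def find_target_range(
    index_info: Dict[str, BBox], lng: float, lat: float
) -> Optional[Tuple[str, BBox]]:
    """Return (node code, range) of the range containing the point, if any."""
    for code, box in index_info.items():
        if box.contains(lng, lat):
            return code, box
    return None


index_info = INDEX_BY_PERIOD["last_7_days"]
target = find_target_range(index_info, 116.30, 39.98)
```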
The preset database is a database set in advance, in which the service data of each geographic space range in each time period is stored.
Optionally, the preset database may be a distributed database; the service data stored in the distributed database is provided with a space-time index, and the space-time index is obtained by cross coding position information and time information of the corresponding service data.
In the embodiment of the present invention, the preset database may specifically be the urban spatio-temporal data engine (JD Urban Spatio-Temporal Data Engine, JUST), and the spatio-temporal index carried by the service data stored in the urban spatio-temporal data engine may be obtained by cross-coding the position information and the time information of the corresponding service data in the manner shown in FIGS. 3(a) to 3(c).
In the embodiment of the invention, the urban spatio-temporal data engine JUST adopts advanced distributed storage, index and calculation technology, and a high-availability and extensible spatio-temporal data management and mining platform is built. JUST supports access and processing of streaming data, and meanwhile, by means of an optimized efficient indexing mechanism, retrieval and query of service data can be rapidly carried out in mass data according to space-time conditions.
In the embodiment of the invention, the spatio-temporal index is a national query region index established by means of dynamic statistics, and each user query can obtain a reasonable spatial retrieval range according to the region index, thereby ensuring the reasonability of a final query result.
After the index information pre-established for the target time period is acquired, the target geographic space range where the target position point is located can be determined from the multiple geographic space ranges according to the target position point and based on the index information, and the target time period and the target geographic space range are used as searching conditions to search the service data from the preset database.
Optionally, searching for the service data from the preset database by using the target time period and the target geographic space range as the searching conditions may specifically be: searching the preset database for the service data whose spatio-temporal index falls within the target time period and the target geographic space range.
Specifically, for example, a user who wants to search for data is currently located in the Sunward residential community in Haidian District, Beijing. The user wants to query the ranking of commodity A purchased in the surrounding region (for example, the whole of Haidian District) within the last 7 days, so the target geospatial range may be Haidian District. Using the target time period (the last 7 days) and the target geospatial range (Haidian District) as the search conditions, the order data for purchases of commodity A in Haidian District within the last 7 days can be obtained from the preset database, and from this order data the number of units of commodity A purchased in Haidian District within the last 7 days can be determined.
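The search and the subsequent counting in this example can be sketched in memory as follows. A real deployment would push the conditions down to the spatio-temporal index of the database; here the `search` function, the record layout, the rectangular range and the sample orders are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def search(records, start, end, bbox):
    """Keep records inside the target time period [start, end) and the target range."""
    min_lng, min_lat, max_lng, max_lat = bbox
    return [
        r for r in records
        if start <= r["time"] < end
        and min_lng <= r["lng"] < max_lng
        and min_lat <= r["lat"] < max_lat
    ]


# Hypothetical order data.
records = [
    {"item": "A", "lng": 116.30, "lat": 39.98, "time": datetime(2020, 12, 27)},
    {"item": "A", "lng": 116.31, "lat": 39.97, "time": datetime(2020, 12, 10)},  # too old
    {"item": "B", "lng": 116.29, "lat": 39.99, "time": datetime(2020, 12, 26)},
    {"item": "A", "lng": 121.47, "lat": 31.23, "time": datetime(2020, 12, 28)},  # out of range
    {"item": "A", "lng": 116.28, "lat": 39.96, "time": datetime(2020, 12, 25)},
]

hits = search(
    records,
    start=datetime(2020, 12, 22),
    end=datetime(2020, 12, 29),
    bbox=(116.0, 39.5, 116.5, 40.0),  # stand-in for the target geospatial range
)

# Count purchases per commodity among the hits to derive a sales ranking.
sales = Counter(r["item"] for r in hits)
ranking = sales.most_common()
```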
In the technical scheme of this embodiment, the target geographic space range containing the target position point is determined from the index information pre-established for the target time period, based on the target position point and the target time period in the data query request, and the service data is then searched from the preset database using the target time period and the target geographic space range as the search conditions. Because the index information records geographic space ranges whose service data quantity meets the preset condition, only the service data whose spatio-temporal index falls within the target geographic space range and the target time period needs to be searched; service data outside them is not examined, so the search is fast. And because the quantity of service data generated in the target geographic space range meets the preset condition, an appropriate quantity of service data for the target time period and the target position point can be found quickly and accurately.
Example two
Fig. 6 is a flowchart of a data query method according to a second embodiment of the present invention, and the second embodiment of the present invention may be combined with various alternatives in the foregoing embodiments. On the basis of the foregoing embodiments, preferably before the obtaining of the index information pre-established for the target time period, the method further includes: acquiring service data generated in a target time period, and generating a service data set containing all the acquired service data or part of the acquired service data; determining a geographic space range within which the quantity of the generated service data belonging to the service data set does not exceed a preset threshold; and generating and storing index information containing the determined geographic space ranges.
Fig. 7 is a schematic diagram of the data flow in a sub-architecture implementing the data query method. The method in fig. 7 is divided into two flows: an index information establishing flow and a data retrieval flow. The arrows between the history library and the index library represent the process of building the spatio-temporal index, and the arrow from the client to the preset database (which may specifically be JUST) represents the retrieval process.
In fig. 7, the service data generated every day is stored in the preset database; this is the process in fig. 7 of importing service data from the history library into the preset database (which may specifically be JUST).
The following embodiments of the present invention describe the index establishing flow and the data retrieval flow (in which the preset database, which may specifically be JUST, returns the resulting service data to the client).
Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted. As shown in fig. 6, the method of the embodiment of the present invention specifically includes the following steps:
S210, acquiring the service data generated in the target time period, and generating a service data set containing all the acquired service data or part of the acquired service data.
Illustratively, referring to the index information establishing flow diagram described in fig. 8, the service data generated in the target time period is acquired, and a service data set including all the acquired service data or a part of the acquired service data is generated.
Specifically, taking the target time period as the last 7 days and the service data as order data, all order data from the last 7 days is acquired. For example, if 1,000,000 orders were generated nationwide in the last 7 days, a service data set containing all or part of the acquired service data can be generated from those 1,000,000 orders. That is, all 1,000,000 acquired orders may form one set (a service data set containing all the acquired service data), or 500,000 of the 1,000,000 orders may form one set (a service data set containing part of the acquired service data).
Optionally, the part of the service data includes service data sampled from all the acquired service data at a preset ratio.
Illustratively, the preset ratio is a ratio set in advance, for example 1%.
Continuing the above example, 1% of the 1,000,000 orders acquired in the last 7 days may be sampled, and a service data set of partial service data may be generated from the sampled 1% (i.e., 10,000 orders).
The advantage of sampling part of the service data at a preset ratio is that construction of the spatio-temporal index can be completed quickly without losing too much precision. The service data volume over 7 days can reach hundreds of millions of records, and holding all of it in memory for computation is very time-consuming; sampling at a preset ratio reduces the in-memory computation and lightens the load on memory.
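The sampling step can be sketched as uniform random sampling without replacement. The embodiment does not say how the sample is drawn, so uniform sampling, the function name `sample_service_data`, and the `seed` parameter are assumptions made for illustration.

```python
import random

def sample_service_data(records, ratio, seed=None):
    """Return a uniform random subset holding about `ratio` of `records`."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not records:
        return []
    # Keep at least one record so a tiny input still yields a usable set.
    k = max(1, round(len(records) * ratio))
    return random.Random(seed).sample(records, k)

all_orders = list(range(1_000_000))  # stand-in for 1,000,000 order records
sampled = sample_service_data(all_orders, 0.01, seed=0)
```

With 1,000,000 records and a ratio of 1%, the sampled service data set holds 10,000 records.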
S220, determining the geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed a preset threshold.
For example, the preset threshold may be a value set in advance that bounds the quantity of generated service data belonging to the service data set within one geographic space range.
Once the service data set is determined, the geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold can be determined.
In the embodiment of the present invention, if the service data set contains all the service data, the geographic space ranges are determined against all the service data, that is, ranges within which the quantity of generated service data does not exceed the preset threshold when all the service data is counted.
Specifically, for example, if there are 1,000,000 service data records in total and the preset threshold is 10, geographic space ranges are determined such that no more than 10 of the 1,000,000 records were generated within each range.
In the embodiment of the present invention, if the service data set contains part of the service data, for example 1% of all the service data, the geographic space ranges are determined against that part, that is, ranges within which the quantity of generated service data does not exceed the preset threshold when only the sampled 1% is counted.
Specifically, for example, if there are 1,000,000 service data records in total and the preset ratio is 1%, the service data set contains 10,000 records; with a preset threshold of 10, geographic space ranges are determined such that no more than 10 of the 10,000 records were generated within each range.
Optionally, determining the geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold includes: acquiring a pre-established basic index tree, wherein the root node of the basic index tree represents a preset maximum geographic space range and every other node represents a sub-geographic space range obtained by dividing the geographic space range of its parent node; for each leaf node in the basic index tree, determining the quantity of service data in the service data set that falls within the geographic space range of the current leaf node and, if the quantity is greater than the preset threshold, dividing the geographic space range of the current leaf node to obtain child nodes of the current leaf node, until no leaf node remains for which the quantity of service data generated in its geographic space range is greater than the preset threshold; and determining the geographic space range corresponding to each leaf node as a geographic space range within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold.
For example, the basic index tree may be an index tree established in advance; fig. 9 is a schematic diagram of the basic index tree. The basic index tree is a balanced N-ary tree, where N is an integer not less than 2.
As shown in fig. 9, if fig. 9 is a basic index tree covering the whole of China, the topmost level L1 holds the root node of the basic index tree, which represents the preset maximum geographic space range, i.e., China.
Every other node represents a sub-geographic space range obtained by dividing the geographic space range of its parent node. For example, the 4 nodes at level L2 in fig. 9 represent the sub-geographic space ranges obtained by dividing the range of their parent node (the root node at level L1, i.e., China); these 4 nodes may be, respectively, region A, region B, region C, and region D of China.
For each leaf node in the basic index tree, the quantity of service data in the service data set that falls within the geographic space range of the current leaf node is determined; if the quantity is greater than the preset threshold, the geographic space range of the current leaf node is divided to obtain child nodes of the current leaf node, and this continues until no leaf node remains whose geographic space range holds more service data than the preset threshold. The geographic space range corresponding to each leaf node is then determined as a geographic space range within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold.
Optionally, a KD (K-Dimensional) tree over the spatial dimensions is established for the service data in the service data set.
Correspondingly, determining the quantity of service data in the service data set that falls within the geographic space range of the current leaf node includes: querying the KD tree with the geographic space range of the current leaf node as the query condition, obtaining the service data in the service data set that falls within the geographic space range of the current leaf node, and counting the service data obtained by the query.
Illustratively, the KD tree here is a data structure that partitions a k-dimensional data space. It is a binary tree in which each node represents a spatial range and holds the service data lying within that range.
When the quantity of service data in the service data set that falls within the geographic space range of the current leaf node is determined, the KD tree is queried with the geographic space range of the current leaf node as the query condition, the service data falling within that range is obtained, and the quantity of service data is counted from the query result.
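The KD tree query described above can be sketched as a two-dimensional KD tree with a rectangular range count. This is a minimal sketch under stated assumptions: points are `(x, y)` tuples, the tree is stored as nested dictionaries, and the names `build_kd` and `range_count` are hypothetical.

```python
def build_kd(points, depth=0):
    """Build a 2-D KD tree, alternating the split axis at each level."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kd(pts[:mid], depth + 1),       # coordinates <= split value
        "right": build_kd(pts[mid + 1:], depth + 1),  # coordinates >= split value
    }

def range_count(node, bbox):
    """Count points inside the closed box (min_x, min_y, max_x, max_y)."""
    if node is None:
        return 0
    min_x, min_y, max_x, max_y = bbox
    lo, hi = (min_x, min_y), (max_x, max_y)
    x, y = node["point"]
    count = 1 if (min_x <= x <= max_x and min_y <= y <= max_y) else 0
    axis, split = node["axis"], node["point"][node["axis"]]
    if lo[axis] <= split:   # the box may reach into the left subtree
        count += range_count(node["left"], bbox)
    if split <= hi[axis]:   # the box may reach into the right subtree
        count += range_count(node["right"], bbox)
    return count

points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (2.5, 2.5),
          (3.5, 3.5), (2.5, 0.5), (6, 6), (5, 1)]
tree = build_kd(points)
in_leaf = range_count(tree, (0, 0, 4, 4))
```

Subtrees that cannot intersect the query box are skipped, which is what makes the count cheaper than scanning every record for every leaf node.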
Specifically, referring to the index information construction diagram shown in fig. 10, let the preset threshold be 5. For each leaf node of the index tree, starting from the circles at level L2 in fig. 10, the quantity of service data in the service data set that falls within the geographic space range of that leaf node is obtained by query; it is shown by the number inside each leaf node at level L2 in fig. 10. If the quantity of service data in the current leaf node is greater than the preset threshold 5, the geographic space range of the current leaf node is divided to obtain its child nodes. For example, in fig. 10 the leftmost leaf node at level L2 holds 18 service data records, which is greater than the preset threshold 5, so it is divided to obtain the 4 child nodes on the left of level L3.
The quantity of service data in the leaf nodes at each level is determined in the same way; any leaf node whose quantity is greater than the preset threshold is divided into child nodes, until the quantity of service data in every leaf node is not greater than the preset threshold. In fig. 10, for example, the quantity in each leaf node is finally less than or equal to the preset threshold 5.
The geographic space range corresponding to each leaf node in fig. 10 whose service data quantity is less than or equal to the preset threshold is determined as a geographic space range within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold. That is, the geographic space ranges corresponding to the 2 leaf nodes at level L2, the 7 leaf nodes at level L3, and the 4 leaf nodes at level L4 in fig. 10 are each taken as geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold.
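The leaf-splitting procedure illustrated with fig. 10 can be sketched as a recursive refinement of a 4-ary tree over rectangular ranges. Several details here are assumptions: ranges are axis-aligned boxes split into four equal quadrants, leaf codes are paths of child indices, the sketch starts from a single root rather than a multi-level basic index tree, and `max_depth` is an added guard against endless splitting when many records share one position.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    code: str       # hypothetical leaf code: path of child indices from the root
    bbox: tuple     # (min_x, min_y, max_x, max_y)
    children: list = field(default_factory=list)

def contains(bbox, point):
    """Half-open test so a point on a shared edge belongs to exactly one cell."""
    min_x, min_y, max_x, max_y = bbox
    x, y = point
    return min_x <= x < max_x and min_y <= y < max_y

def split4(bbox):
    """Divide a box into four equal quadrants."""
    min_x, min_y, max_x, max_y = bbox
    mx, my = (min_x + max_x) / 2, (min_y + max_y) / 2
    return [(min_x, min_y, mx, my), (mx, min_y, max_x, my),
            (min_x, my, mx, max_y), (mx, my, max_x, max_y)]

def refine(node, points, threshold, max_depth=16):
    """Split leaves until each holds at most `threshold` points; return the leaves."""
    inside = [p for p in points if contains(node.bbox, p)]
    if not node.children:
        if len(inside) <= threshold or len(node.code) >= max_depth:
            return [node]
        node.children = [Node(node.code + str(i), b)
                         for i, b in enumerate(split4(node.bbox))]
    leaves = []
    for child in node.children:
        leaves.extend(refine(child, inside, threshold, max_depth))
    return leaves

root = Node("0", (0, 0, 8, 8))
points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (2.5, 2.5),
          (3.5, 3.5), (2.5, 0.5), (6, 6), (5, 1)]
leaves = refine(root, points, threshold=5)
leaf_counts = {leaf.code: sum(contains(leaf.bbox, p) for p in points)
               for leaf in leaves}
```

With a threshold of 5, the root (9 points) and its lower-left quadrant (7 points) are split, which leaves 7 leaf ranges that each hold at most 4 points.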
The advantage of this arrangement is that index information for geographic space ranges whose service data quantity does not exceed the preset threshold is determined in advance. When a data query request is subsequently received, the target geographic space range containing the target position point in the request can be determined from the pre-established index information. When the service data is then searched in the preset database with the target geographic space range and the target time period as query conditions, only the service data whose spatio-temporal index falls within that range and period is searched, and service data outside them need not be examined, so the search is fast. Moreover, because the quantity of service data generated in the target geographic space range does not exceed the preset threshold, an excessive quantity of service data is never retrieved, so the quantity of retrieved service data stays reasonable without increasing the computational load on memory.
S230, generating and storing index information containing the determined geographic space ranges.
Illustratively, after determining the geographic space range in which the quantity of the generated service data belonging to the service data set does not exceed the preset threshold, generating and storing index information containing the determined geographic space ranges.
In the embodiment of the present invention, when storing each leaf node in the basic index tree, the storing may specifically be performed in a form of key value pairs, where key represents an encoding of the leaf node, and value represents a geospatial range corresponding to the leaf node.
Thus, index information of each geographic space range is generated, so that a target geographic space range where the target position point is located is determined from the multiple geographic space ranges subsequently based on the index information.
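The key-value storage of leaf nodes can be sketched as a mapping from leaf code to geographic space range, serialized per time period. The use of JSON, the period label `"last_7_days"`, and the function names are assumptions; the embodiment only states that the key is the leaf node's code and the value is its range.

```python
import json

def save_index(leaf_ranges):
    """Serialize {leaf code: bounding box} as a JSON object of key-value pairs."""
    return json.dumps({code: list(bbox) for code, bbox in leaf_ranges.items()},
                      sort_keys=True)

def load_index(text):
    """Restore the {leaf code: bounding box} mapping from its JSON form."""
    return {code: tuple(bbox) for code, bbox in json.loads(text).items()}

# Hypothetical leaf codes and ranges, as (min_x, min_y, max_x, max_y).
leaf_ranges = {"000": (0, 0, 2, 2), "001": (2, 0, 4, 2), "01": (4, 0, 8, 4)}
index_store = {"last_7_days": save_index(leaf_ranges)}
restored = load_index(index_store["last_7_days"])
```

Keeping one stored entry per time period matches the idea that index information is pre-established for a given target time period.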
S240, receiving a data query request sent by the client, and acquiring a target position point and a target time period in the data query request.
S250, acquiring index information pre-established for the target time period.
The index information includes a plurality of geographic space ranges, and the quantity of service data generated in each geographic space range in the target time period satisfies a preset condition, for example not being greater than a preset threshold.
S260, determining a target geographic space range where the target position point is located from the multiple geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
Optionally, referring to the schematic diagram of the data retrieval flow shown in fig. 11, the target position point and the target time period in the data query request are obtained first, and the index information pre-established for the target time period is acquired. The target geographic space range containing the target position point is then determined from the plurality of geographic space ranges in the index information, and the service data in the preset database is queried with the target time period and the target geographic space range as query conditions to obtain the required service data, whose quantity can then be counted.
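The retrieval flow above can be sketched in two steps: locate the range that contains the target position point, then filter records by that range and the time period. The linear scans, the record fields `x`, `y`, and `day`, and the function names are illustrative assumptions; a real deployment would rely on the database's spatio-temporal index instead of scanning.

```python
def locate(index, point):
    """Return the code of the leaf range containing `point`, or None."""
    x, y = point
    for code, (min_x, min_y, max_x, max_y) in index.items():
        if min_x <= x < max_x and min_y <= y < max_y:
            return code
    return None

def search(database, index, point, period):
    """Find records inside the target range and the target time period."""
    code = locate(index, point)
    if code is None:
        return None, []
    min_x, min_y, max_x, max_y = index[code]
    start, end = period
    hits = [r for r in database
            if start <= r["day"] <= end
            and min_x <= r["x"] < max_x and min_y <= r["y"] < max_y]
    return code, hits

# Hypothetical index (leaf code -> range) and order records.
index = {"000": (0, 0, 2, 2), "001": (2, 0, 4, 2),
         "002": (0, 2, 2, 4), "003": (2, 2, 4, 4)}
database = [
    {"item": "A", "x": 0.5, "y": 0.5, "day": 3},
    {"item": "B", "x": 1.5, "y": 1.0, "day": 6},
    {"item": "A", "x": 1.0, "y": 1.0, "day": 12},  # outside the period
    {"item": "A", "x": 3.0, "y": 1.0, "day": 4},   # in another range
]
code, hits = search(database, index, point=(1.2, 0.7), period=(1, 7))
```

The target position point falls in range `"000"`, and two records match both the range and the period; counting `hits` gives the quantity of required service data.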
In this embodiment, the technical scheme acquires a small quantity of service data and from it determines the service data required by the user. Fig. 12 shows a sales ranking obtained by statistical analysis of the service data returned by a query. In fig. 12, if a user wants to know the names of the top few commodities purchased in the user's geographic space range in the last 7 days and how the purchased commodities rank, the order data within the user's geographic space range for the last 7 days is obtained first, and statistical analysis of that order data then yields the names of the top few commodities and the ranking. Because the quantity of order data acquired under this technical scheme is not excessive, a reasonable query result is obtained without increasing the computational load on memory.
It should be noted that the technical scheme of this embodiment acquires a small quantity of service data in order to determine the service data required by the user. Alternatively, the quantity of service data acquired may be required to exceed a certain threshold, that is, a larger quantity of service data is wanted. In that case the quantity of service data in the geographic space range corresponding to each leaf node of the index tree built as in fig. 10 must exceed that threshold, so that each leaf node's range holds a large quantity of service data and enough service data is obtained at query time.
Of course, whether an appropriate quantity, a large quantity, or some other quantity of data is to be acquired may be set according to the needs of the user and is not limited here.
In the technical scheme of this embodiment, a service data set containing all or part of the service data generated in the target time period is generated, the geographic space ranges within which the quantity of generated service data does not exceed the preset threshold are determined, and index information containing the determined geographic space ranges is generated and stored. When service data is later searched according to the target geographic space range, an excessive quantity of service data is not retrieved, so the quantity of service data stays reasonable without increasing the computational load on memory.
Example three
Fig. 13 is a flowchart of a data query method provided in the third embodiment of the present invention, and the third embodiment of the present invention may be combined with various alternatives in the foregoing embodiments. On the basis of the foregoing embodiments, preferably after step S260, the method further includes: and if the service data is not searched from the preset database according to the searching conditions, determining other geographic space ranges adjacent to the target geographic space range from the multiple geographic space ranges, and searching the service data from the preset database by taking the target time period and the other geographic space ranges as the searching conditions.
Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted. As shown in fig. 13, the method of the embodiment of the present invention specifically includes the following steps:
S310, acquiring the service data generated in the target time period, and generating a service data set containing all the acquired service data or part of the acquired service data.
S320, determining the geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed a preset threshold.
S330, generating and storing index information containing the determined geographic space ranges.
S340, receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request.
S350, acquiring index information pre-established for a target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the generated service data in each geographic space range in the target time period meets a preset condition.
S360, determining a target geographic space range where the target position point is located from the multiple geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
S370, if no service data is found in the preset database under the search conditions, determining other geographic space ranges adjacent to the target geographic space range from the multiple geographic space ranges, and searching the service data from the preset database by taking the target time period and the other geographic space ranges as the search conditions.
For example, another geographic space range adjacent to the target geographic space range is one that borders it. As shown in fig. 10, if the geographic space range of the 8th leaf node at level L3 is determined to be the target geographic space range, the geographic space range of the 7th leaf node at level L3 may be another geographic space range adjacent to it.
Specifically, suppose that, as shown in fig. 10, the geographic space range of the 8th leaf node at level L3 is determined to be the target geographic space range, but no service data is found when the preset database is searched under the search conditions, that is, no service data exists in the range of the 8th leaf node at level L3. The geographic space range of the 7th leaf node at level L3 is then determined as another geographic space range adjacent to the target geographic space range, and the service data is searched from the preset database with the target time period and that other range as the search conditions, so that the required service data can be found in the range of the 7th leaf node at level L3.
The advantage of this arrangement is that the problem that the required service data cannot be searched when no corresponding service data exists in the determined target geospatial range is avoided.
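The fallback in S370 can be sketched as follows. The embodiment does not define adjacency precisely, so this sketch assumes that two ranges are adjacent when their boxes share a boundary segment of positive length; that definition, the record fields, and all function names are hypothetical.

```python
def shares_edge(a, b):
    """True if boxes a and b touch along an edge (not merely at a corner)."""
    overlap_x = min(a[2], b[2]) - max(a[0], b[0])
    overlap_y = min(a[3], b[3]) - max(a[1], b[1])
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)

def query(records, bbox, period):
    """Return records inside `bbox` (half-open) and inside the closed period."""
    min_x, min_y, max_x, max_y = bbox
    start, end = period
    return [r for r in records
            if start <= r["day"] <= end
            and min_x <= r["x"] < max_x and min_y <= r["y"] < max_y]

def search_with_fallback(records, index, target_code, period):
    """Search the target range first; if it is empty, search adjacent ranges."""
    hits = query(records, index[target_code], period)
    if hits:
        return [target_code], hits
    neighbours = [code for code, bbox in index.items()
                  if code != target_code and shares_edge(index[target_code], bbox)]
    hits = [r for code in neighbours for r in query(records, index[code], period)]
    return neighbours, hits

index = {"000": (0, 0, 2, 2), "001": (2, 0, 4, 2),
         "002": (0, 2, 2, 4), "003": (2, 2, 4, 4)}
records = [{"item": "A", "x": 2.5, "y": 2.5, "day": 2}]
searched, hits = search_with_fallback(records, index, "002", period=(1, 7))
```

Range `"002"` holds no records, so the search moves to its edge neighbours `"000"` and `"003"` and finds the record in `"003"`; range `"001"` touches `"002"` only at a corner and is not searched under this definition.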
According to the technical scheme of this embodiment, when no service data is found in the preset database under the search conditions, other geographic space ranges adjacent to the target geographic space range can be determined from the multiple geographic space ranges, and the service data is searched from the preset database with the target time period and the other geographic space ranges as the search conditions. This avoids the problem that the required service data cannot be found when no corresponding service data exists in the determined target geographic space range.
The following is an embodiment of the data searching apparatus provided in the embodiments of the present invention, and the apparatus and the data searching method in the embodiments belong to the same inventive concept, and details that are not described in detail in the embodiments of the data searching apparatus may refer to the embodiments of the data searching method.
Example four
Fig. 14 is a schematic structural diagram of a data search apparatus according to a fourth embodiment of the present invention, where this embodiment is applicable to a case of querying service data, and as shown in fig. 14, the data search apparatus includes: a target information obtaining module 300, an index information obtaining module 400 and a first service data searching module 500.
the target information obtaining module 300 is configured to receive a data query request sent by a client, and acquire a target location point and a target time period in the data query request;
the index information obtaining module 400 is configured to obtain index information pre-established for the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
the first service data searching module 500 is configured to determine a target geospatial range where the target location point is located from the multiple geospatial ranges, and search service data from a preset database by using the target time period and the target geospatial range as search conditions.
On the basis of the technical scheme of the embodiment of the invention, the device also comprises:
a service data set generating module, configured to acquire service data generated in the target time period, and generate a service data set including all or part of the acquired service data;
a first geographic space range determining module, configured to determine the geographic space ranges within which the quantity of generated service data belonging to the service data set does not exceed a preset threshold;
and an index information generating module, configured to generate and store the index information containing the determined geographic space ranges.
Optionally, the part of the service data includes service data sampled from all the acquired service data in a preset ratio.
On the basis of the technical scheme of the embodiment of the invention, the first geographic space range determining module comprises:
a basic index tree obtaining unit, configured to obtain a basic index tree that is established in advance; the root node of the basic index tree represents a preset maximum geographic space range, and every other node represents a sub-geographic space range obtained by dividing the geographic space range of its parent node;
a service data quantity determining unit, configured to determine, for each leaf node in the basic index tree, the quantity of service data in the service data set that falls within the geographic space range of the current leaf node, and if the quantity is greater than a preset threshold, divide the geographic space range of the current leaf node to obtain child nodes of the current leaf node, until no leaf node remains for which the quantity of service data generated in its geographic space range is greater than the preset threshold;
and a first geographic space range determining unit, configured to determine the geographic space range corresponding to each leaf node as a geographic space range within which the quantity of generated service data belonging to the service data set does not exceed the preset threshold.
Optionally, the base index tree is a balanced N-ary tree, where N is an integer not less than 2.
On the basis of the technical scheme of the embodiment of the invention, the device also comprises:
a KD tree establishing module, configured to establish a KD tree over the spatial dimensions for the service data in the service data set.
Correspondingly, on the basis of the technical solution of the embodiment of the present invention, the service data quantity determining unit includes:
a quantity determining subunit, configured to query the KD tree with the geographic space range of the current leaf node as the query condition, obtain the service data in the service data set that falls within the geographic space range of the current leaf node, and count the service data obtained by the query.
Optionally, the preset database is a distributed database; the service data stored in the distributed database is provided with a space-time index, and the space-time index is obtained by cross coding position information and time information of the corresponding service data.
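One common way to cross code position and time into a single index key is to interleave the bits of the discretized coordinates and the time bin, as in a Z-order curve. The embodiment does not specify its encoding, so the scheme below, the 16-bit width, and the names `interleave3` and `deinterleave3` are assumptions offered only as an illustration.

```python
def interleave3(x, y, t, bits=16):
    """Interleave the low `bits` bits of x, y, and t into one integer key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)       # x takes bit positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)   # y takes bit positions 1, 4, 7, ...
        key |= ((t >> i) & 1) << (3 * i + 2)   # t takes bit positions 2, 5, 8, ...
    return key

def deinterleave3(key, bits=16):
    """Recover (x, y, t) from a key produced by interleave3."""
    x = y = t = 0
    for i in range(bits):
        x |= ((key >> (3 * i)) & 1) << i
        y |= ((key >> (3 * i + 1)) & 1) << i
        t |= ((key >> (3 * i + 2)) & 1) << i
    return x, y, t

# x_cell, y_cell: grid cell of the position; t_bin: index of the time bin.
key = interleave3(12345, 54321, 42)
```

Under such an encoding, records that are close in both space and time tend to receive nearby keys, which suits range scans in a key-ordered distributed store.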
On the basis of the technical solution of the embodiment of the present invention, the first service data searching module 500 includes:
a first service data searching unit, configured to search the preset database for service data whose spatio-temporal index falls within the target time period and the target geographic space range.
On the basis of the technical scheme of the embodiment of the invention, the device also comprises:
a second service data searching module, configured to determine, if no service data is found in the preset database under the search conditions, other geographic space ranges adjacent to the target geographic space range from the multiple geographic space ranges, and to search the service data from the preset database by taking the target time period and the other geographic space ranges as the search conditions.
The data searching device provided by the embodiment of the invention can execute the method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
It should be noted that, in the embodiment of the data search apparatus, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Embodiment Five
Fig. 15 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. Fig. 15 illustrates a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the present invention. The electronic device 12 shown in Fig. 15 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 15, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 15, commonly referred to as a "hard drive"). Although not shown in Fig. 15, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement the data searching method provided by the embodiments of the present invention, the method including:
receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request;
acquiring index information pre-established for the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
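The three steps above can be sketched end to end as follows. This is a minimal sketch under assumptions, not the claimed implementation: the pre-established index information is modelled as a dictionary keyed by time period, the preset database is an in-memory list, ranges are half-open rectangles, and the names `locate_range` and `handle_request` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)
Period = Tuple[int, int]                  # (start, end) timestamps, inclusive
Record = Tuple[float, float, int, str]    # (lon, lat, timestamp, payload)


def locate_range(ranges: List[Box], lon: float, lat: float) -> Optional[Box]:
    """Find the geographic space range that contains the target position point."""
    for box in ranges:
        if box[0] <= lon < box[2] and box[1] <= lat < box[3]:
            return box
    return None


def handle_request(index_by_period: Dict[Period, List[Box]],
                   records: List[Record],
                   lon: float, lat: float, period: Period) -> List[Record]:
    """Look up the index for the period, locate the target range, then search."""
    ranges = index_by_period.get(period)
    if ranges is None:
        raise LookupError("no index information was pre-established for this period")
    target = locate_range(ranges, lon, lat)
    if target is None:
        return []
    start, end = period
    return [r for r in records
            if target[0] <= r[0] < target[2] and target[1] <= r[1] < target[3]
            and start <= r[2] <= end]
```

Because each range was chosen so that the quantity of service data generated in it during the period meets the preset condition, the search is bounded by that condition rather than by the density of data around the target position point.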
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the data searching method provided by any embodiment of the present invention.
Embodiment Six
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data searching method according to any embodiment of the present invention, and the method includes:
receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request;
acquiring index information pre-established for the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A data searching method, comprising:
receiving a data query request sent by a client, and acquiring a target position point and a target time period in the data query request;
acquiring index information pre-established for the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching the service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
2. The method of claim 1, wherein prior to obtaining pre-established index information for the target time period, the method further comprises:
acquiring service data generated in the target time period, and generating a service data set containing all the acquired service data or part of the acquired service data;
determining geographic space ranges in each of which the quantity of generated service data belonging to the service data set does not exceed a preset threshold;
and generating and storing index information containing the determined geographic space ranges.
3. The method of claim 2, wherein the partial service data comprises a preset proportion of service data sampled from all the acquired service data.
4. The method of claim 2, wherein the determining geographic space ranges in each of which the quantity of generated service data belonging to the service data set does not exceed a preset threshold comprises:
acquiring a pre-established basic index tree; the root node of the basic index tree represents a preset maximum geographic space range, and each other node represents a sub-range obtained by dividing the geographic space range of its parent node;
for each leaf node in the basic index tree, determining the quantity of service data in the service data set that falls within the geographic space range of the current leaf node, and if the quantity is larger than the preset threshold, dividing the geographic space range of the current leaf node to obtain child nodes of the current leaf node, until no leaf node remains whose corresponding geographic space range contains more service data than the preset threshold;
and determining the geographic space range corresponding to each leaf node as a geographic space range in which the quantity of generated service data belonging to the service data set does not exceed the preset threshold.
5. The method of claim 4, wherein the base index tree is a balanced N-ary tree, wherein N is an integer no less than 2.
6. The method according to claim 4, wherein after the generating a service data set containing all or part of the acquired service data, the method further comprises:
building a spatial-dimension KD tree over the service data in the service data set;
correspondingly, the determining the quantity of service data in the service data set that falls within the geographic space range of the current leaf node comprises:
querying the KD tree using the geographic space range of the current leaf node as the query condition, obtaining the service data in the service data set that falls within the geographic space range of the current leaf node, and counting the service data returned by the query.
7. The method of claim 1, wherein the preset database is a distributed database; the service data stored in the distributed database carries a spatio-temporal index, and the spatio-temporal index is obtained by cross-coding the position information and the time information of the corresponding service data;
the searching the service data from the preset database by taking the target time period and the target geographic space range as searching conditions comprises the following steps:
and searching the service data of which the spatio-temporal index falls into the target time period and the target geographic space range from a preset database.
8. The method of claim 1, further comprising:
if no service data is found in the preset database under the search conditions, determining other geographic space ranges adjacent to the target geographic space range from the plurality of geographic space ranges, and searching the preset database for service data using the target time period and the other geographic space ranges as the search conditions.
9. The method according to any of claims 1-8, wherein the service data comprises: order data.
10. A data search apparatus, comprising:
the target information acquisition module is used for receiving a data query request sent by a client and acquiring a target position point and a target time period in the data query request;
the index information acquisition module is used for acquiring index information which is pre-established aiming at the target time period; the index information comprises a plurality of geographic space ranges, and the quantity of the service data generated in each geographic space range in the target time period meets a preset condition;
and the first service data searching module is used for determining a target geographic space range in which the target position point is located from the plurality of geographic space ranges, and searching service data from a preset database by taking the target time period and the target geographic space range as searching conditions.
11. An electronic device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data searching method as claimed in any one of claims 1-9.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data searching method of any one of claims 1-9.
CN202011598769.XA 2020-12-29 2020-12-29 Data searching method, device, equipment and storage medium Pending CN113763099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011598769.XA CN113763099A (en) 2020-12-29 2020-12-29 Data searching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011598769.XA CN113763099A (en) 2020-12-29 2020-12-29 Data searching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113763099A true CN113763099A (en) 2021-12-07

Family

ID=78786231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011598769.XA Pending CN113763099A (en) 2020-12-29 2020-12-29 Data searching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113763099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116467262A (en) * 2023-05-24 2023-07-21 和创(北京)科技股份有限公司 Metadata capability-based client liveness analysis method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750708A (en) * 2013-12-27 2015-07-01 华为技术有限公司 Spatio-temporal data index building and searching methods, a spatio-temporal data index building and searching device and spatio-temporal data index building and searching equipment
WO2016025224A1 (en) * 2014-08-11 2016-02-18 Mastercard International Incorporated Methods and systems for identifying merchant and atm demand
US20170132259A1 (en) * 2015-11-09 2017-05-11 Line Corporation Method and system for detecting and using locations of electronic devices of users in a specific space to analyze social relationships between the users
CN110532437A (en) * 2019-07-18 2019-12-03 平安科技(深圳)有限公司 Electronic certificate reminding method, device, computer equipment and storage medium
CN110610267A (en) * 2019-09-10 2019-12-24 京东城市(北京)数字科技有限公司 Talent information processing method and device, computer storage medium and electronic equipment
CN110704491A (en) * 2019-09-30 2020-01-17 京东城市(北京)数字科技有限公司 Data query method and device
CN111460023A (en) * 2020-04-29 2020-07-28 上海东普信息科技有限公司 Service data processing method, device, equipment and storage medium based on elastic search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李晨;申德荣;朱命冬;寇月;聂铁铮;于戈;: "一种对时空信息的kNN查询处理方法", 软件学报, no. 09, 15 September 2016 (2016-09-15), pages 2278 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination