CN117093598A - Data retrieval method, system, device and storage medium

Info

Publication number
CN117093598A
Authority
CN
China
Prior art keywords
data
table row
row key
data table
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311052071.1A
Other languages
Chinese (zh)
Inventor
奉玉丽
何慧敏
帅妮
张泽
陈乐�
陈卓
张培栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Information Technology Co Ltd
Priority to CN202311052071.1A
Publication of CN117093598A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data retrieval method, system, device and storage medium, applied to a retrieval system built from a search engine server and a database. The data retrieval method comprises: acquiring a field to be retrieved and a calculation function; returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and sending the data table row key to the database and receiving the search result information that the database returns according to the data table row key. Because the index relationship is established in the search engine server in advance, and arbitrarily combined fields to be retrieved and calculation functions are resolved against that index relationship, the method meets the requirement of multi-condition retrieval and makes querying more intelligent.

Description

Data retrieval method, system, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data retrieval method, system, device, and storage medium.
Background
HBase is a distributed, column-oriented, scalable open-source database. In the prior art, when big data retrieval is performed on HBase, particularly over massive data sets, a query on anything other than the row key (rowkey) triggers a full-table scan, which results in poor retrieval performance and slow queries. Therefore, to support high-concurrency, high-efficiency big data retrieval, data is usually stored in rowkey order, and millisecond-level retrieval is achieved by looking rows up by rowkey.
In the prior art, there are two main schemes for rowkey-based big data retrieval on HBase. The first retrieves by a single rowkey; it supports high concurrency and is efficient, but the retrieval scenarios it covers are simple and limited, and complex combined conditions are not supported. The second is based on a secondary index and extends the rowkey design, for example rowkey=x|y, rowkey=y|z and rowkey=x|z; it suits queries whose conditions are combined in pairs and still supports high concurrency, but it cannot satisfy retrieval under arbitrary combinations of conditions.
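The limitation of the pairwise rowkey design can be illustrated with a minimal sketch. It assumes three hypothetical fields x, y and z and the `|` separator used in the example above; the helper `composite_rowkeys` is illustrative and not part of HBase or of the application.

```python
from itertools import combinations

def composite_rowkeys(record, width=2, sep="|"):
    """Build one composite row key per combination of `width` fields.

    `record` maps field names to values; field names are sorted so the
    key layout is deterministic.
    """
    keys = {}
    for fields in combinations(sorted(record), width):
        keys[fields] = sep.join(str(record[f]) for f in fields)
    return keys

record = {"x": "a1", "y": "b2", "z": "c3"}
pairwise = composite_rowkeys(record)
# A query on all three fields has no matching pairwise key layout.
supports_triple = ("x", "y", "z") in pairwise
```

Every additional combination width needs its own set of key layouts, which is why a fixed rowkey design does not scale to arbitrary combinations of conditions.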
Therefore, since neither of the above schemes supports data retrieval under arbitrary combinations of conditions, a data retrieval scheme that solves this problem is needed.
Disclosure of Invention
The main purpose of the application is to provide a data retrieval method, system, device and storage medium that solve the problem that the prior art cannot support retrieval under arbitrary combinations of conditions.
In order to achieve the above object, the present application provides a data retrieval method applied to a search engine server, the search engine server being connected with a database, the data retrieval method comprising:
acquiring a field to be retrieved and a calculation function;
returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship;
and sending the data table row key to the database, and receiving the search result information returned by the database according to the data table row key.
Optionally, the step of returning the data table row key according to the field to be retrieved and the calculation function and in combination with a pre-established index relation includes:
matching the field to be retrieved and the calculation function against the index relationship;
when no index relationship is matched, judging whether the calculation function is an atomic calculation;
if the calculation function is an atomic calculation, returning a prompt indicating that the request does not conform to the operation rule;
if the calculation function is not an atomic calculation, decomposing the calculation function into atomic calculations, and returning the data table row key according to the field to be retrieved, the decomposed atomic calculations and the index relationship.
Optionally, the data retrieval method further includes:
and under the condition of returning the data table row key, generating and returning the relation among the field to be searched, the calculation function and the data table row key based on the data table row key.
Optionally, after the step of matching with the index relationship according to the field to be retrieved and the calculation function, the method further includes:
and under the condition that the index relation is matched, returning the corresponding data table row key according to the matched index relation.
Optionally, before the step of returning the data table row key according to the field to be retrieved and the calculation function and in combination with the pre-established index relationship, the method further includes:
establishing the index relation, wherein the index relation comprises an index relation between a search field and a data table row key set and an index relation between a search field, a calculation function and the data table row key set; the data table row key set consists of a plurality of data table row keys.
Optionally, the step of establishing the index relationship includes:
importing a plurality of pre-collected data fields and data table row key sets, and creating the index relationship between the search field and the data table row key set based on the association between the data fields and the data table row key sets in the database;
and encapsulating the calculation function, and creating the index relationship among the search field, the calculation function and the data table row key set based on the association among the encapsulated calculation function, the data fields and the data table row key sets in the database.
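The two kinds of index relationship can be pictured with a minimal in-memory sketch. It is an assumption-laden illustration: plain dictionaries stand in for the search engine's index, and the names `build_field_index`, `build_function_index`, the sample fields and the row keys are all hypothetical.

```python
from collections import defaultdict

def build_field_index(rows):
    """Map (field, value) -> set of row keys holding that value."""
    index = defaultdict(set)
    for rowkey, fields in rows.items():
        for name, value in fields.items():
            index[(name, value)].add(rowkey)
    return dict(index)

def build_function_index(precomputed):
    """Map (field, function name) -> set of row keys.

    `precomputed` is a list of (field, function, rowkey) triples that the
    database is assumed to have produced when pre-computing the fields.
    """
    index = defaultdict(set)
    for field, func, rowkey in precomputed:
        index[(field, func)].add(rowkey)
    return dict(index)

rows = {
    "rk1": {"city": "Beijing", "plan": "5G"},
    "rk2": {"city": "Beijing", "plan": "4G"},
    "rk3": {"city": "Shanghai", "plan": "5G"},
}
field_index = build_field_index(rows)
function_index = build_function_index(
    [("fee", "sum", "agg1"), ("fee", "count", "agg2")]
)
```

Here `field_index` plays the role of the index relationship between a search field and a row key set, and `function_index` the relationship among a search field, a calculation function and a row key set.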
Optionally, before the step of obtaining the field to be retrieved and the computing function, the method further includes:
creating a search query page, wherein the search query page is used for a user to input the field to be retrieved and the computing function;
creating a representational state transfer (REST) interface, wherein the REST interface connects the search query page and the search engine server, and also specifies the method used by the HTTP request and the requested link.
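As a rough illustration of what such an interface agreement might look like, the sketch below parses a request into the fields to be retrieved and the calculation function. The GET method, the `/search` path, the `field` and `func` parameter names and the example URL are all assumptions made for this sketch; the application does not specify them.

```python
from urllib.parse import urlsplit, parse_qs

def parse_search_request(method, url):
    """Extract the fields to retrieve and the calculation function."""
    if method != "GET":
        raise ValueError("this sketch only accepts GET")
    parts = urlsplit(url)
    if parts.path != "/search":
        raise ValueError("unknown path: " + parts.path)
    params = parse_qs(parts.query)
    return {
        "fields": params.get("field", []),
        "function": params.get("func", [None])[0],
    }

request = parse_search_request(
    "GET", "http://localhost:8080/search?field=city&field=plan&func=sum"
)
```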
Optionally, the search engine server includes a cold data area and a hot data area, and after the step of sending the data table row key to the database and receiving the search result information returned by the database according to the data table row key, the method further includes:
generating a retrieval record according to the index relation used for retrieval;
and updating the cold data area and/or the hot data area according to the retrieval record, wherein a storage node of the cold data area is used for storing an index relation lower than a preset retrieval frequency, and a storage node of the hot data area is used for storing an index relation higher than the preset retrieval frequency.
Optionally, the step of updating the cold data area and/or the hot data area according to the retrieval record comprises:
judging whether the retrieval frequency of the index relationship in the retrieval record is higher than the preset retrieval frequency;
if the retrieval frequency of the index relationship is higher than the preset retrieval frequency, writing the index relationship into the hot data area; and/or,
if the retrieval frequency of the index relationship is lower than the preset retrieval frequency, migrating the index relationship to the cold data area.
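The hot and cold area update can be sketched as follows. This is a simplified illustration under stated assumptions: retrieval frequency is approximated by a plain hit count rather than a rate over time, in-memory sets stand in for the storage nodes, and the class name `TieredIndexStore` is hypothetical.

```python
from collections import Counter

class TieredIndexStore:
    def __init__(self, threshold):
        self.threshold = threshold   # preset retrieval frequency
        self.hits = Counter()        # retrieval record: index key -> count
        self.hot = set()             # stands in for the hot data area
        self.cold = set()            # stands in for the cold data area

    def record(self, index_key):
        """Record one retrieval that used `index_key`."""
        self.hits[index_key] += 1

    def rebalance(self):
        """Place every recorded index relationship according to its count."""
        for key, count in self.hits.items():
            if count > self.threshold:
                self.cold.discard(key)
                self.hot.add(key)
            elif count < self.threshold:
                self.hot.discard(key)
                self.cold.add(key)
            # count == threshold: not specified in the text; left in place

store = TieredIndexStore(threshold=2)
for key in ["city", "city", "city", "plan"]:
    store.record(key)
store.rebalance()
```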
Optionally, before the step of receiving the search result information returned by the database according to the data table row key, the method further includes:
receiving the data table row key sent by the search engine server through the database;
and retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
Optionally, before the step of receiving the data table row key sent by the search engine server, the method further includes:
importing an original data file into the database;
extracting a data field based on the original data file;
pre-computing the data field based on a calculation function to obtain a field calculation result;
establishing a data table row key set based on the field calculation result;
and sending the data fields, the calculation functions and the data table row key sets to the search engine server, so that the search engine server can create the index relationship among the search fields, the calculation functions and the data table row key sets.
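The database-side preparation above can be sketched in a few lines. The sketch assumes a CSV raw file, a single numeric field, and a hypothetical `field|function` row key layout; a dictionary stands in for the database table, and the returned triples stand in for what would be sent to the search engine server.

```python
import csv
import io
from statistics import mean

FUNCTIONS = {"sum": sum, "avg": mean, "count": len}

def precompute(raw_csv, numeric_field):
    """Parse a raw file, pre-compute each function over one numeric field,
    and store each result under its own row key."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    values = [float(r[numeric_field]) for r in rows]
    table = {}      # stands in for the database table: rowkey -> row content
    messages = []   # (field, function, rowkey set) triples for the search engine
    for name, func in FUNCTIONS.items():
        rowkey = f"{numeric_field}|{name}"   # hypothetical rowkey layout
        table[rowkey] = {"value": func(values)}
        messages.append((numeric_field, name, {rowkey}))
    return table, messages

raw = "user,fee\nu1,10\nu2,20\nu3,30\n"
table, messages = precompute(raw, "fee")
```

Because the results are computed once at import time, a later query for a function over a field only needs a lookup by row key.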
The embodiment of the application also provides a data retrieval system, which comprises: a search engine server and a database; the search engine server comprises an acquisition module, an index module and a result module;
the acquisition module is used for acquiring the field to be searched and the calculation function;
the index module is used for returning a data table row key according to the field to be searched and the calculation function and combining a pre-established index relation;
the result module is used for sending the data table row key to the database and receiving search result information returned by the database according to the data table row key;
the database is used for receiving the data table row key sent by the search engine server, retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
The embodiment of the application also provides a data retrieval device which comprises a memory, a processor and a data retrieval program stored on the memory and capable of running on the processor, wherein the data retrieval program realizes the steps of the data retrieval method when being executed by the processor.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a data retrieval program, and the data retrieval program realizes the steps of the data retrieval method when being executed by a processor.
In the data retrieval method, system, device and storage medium provided by the embodiments of the application, the search engine server acquires a field to be retrieved and a calculation function; returns a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and sends the data table row key to the database and receives the search result information returned by the database according to the data table row key. The scheme supports any combination of fields to be retrieved and calculation functions, and is therefore flexible. Because the index relationship is established in the search engine server in advance and arbitrarily composed search conditions are resolved against it, the requirement of multi-condition retrieval is met and querying becomes more intelligent. In addition, the database is queried by the data table row keys returned from the index relationship, so no full-table scan is needed, which shortens query time and improves query efficiency.
Compared with the prior art, the data retrieval system combined with the search engine server and the database and constructed based on the scheme of the application can realize second-level response performance of mass data query, and has the technical advantages of flexibility of a system framework, support of large-scale user data operation, high reading speed and strong expandability.
Drawings
FIG. 1 is a schematic diagram of functional modules of a device to which a data retrieval apparatus of the present application belongs;
FIG. 2 is a flow chart of a first exemplary embodiment of a data retrieval method according to the present application;
FIG. 3 is a flow chart of a second exemplary embodiment of a data retrieval method of the present application;
FIG. 4 is a flow chart of a third exemplary embodiment of a data retrieval method according to the present application;
FIG. 5 is a flowchart of a fifth exemplary embodiment of a data retrieval method according to the present application;
FIG. 6 is a flowchart of a sixth exemplary embodiment of a data retrieval method according to the present application;
FIG. 7 is a flow chart of a seventh exemplary embodiment of a data retrieval method of the present application;
FIG. 8 is a schematic diagram of the execution effect of an exemplary embodiment of the data retrieval method of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The main solution of the embodiments of the present application is: acquiring a field to be retrieved and a calculation function; returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and sending the data table row key to the database and receiving the search result information returned by the database according to the data table row key. The scheme supports any combination of fields to be retrieved and calculation functions, and is therefore flexible. Because the index relationship is established in the search engine server in advance and arbitrarily composed search conditions are resolved against it, the requirement of multi-condition retrieval is met and querying becomes more intelligent. In addition, the database is queried by the data table row keys returned from the index relationship, so no full-table scan is needed, which shortens query time and improves query efficiency.
Specifically, referring to FIG. 1, FIG. 1 is a schematic diagram of the functional modules of the device to which the data retrieval apparatus of the present application belongs. The data retrieval apparatus may be an apparatus that is independent of the device to which it belongs and capable of data retrieval, and it may be carried on that device in the form of hardware or software.
In this embodiment, the apparatus to which the data retrieving apparatus belongs includes at least an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores an operating system and a data retrieval program. The data retrieval apparatus may store in the memory 130 information such as the acquired fields to be retrieved and calculation functions, the pre-established index relationship, the data table row keys returned by combining the fields to be retrieved and the calculation functions with the pre-established index relationship, and the search result information retrieved according to the data table row keys. The output module 110 may be a display screen or the like. The communication module 140 may include a satellite communication module, a Wi-Fi module, a mobile communication module, a Bluetooth module and the like, and is used to communicate with an external device or a server.
Wherein the data retrieval program in the memory 130 when executed by the processor performs the steps of:
Acquiring a field to be searched and a calculation function;
returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship;
and sending the data table row key to the database, and receiving search result information returned by the database according to the data table row key.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
matching the field to be retrieved and the calculation function against the index relationship;
when no index relationship is matched, judging whether the calculation function is an atomic calculation;
if the calculation function is an atomic calculation, returning a prompt indicating that the request does not conform to the operation rule;
if the calculation function is not an atomic calculation, decomposing the calculation function into atomic calculations, and returning the data table row key according to the field to be retrieved, the decomposed atomic calculations and the index relationship.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
and under the condition of returning the data table row key, generating and returning the relation among the field to be searched, the calculation function and the data table row key based on the data table row key.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
and under the condition that the index relation is matched, returning the corresponding data table row key according to the matched index relation.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
establishing the index relation, wherein the index relation comprises an index relation between a search field and a data table row key set and an index relation between a search field, a calculation function and the data table row key set; the data table row key set consists of a plurality of data table row keys.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
importing a plurality of data fields and data table row key sets which are collected in advance, and creating and obtaining an index relation between the retrieval fields and the data table row key sets based on the association of the data fields and the data table row key sets in a database;
and encapsulating the computing function, and creating and obtaining the index relation among the search field, the computing function and the data table row key set based on the association of the encapsulated computing function, the data field and the data table row key set in the database.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
creating a search query page, wherein the search query page is used for a user to input the field to be retrieved and the computing function;
creating a representational state transfer Rest interface, wherein the Rest interface is used for connecting the search query page and the search engine server, and the Rest interface is also used for agreeing with a method adopted by an http request and a requested link.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
generating a retrieval record according to the index relation used for retrieval;
and updating the cold data area and/or the hot data area according to the retrieval record, wherein a storage node of the cold data area is used for storing an index relation lower than a preset retrieval frequency, and a storage node of the hot data area is used for storing an index relation higher than the preset retrieval frequency.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
judging whether the retrieval frequency of the index relationship in the retrieval record is higher than the preset retrieval frequency;
if the retrieval frequency of the index relationship is higher than the preset retrieval frequency, writing the index relationship into the hot data area; and/or,
if the retrieval frequency of the index relationship is lower than the preset retrieval frequency, migrating the index relationship to the cold data area.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
receiving the data table row key sent by the search engine server through the database;
and retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
Further, the data retrieval program in the memory 130 when executed by the processor also implements the steps of:
importing an original data file into the database;
extracting a data field based on the original data file;
pre-computing the data field based on a calculation function to obtain a field calculation result;
establishing a data table row key set based on the field calculation result;
and sending the data fields, the computing functions and the data table row key sets to the search engine server so that the search engine server can create and obtain index relations among the search fields, the computing functions and the data table row key sets.
According to the above scheme, this embodiment acquires a field to be retrieved and a calculation function; returns a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and sends the data table row key to the database and receives the search result information returned by the database according to the data table row key. The scheme supports any combination of fields to be retrieved and calculation functions, and is therefore flexible. Because the index relationship is established in the search engine server in advance and arbitrarily composed search conditions are resolved against it, the requirement of multi-condition retrieval is met and querying becomes more intelligent. In addition, the database is queried by the data table row keys returned from the index relationship, so no full-table scan is needed, which shortens query time and improves query efficiency.
The data retrieval system architecture related to the embodiment of the application at least comprises a search engine server and a database, wherein the search engine server is connected with the database.
The search engine server is mainly used to establish and store the index relationship between search fields and data table row key (rowkey) sets, and the index relationship among search fields, calculation functions and rowkey sets. It is also used to match the information the user wants to retrieve, such as a search field and a calculation function, against the pre-established index relationship and to return the corresponding data table row key. The elements of a rowkey set are data table row keys (rowkeys). In the embodiments of the application, the search engine server is a SolrCloud cluster consisting of a plurality of Solr search application servers.
The database is mainly used to store the underlying data, such as database tables in a standard format. Each data row of a database table includes a data table row key (rowkey), and the content of the corresponding data row can be obtained by querying with that rowkey. In the embodiments of the application, the database is also used to pre-compute the data fields according to set calculation rules and to form field calculation results. The database in the embodiments of the application is the open-source database HBase.
Based on the system architecture, but not limited to the architecture, the method embodiments of the present application are presented.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first exemplary embodiment of a data retrieval method according to the present application. The data retrieval method of the embodiment is applied to a search engine server, wherein the search engine server is connected with a database, and the data retrieval method comprises the following steps:
step S10, obtaining a field to be retrieved and a calculation function.
Specifically, when the user needs to retrieve data, a field to be retrieved and a calculation function are acquired. The obtaining manner of the field to be searched and the computing function may be a manner of reading a pre-stored field to be searched and the computing function, or may be a manner of obtaining the field to be searched and the computing function input by the user through the provided search query page, or may be a manner of receiving the field to be searched and the computing function sent by the third party device or the server, and the like. In this embodiment, the calculation functions may include, but are not limited to, avg, sum, count, and the like.
By arbitrarily combining a plurality of fields to be searched and the calculation function, rich and complex search conditions can be obtained.
Step S20, returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship.
Specifically, an index relationship is pre-established and stored in the search engine server, where the index relationship may include an index relationship between a search field and a rowkey set, and may further include an index relationship between a search field, a calculation function, and a rowkey set. The elements in the rowkey set are the data table row keys rowkeys.
A data table row key rowkey is included in a data row of the database table. By associating the index relationship supporting the multi-condition search request with the data row, the association of the index relationship supporting the multi-condition search request with the data table row key in the data row can be completed.
In this embodiment, according to the obtained field to be retrieved and the calculation function, the data table row key is returned in combination with the established index relationship. More specifically, the obtained field to be searched and the calculation function are matched with the established index relationship, such as the index relationship between the search field and the rowkey set or the index relationship between the search field and the calculation function and the rowkey set, and the data table row key rowkey in the index relationship successfully matched is returned.
Step S30, sending the data table row key to the database, and receiving the search result information returned by the database.
The data table row key (rowkey) is then provided to the database, and the search result information is returned to the search engine server.
Specifically, the rowkey returned in the previous step is sent to the database connected to the search engine server. The database retrieves by the rowkey to obtain the search result information and returns it to the search engine server, which then receives it.
According to the above scheme, this embodiment acquires a field to be retrieved and a calculation function; returns a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and sends the data table row key to the database and receives the search result information returned by the database according to the data table row key. The scheme supports any combination of fields to be retrieved and calculation functions, and is therefore flexible. Because the index relationship is established in the search engine server in advance and arbitrarily composed search conditions are resolved against it, the requirement of multi-condition retrieval is met and querying becomes more intelligent. In addition, the database is queried by the data table row keys returned from the index relationship, so no full-table scan is needed, which shortens query time and improves query efficiency.
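Steps S10 to S30 of this embodiment can be summarized in a minimal sketch. It is not the SolrCloud or HBase implementation: plain dictionaries stand in for the search engine index and the database table, and the function `retrieve`, the index key shape and the sample data are hypothetical.

```python
def retrieve(fields, function, index, database):
    """Three steps: accept the query, resolve row keys, fetch rows by key."""
    # S10: the fields to be retrieved and the function arrive as arguments.
    key = (tuple(sorted(fields)), function)
    # S20: resolve row keys through the pre-established index relationship.
    rowkeys = index.get(key, set())
    # S30: fetch each row directly by key; no full-table scan is involved.
    return [database[rk] for rk in sorted(rowkeys) if rk in database]

index = {
    (("city",), None): {"rk1", "rk2"},
    (("fee",), "sum"): {"agg1"},
}
database = {
    "rk1": {"city": "Beijing"},
    "rk2": {"city": "Shanghai"},
    "agg1": {"sum_fee": 60},
}
result = retrieve(["fee"], "sum", index, database)
```

The database is touched only with keys that the index has already resolved, which is the property the embodiment relies on to avoid scanning the table.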
Further, referring to fig. 3, fig. 3 is a flowchart of a second exemplary embodiment of the data retrieval method according to the present application. Based on the embodiment shown in fig. 2, in this embodiment, the step S20, according to the field to be retrieved and the calculation function, of returning the data table row key in combination with the pre-established index relationship may include:
step S201, matching with the index relationship according to the field to be retrieved and the calculation function.
Specifically, the acquired field to be retrieved and calculation function are matched against the established index relationships: for example, against the index relationship between the search field and the rowkey set, or against the index relationship among the search field, the calculation function and the rowkey set.
Step S202, judging whether the calculation function is atomic calculation or not under the condition that the index relation is not matched.
Specifically, if the field to be retrieved and the calculation function do not match any index relationship, it is judged whether the calculation function is an atomic calculation. Atomic calculations may include, but are not limited to, count, max, min, sum and avg.
Step S203, if the calculation function is an atomic calculation, prompt information indicating that the operation rule is not met is returned.
Step S204, if the calculation function is not an atomic calculation, the calculation function is decomposed into atomic calculations, and the data table row key is returned according to the field to be retrieved and the decomposed atomic calculations, in combination with the index relationship.
Specifically, if the calculation function used for matching is an atomic calculation, prompt information indicating that the operation rule is not met is generated and returned.
If the calculation function used for matching is not an atomic calculation, the calculation function is decomposed into atomic calculations. The data table row key rowkey is then returned according to the acquired field to be retrieved and the decomposed atomic calculations, in combination with the pre-established index relationship. More specifically, the acquired field to be retrieved and the decomposed atomic calculations are matched against the established index relationships, for example the index relationship between the search field and the rowkey set, or the index relationship among the search field, the calculation function and the rowkey set, and the data table row key rowkey in the successfully matched index relationship is returned.
Further, in this embodiment, the data retrieval method may further include:
Step S205, in the case where the data table row key is returned, the relation among the field to be retrieved, the calculation function and the data table row key is generated based on the data table row key and returned.
Specifically, after the data table row key rowkey is returned according to the field to be retrieved and the atomic calculations, in combination with the established index relationship, the relation among the field to be retrieved, the calculation function and the rowkey is generated from the returned rowkey and returned. In this way, after the search result information queried by rowkey is returned from the database, the result for the originally requested field to be retrieved and calculation function can be computed from the generated relation and the search result information.
Further, in this embodiment, in step S201, after matching with the index relationship according to the field to be retrieved and the calculation function, the method may further include:
step S206, when the index relation is matched, returning the corresponding data table row key according to the matched index relation.
Specifically, when matching the field to be searched and the computing function with the established index relationship, for example, when matching the field to be searched and the computing function with the index relationship between the search field and the rowkey set or the index relationship between the search field and the computing function and the rowkey set, if the index relationship is matched, the corresponding data table row key rowkey is obtained according to the matched index relationship and returned.
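Steps S201 to S206 can be sketched as follows. This is a minimal illustration, not the application's implementation: the composite function `range` and its decomposition into `max` and `min` are hypothetical, as are the index contents and the `RuleViolation` exception. The text does not say what happens when a non-atomic function has no known decomposition; the sketch treats that case as a rule violation too, which is an assumption.

```python
# Atomic calculations named in the text.
ATOMIC = {"count", "max", "min", "sum", "avg"}

# Hypothetical decomposition rules: composite function -> atomic calculations.
DECOMPOSITION = {"range": ["max", "min"]}

# Hypothetical index relationship: (search field, function) -> rowkey set.
INDEX = {
    ("price", "max"): ["rk_max"],
    ("price", "min"): ["rk_min"],
    ("price", "sum"): ["rk_sum"],
}


class RuleViolation(Exception):
    """Stands for the prompt that the request does not meet the operation rule."""


def resolve(field, func):
    """Return the relation {atomic function: rowkey set} for a request."""
    # S201 / S206: direct match against the index relationship.
    if (field, func) in INDEX:
        return {func: INDEX[(field, func)]}
    # S202 / S203: an unmatched atomic calculation cannot be decomposed further.
    # Assumption: an unknown composite function is rejected in the same way.
    if func in ATOMIC or func not in DECOMPOSITION:
        raise RuleViolation(f"{func}({field}) does not meet the operation rule")
    # S204 / S205: decompose, match each atomic part, and return the relation
    # among the field, the atomic calculations and their rowkeys.
    relation = {}
    for atomic in DECOMPOSITION[func]:
        if (field, atomic) not in INDEX:
            raise RuleViolation(f"{atomic}({field}) does not meet the operation rule")
        relation[atomic] = INDEX[(field, atomic)]
    return relation


print(resolve("price", "range"))
```

Returning the relation rather than a bare rowkey list lets the caller recombine the atomic results (here, max minus min) once the database has returned the rows.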
Further, referring to fig. 4, fig. 4 is a flowchart of a third exemplary embodiment of the data retrieval method according to the present application. Based on the embodiment shown in fig. 2 or fig. 3, in this embodiment, before returning the data table row key in accordance with the field to be retrieved and the calculation function and in conjunction with the pre-established index relationship in step S20, the method may further include:
step S11, establishing the index relation, wherein the index relation comprises the index relation between a search field and a data table row key set and the index relation between a search field, a calculation function and the data table row key set; the data table row key set consists of a plurality of data table row keys.
In this embodiment, step S11 is performed between step S10 and step S20, and in other embodiments, step S11 may be performed before step S10.
Compared to the embodiments shown in fig. 2 or fig. 3, this embodiment further includes a scheme for establishing the index relationship.
Specifically, the step of establishing the index relationship may include:
importing a plurality of data fields and data table row key sets which are collected in advance, and creating and obtaining an index relation between the retrieval fields and the data table row key sets based on the association of the data fields and the data table row key sets in a database;
And encapsulating the computing function, and creating and obtaining the index relation among the search field, the computing function and the data table row key set based on the association of the encapsulated computing function, the data field and the data table row key set in the database.
More specifically, the data fields and the data table row key sets of a plurality of databases are collected in advance and imported into the search engine server. The imported data fields are matched against the input field to be retrieved in order to obtain the rowkey of the database table row to be retrieved. The index relationship between the search field and the rowkey set is then created according to the association between the imported data fields and the data table row key set in the database, that is, according to the correspondence between the data fields and the data table row key set in the database.
The calculation functions suited to the particular queries, such as avg, sum and count, are encapsulated. The index relationship among the search field, the calculation function and the rowkey set is created according to the association among the encapsulated calculation function, the pre-collected data fields and the data table row key set in the database, that is, according to the correspondence among the calculation function, the data fields and the data table row keys in the database.
Then, the created index relationship between the search field and the rowkey set, and the created index relationship among the search field, the calculation function and the rowkey set, are stored in the search engine server or mapped to the corresponding core processing unit, and a data search interface and query statements suited to the foreground query requirements are developed.
In an embodiment, the association between the imported data fields and the data table row key set in the database, and the association among the calculation function, the data fields and the data table row key set in the database, may be obtained through a pre-operation process on the data fields. That is, the data fields are operated on in advance in the database according to the set calculation rules, so as to obtain the field operation results and generate the corresponding data table row keys. The association among the data fields, the calculation rules and the data table row keys corresponding to the field operation results is then obtained from the data fields and calculation rules involved in each field operation result and the data table row key generated for it.
By pre-operating on the data fields in the database in advance, more complex index conditions are provided to the query side. The input field to be retrieved and calculation function therefore do not have to be computed at each search before the search result is returned, so search results are returned efficiently.
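The two index relationships described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the database exports `(rowkey, data field, value)` tuples and `(data field, function, rowkey)` tuples for pre-operation results; these shapes and all sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical export from the database: (rowkey, data field, value).
ROWS = [
    ("rk1", "region", "north"),
    ("rk2", "region", "south"),
    ("rk3", "amount", 40),
]

# Hypothetical pre-operation results: (data field, function, rowkey of the result).
PRECOMPUTED = [
    ("amount", "sum", "rk_sum_amount"),
    ("amount", "max", "rk_max_amount"),
]


def build_indexes(rows, precomputed):
    """Build both index relationships from the imported associations."""
    # Index relationship 1: search field -> rowkey set.
    field_index = defaultdict(set)
    for rowkey, field, _value in rows:
        field_index[field].add(rowkey)
    # Index relationship 2: (search field, calculation function) -> rowkey set.
    func_index = defaultdict(set)
    for field, func, rowkey in precomputed:
        func_index[(field, func)].add(rowkey)
    return dict(field_index), dict(func_index)


field_index, func_index = build_indexes(ROWS, PRECOMPUTED)
print(field_index["region"], func_index[("amount", "sum")])
```

Because the second index is keyed by the field and the function together, a query such as sum over amount resolves to a rowkey without any computation at query time.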
Further, based on the above embodiment, in this embodiment, before the field to be retrieved and the calculation function are acquired in step S10, the method may further include:
creating a search query page, wherein the search query page is used for a user to input the field to be retrieved and the calculation function;
creating a representational state transfer (REST) interface, wherein the Rest interface is used for connecting the search query page and the search engine server, and is also used for specifying the method adopted by an HTTP request and the requested link.
Specifically, a search query page is created, on which the user inputs the field to be retrieved and the calculation function. A representational state transfer interface (Rest interface) is created. The search query page and the search engine server are connected through the created Rest interface, and the method adopted by the HTTP request and the requested link are agreed on through the created Rest interface.
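As an illustration of what agreeing on the HTTP method and the requested link might look like, the following Python sketch validates and parses a request from the search query page. The path `/api/search`, the choice of GET, and the query parameter names `field` and `func` are all assumptions made for this example; the application does not specify them.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical convention agreed on by the Rest interface.
AGREED_METHOD = "GET"
AGREED_PATH = "/api/search"


def parse_search_request(method, url):
    """Check the agreed method and link, then extract fields and functions."""
    if method.upper() != AGREED_METHOD:
        raise ValueError(f"method {method} is not the agreed method")
    parts = urlsplit(url)
    if parts.path != AGREED_PATH:
        raise ValueError(f"unknown link: {parts.path}")
    params = parse_qs(parts.query)
    fields = params.get("field", [])
    if not fields:
        raise ValueError("at least one field to be retrieved is required")
    return {"fields": fields, "functions": params.get("func", [])}


print(parse_search_request("GET", "/api/search?field=amount&func=sum"))
```

Repeating the `field` or `func` parameter would allow several conditions in one request, which fits the multi-condition querying the scheme aims at.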
Further, in this embodiment, the search engine server may include a cold data area and a hot data area. The storage nodes of the cold data area are given a low hardware configuration and store data with a lower retrieval frequency; the storage nodes of the hot data area are given a high hardware configuration and store data with a higher retrieval frequency.
Referring to fig. 5, fig. 5 is a flowchart illustrating a fifth exemplary embodiment of a data retrieval method according to the present application. After step S30, in which the data table row key is sent to the database and the search result information returned by the database according to the data table row key is received, the method may further include:
step S40, a search record is generated according to the index relation used for search.
Specifically, after receiving the retrieval result information returned from the database, a retrieval record is generated according to the index relationship used in the retrieval process to record the retrieval frequency of the index relationship.
And step S50, updating the cold data area and/or the hot data area according to the retrieval record, wherein a storage node of the cold data area is used for storing index relations lower than a preset retrieval frequency, and a storage node of the hot data area is used for storing index relations higher than the preset retrieval frequency.
Specifically, the cold data area and/or the hot data area in the search engine server are updated according to the generated search record. More specifically, a search frequency threshold, namely the preset search frequency, is set in advance and is compared with the search frequency of each index relationship in the search record. The data stored in the cold data area and/or the hot data area in the search engine server is updated according to the result of this comparison. The storage nodes of the cold data area store index relationships whose search frequency is lower than the preset search frequency, and the storage nodes of the hot data area store index relationships whose search frequency is higher than the preset search frequency.
Further, the step of updating the cold data area and/or the hot data area according to the retrieval record may comprise:
judging whether the search frequency of the index relation in the search record is higher than the preset search frequency;
if the search frequency of the index relation is higher than the preset search frequency, writing the index relation into the hot data area; and/or,
and if the search frequency of the index relation is lower than the preset search frequency, migrating the index relation to the cold data area.
Specifically, it is judged whether the search frequency of the index relationship in the generated search record is higher than the preset search frequency. If the search frequency of the index relationship is higher than the preset search frequency, the index relationship is written into the hot data area in the search engine server; and/or,
if the search frequency of the index relationship is lower than the preset search frequency, the index relationship is migrated to the cold data area in the search engine server. It should be noted that index relationships are not written directly into the cold data area; instead, index relationships whose search frequency is lower than the preset search frequency are periodically migrated there.
Illustratively, based on the search records generated each day, the index conditions that appear in the search records of the last day are written to the hot data area, and a high hardware configuration is provided for the hot data area nodes. Index conditions not retrieved in the last day are automatically and periodically migrated to the cold data area, and a low hardware configuration is provided for the cold data area nodes. Index conditions are not written directly to the cold data area; only those not retrieved in the last day are migrated there on a schedule.
In the scheme of this embodiment, the search engine server is divided into a cold data area and a hot data area. Index conditions retrieved at high frequency are defined as hot data and written to the hot data area; index conditions retrieved at low frequency are defined as cold data and migrated to the cold data area. This realizes high-throughput exchange of cold and hot data, with periodic updating of the cold and hot data through timed scheduling. Hot data nodes are given high-configuration machines and cold data nodes are given low-configuration machines, so that the data search efficiency can be further improved with limited resources.
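The update of the cold and hot data areas in steps S40 and S50 can be sketched in Python as follows. This is only a sketch under assumptions: the areas are modeled as plain sets of index relationship identifiers, the search record is a list of the identifiers used, and an index relationship whose frequency equals the threshold is placed in the cold area, a case the text leaves open.

```python
from collections import Counter


def update_tiers(hot, cold, retrieval_log, threshold):
    """Return the new (hot, cold) sets after one scheduled update.

    hot, cold      -- sets of index relationship identifiers currently in each area
    retrieval_log  -- identifiers of the index relationships used by searches (S40)
    threshold      -- the preset search frequency
    """
    frequency = Counter(retrieval_log)
    new_hot, new_cold = set(), set()
    for relation in hot | cold | set(frequency):
        if frequency[relation] > threshold:
            new_hot.add(relation)   # retrieved often: written to the hot area
        else:
            new_cold.add(relation)  # assumption: frequency <= threshold goes cold
    return new_hot, new_cold


# With threshold 0, anything retrieved at least once in the period stays hot,
# which mirrors the daily example in the text.
print(update_tiers({"a", "b"}, {"c"}, ["a", "c", "c"], threshold=0))
```

In a real deployment the function would run from a timer, matching the timed scheduling described above.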
Referring to fig. 6, fig. 6 is a flowchart illustrating a sixth exemplary embodiment of a data retrieval method according to the present application. Based on any one of the foregoing embodiments, before the step of receiving the search result information returned by the database according to the data table row key, the data search method may further include:
step S301, receiving, by the database, the data table row key sent by the search engine server.
Specifically, when a user searches data, the data table row key rowkey transmitted by the search engine server is received through a database connected with the search engine server.
Step S302, retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
Specifically, searching is performed according to the received data table row key rowkey, search result information is obtained, and the search result information is returned to the connected search engine server.
According to the scheme, the data table row key sent by the search engine server is received through the database; and retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server. According to the embodiment of the application, the database is searched based on the data table row keys returned by the index relation, the whole table scanning is not needed, the query time is reduced, and the query efficiency is improved.
Further, referring to fig. 7, fig. 7 is a flowchart of a seventh exemplary embodiment of the data retrieval method according to the present application. Based on the embodiment shown in fig. 6, in this embodiment, before step S301 of receiving, by the database, the data table row key sent by the search engine server, the method may further include:
step S01, importing an original data file into the database;
and step S02, extracting data fields based on the original data file.
Specifically, the original data file is imported into a database, and the data field is extracted from the imported original data file.
Optionally, the format of the original data file is normalized to form a database table before the original data file is imported into the database. And then importing the database table in the standard format into a database.
And step S03, pre-operating the data field based on a calculation function to obtain a field operation result.
Specifically, the data fields in the database are pre-operated by combining with a calculation function, for example, the data fields conforming to the corresponding calculation rule are pre-operated by combining with the calculation functions such as avg, sum, count, max, min, and the like, so as to obtain a field operation result.
And step S04, establishing a data table row key set based on the field operation result.
Specifically, a corresponding data table row key rowkey set is established according to a field operation result obtained after the pre-operation of the calculation function. For example, for the data fields 001, 002, 003, a field operation result 003 is obtained by performing a pre-operation in combination with the calculation function max, and a corresponding data table row key rowkey set is established according to the field operation result 003.
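Steps S03 and S04 can be sketched in Python as follows. The sketch assumes that the field operation result is stored under a newly generated rowkey of the form `field|function`; this rowkey scheme is hypothetical, since the text does not say how the rowkey set is derived from the field operation result.

```python
# Atomic calculation functions named in the text, mapped to Python callables.
FUNCS = {
    "count": len,
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
}


def pre_operate(field, values, func_name):
    """Pre-operate on a data field and establish the rowkey set for the result.

    Returns (result_rows, rowkey_set), where result_rows maps each generated
    rowkey to the stored field operation result.
    """
    result = FUNCS[func_name](values)   # S03: field operation result
    rowkey = f"{field}|{func_name}"     # hypothetical rowkey scheme
    result_rows = {rowkey: result}      # row to be stored in the database
    rowkey_set = [rowkey]               # S04: data table row key set
    return result_rows, rowkey_set


# Mirrors the example in the text: values 1, 2, 3 with max give 3.
rows, keys = pre_operate("level", [1, 2, 3], "max")
print(rows, keys)
```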
Step S05, the data field, the computing function and the data table row key set are sent to the search engine server, so that the search engine server can create and obtain index relations among the search field, the computing function and the data table row key set.
Specifically, the data fields, the computing functions and the established rowkey sets of the data table rows used in the pre-operation process are sent to a search engine server, so that the search engine server creates and obtains index relations among the search fields, the computing functions and the rowkey sets according to the received data fields, the computing functions and the rowkey sets.
Illustratively, after performing calculation function pre-operation on data fields in the HBase database, submitting an XML file in a certain format to a Solr server through an http request, and establishing index relations among the related data fields, calculation functions and rowkey sets in the Solr, wherein the index relations can support multi-condition query.
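As a sketch of the XML submitted in this example, the following Python code builds a Solr XML update message of the usual `<add><doc><field name="...">` form. The field names `id`, `data_field`, `function` and `rowkey` are hypothetical and would have to exist in the Solr schema; the sketch only builds the message and does not send it, since the collection name and server address are not given in the text.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(entries):
    """Build a Solr XML <add> message from a list of dict entries.

    List or tuple values are emitted as repeated fields (multi-valued fields).
    """
    add = ET.Element("add")
    for entry in entries:
        doc = ET.SubElement(add, "doc")
        for name, value in entry.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical index entry: data field, calculation function and rowkey set.
xml_message = build_solr_add_xml([
    {"id": "amount|sum", "data_field": "amount", "function": "sum",
     "rowkey": ["rk_sum_amount"]},
])
print(xml_message)
```

The resulting string would typically be posted over HTTP to the update handler of the target collection with an XML content type.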
Further, in this embodiment, the step of retrieving the search result information according to the data table row key may include: querying according to the data table row key and the rowkey column index information of the database to obtain the search result information.
Specifically, extracting corresponding data row records according to the received data table row keys; and according to the data line record and rowkey column index information of the database, positioning and inquiring to obtain a corresponding data list record serving as search result information. And then, returning the queried search result information to the corresponding search engine server.
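The benefit of locating records by rowkey rather than scanning can be illustrated with a small Python sketch. It is a stand-in, not the database's actual storage engine: rows are kept sorted by rowkey (as HBase does) and located by binary search, and the class name `RowkeyTable` and the sample records are hypothetical.

```python
import bisect


class RowkeyTable:
    """Toy table whose rows are kept sorted by rowkey and located by binary search."""

    def __init__(self, rows):
        items = sorted(rows.items())            # rowkeys are unique dict keys
        self._keys = [key for key, _ in items]
        self._records = [record for _, record in items]

    def get(self, rowkey):
        """Locate one record by rowkey in O(log n) comparisons, or return None."""
        i = bisect.bisect_left(self._keys, rowkey)
        if i < len(self._keys) and self._keys[i] == rowkey:
            return self._records[i]
        return None

    def multi_get(self, rowkeys):
        """Fetch the records for the rowkeys sent by the search engine server."""
        return [rec for rec in map(self.get, rowkeys) if rec is not None]


table = RowkeyTable({"rk3": {"v": 3}, "rk1": {"v": 1}})
print(table.multi_get(["rk3", "rk0", "rk1"]))
```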
Illustratively, the effect of data retrieval based on the HBase database + SolrCloud cluster built according to this embodiment of the method is as follows:
the search query is run over 2 billion pieces of basic data, and the deployment environment is shown in Table 1 below:
Table 1: Deployment environment
As shown in fig. 8, three-, five- and eight-word screening conditions are set over the 2 billion pieces of basic data, and 10, 30, 50 and all fields are displayed; the maximum response times of the screening conditions are 4.4 seconds, 0.3 seconds and 0.2 seconds.
According to the above scheme, the data fields are pre-operated on with the calculation functions in the database in advance, which provides more complex index conditions to the query side. The input field to be retrieved and calculation function therefore do not have to be computed at each search before the search result is returned, so search results are returned efficiently.
In addition, compared with the prior art, the data retrieval system combined with the search engine server and the database constructed based on the scheme of the application can realize second-level response performance of mass data query and has the technical advantages of flexibility of a system frame, support of large-batch user data operation, high reading speed and strong expandability.
In addition, the embodiment of the application also provides a data retrieval system, which comprises: a search engine server and a database; the search engine server comprises an acquisition module, an index module and a result module;
the acquisition module is used for acquiring the field to be searched and the calculation function;
the index module is used for returning a data table row key according to the field to be searched and the calculation function and combining a pre-established index relation;
the result module is used for sending the data table row key to the database and receiving search result information returned by the database according to the data table row key;
the database is used for receiving the data table row key sent by the search engine server, retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
For the principle and implementation process of data retrieval in this embodiment, refer to the above embodiments; they are not described again here.
In addition, the embodiment of the application also provides a data retrieval device, which comprises a memory, a processor and a data retrieval program stored on the memory and capable of running on the processor, wherein the data retrieval program realizes the steps of the data retrieval method when being executed by the processor.
Because the data retrieval program is executed by the processor and adopts all the technical schemes of all the embodiments, the data retrieval program has at least all the beneficial effects brought by all the technical schemes of all the embodiments and is not described in detail herein.
Furthermore, the embodiment of the present application also proposes a computer-readable storage medium having stored thereon a data retrieval program which, when executed by a processor, implements the steps of the data retrieval method as described above.
Because the data retrieval program is executed by the processor and adopts all the technical schemes of all the embodiments, the data retrieval program has at least all the beneficial effects brought by all the technical schemes of all the embodiments and is not described in detail herein.
Compared with the prior art, in the data retrieval method, system, device and storage medium provided by the embodiments of the application, the field to be retrieved and the calculation function are acquired; the data table row key is returned according to the field to be retrieved and the calculation function, in combination with a pre-established index relationship; and the data table row key is sent to the database, and the search result information returned by the database according to the data table row key is received. The scheme of the application supports any combination of fields to be retrieved and calculation functions, and is therefore flexible. Because the index relationship is established in advance in the search engine server and is used to search with arbitrarily composed search conditions, the requirement of multi-condition searching can be met and the query becomes more intelligent. Meanwhile, the database is searched using the data table row keys returned from the index relationship, so no full table scan is needed, which shortens the query time and improves the query efficiency.
Compared with the prior art, the data retrieval system combined with the search engine server and the database and constructed based on the scheme of the application can realize second-level response performance of mass data query, and has the technical advantages of flexibility of a system framework, support of large-scale user data operation, high reading speed and strong expandability.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as above, comprising several instructions for causing an apparatus to perform the method of each embodiment of the present application.
The foregoing description covers only the preferred embodiments of the present application and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (14)

1. A data retrieval method, applied to a search engine server, the search engine server being connected to a database, the data retrieval method comprising:
acquiring a field to be searched and a calculation function;
returning a data table row key according to the field to be searched and the calculation function, in combination with a pre-established index relation;
and sending the data table row key to the database, and receiving search result information returned by the database according to the data table row key.
2. The data retrieval method according to claim 1, wherein the step of returning a data table row key in combination with a pre-established index relationship according to the field to be retrieved and the calculation function includes:
matching with the index relation according to the field to be searched and the calculation function;
judging whether the calculation function is atomic calculation or not under the condition that the index relation is not matched;
If the calculation function is atomic calculation, returning prompt information which does not accord with the operation rule;
if the calculation function is not atomic calculation, decomposing the calculation function into atomic calculation, and returning the data table row key according to the field to be searched and the decomposed atomic calculation and the index relation.
3. The data retrieval method according to claim 2, characterized in that the data retrieval method further comprises:
and under the condition of returning the data table row key, generating and returning the relation among the field to be searched, the calculation function and the data table row key based on the data table row key.
4. The data retrieval method according to claim 2, wherein after the step of matching the index relation according to the field to be retrieved and the calculation function, further comprising:
and under the condition that the index relation is matched, returning the corresponding data table row key according to the matched index relation.
5. The data retrieval method according to claim 1, wherein before the step of returning the data table row key in combination with the pre-established index relation according to the field to be retrieved and the calculation function, further comprising:
Establishing the index relation, wherein the index relation comprises an index relation between a search field and a data table row key set and an index relation between a search field, a calculation function and the data table row key set; the data table row key set consists of a plurality of data table row keys.
6. The data retrieval method as recited in claim 5, wherein the step of establishing the index relationship comprises:
importing a plurality of data fields and data table row key sets which are collected in advance, and creating and obtaining an index relation between the retrieval fields and the data table row key sets based on the association of the data fields and the data table row key sets in a database;
and encapsulating the computing function, and creating and obtaining the index relation among the search field, the computing function and the data table row key set based on the association of the encapsulated computing function, the data field and the data table row key set in the database.
7. The data retrieval method according to claim 1, wherein before the step of obtaining the field to be retrieved and the calculation function, further comprising:
creating a search query page, wherein the search query page is used for a user to input the field to be retrieved and the computing function;
creating a representational state transfer (REST) interface, wherein the Rest interface is used for connecting the search query page and the search engine server, and is also used for specifying the method adopted by an HTTP request and the requested link.
8. The data retrieval method according to claim 1, wherein the search engine server includes a cold data area and a hot data area, and after the step of transmitting the data table row key to the database and receiving the retrieval result information returned by the database according to the data table row key, further includes:
generating a retrieval record according to the index relation used for retrieval;
and updating the cold data area and/or the hot data area according to the retrieval record, wherein a storage node of the cold data area is used for storing an index relation lower than a preset retrieval frequency, and a storage node of the hot data area is used for storing an index relation higher than the preset retrieval frequency.
9. The data retrieval method according to claim 8, wherein the step of updating the cold data area and/or the hot data area according to the retrieval record comprises:
judging whether the search frequency of the index relation in the search record is higher than the preset search frequency;
if the search frequency of the index relation is higher than the preset search frequency, writing the index relation into the hot data area; and/or
and if the search frequency of the index relation is lower than the preset search frequency, migrating the index relation to the cold data area.
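The hot/cold update recited in claims 8 and 9 can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the source: the class `TieredIndexStore`, the use of a plain hit counter as the "search frequency", the rule that new index relations start in the cold area, and the `decay` step that lets a frequency fall again. The claims do not say how frequency is measured, so this is one possible reading rather than the claimed implementation.

```python
from collections import Counter


class TieredIndexStore:
    """Toy hot/cold store for index relations (illustrative only)."""

    def __init__(self, preset_frequency):
        # Hypothetical threshold: compared against a simple hit counter.
        self.preset_frequency = preset_frequency
        self.hot = {}
        self.cold = {}
        self.hits = Counter()

    def add(self, index_key, row_keys):
        # Assumption of this sketch: new index relations start in the cold area.
        self.cold[index_key] = set(row_keys)

    def lookup(self, index_key):
        # Check the hot area first, then fall back to the cold area.
        row_keys = self.hot.get(index_key)
        if row_keys is None:
            row_keys = self.cold.get(index_key)
        if row_keys is not None:
            # Generate a retrieval record for the index relation that was used.
            self.hits[index_key] += 1
            self.rebalance(index_key)
        return row_keys

    def rebalance(self, index_key):
        freq = self.hits[index_key]
        if freq > self.preset_frequency and index_key in self.cold:
            # Above the preset frequency: write into the hot data area.
            self.hot[index_key] = self.cold.pop(index_key)
        elif freq < self.preset_frequency and index_key in self.hot:
            # Below the preset frequency: migrate to the cold data area.
            self.cold[index_key] = self.hot.pop(index_key)

    def decay(self):
        # Hypothetical ageing step: halve every counter, then re-check placement.
        for index_key in list(self.hits):
            self.hits[index_key] //= 2
            self.rebalance(index_key)


store = TieredIndexStore(preset_frequency=2)
store.add(("city", "count"), {"rk001", "rk002"})
for _ in range(3):
    store.lookup(("city", "count"))
```

After three lookups the counter exceeds the threshold, so the index relation sits in the hot area; calling `store.decay()` afterwards would push it back to the cold area.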
10. The data retrieval method according to claim 1, further comprising, prior to the step of receiving retrieval result information returned by the database according to the data table row key:
receiving the data table row key sent by the search engine server through the database;
and retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
11. The data retrieval method of claim 10, further comprising, prior to the step of receiving the data table row key sent by the search engine server:
importing an original data file into the database;
extracting a data field based on the original data file;
pre-computing the data field based on a computing function to obtain a field computing result;
establishing a data table row key set based on the field computing result;
and sending the data fields, the computing functions and the data table row key sets to the search engine server so that the search engine server can create and obtain index relations among the search fields, the computing functions and the data table row key sets.
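As a rough illustration of the pre-computation steps in claim 11, the sketch below applies each computing function to a data field, groups row keys by the computed result, and produces a mapping that a search engine server could index. The function names `upper` and `length`, the helper `build_index`, the key layout `(field, function, result)` and the sample table are all assumptions made for this example; the source does not specify concrete functions or an index layout.

```python
from collections import defaultdict

# Hypothetical computing functions; the source does not list concrete ones.
COMPUTING_FUNCTIONS = {
    "upper": str.upper,
    "length": len,
}


def build_index(table, field_name):
    """Pre-compute each function over one field and group row keys by result."""
    index = defaultdict(set)
    for row_key, row in table.items():
        value = row[field_name]  # extract the data field
        for func_name, func in COMPUTING_FUNCTIONS.items():
            result = func(value)  # field computing result
            # Row key set for this (field, function, result) combination.
            index[(field_name, func_name, result)].add(row_key)
    return dict(index)


# Illustrative rows standing in for an imported original data file.
table = {
    "rk001": {"city": "Beijing"},
    "rk002": {"city": "beijing"},
    "rk003": {"city": "Shanghai"},
}
index = build_index(table, "city")
```

Because the results are computed once at import time, a later query only needs a dictionary lookup to obtain the matching row key set.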
12. A data retrieval system, the data retrieval system comprising: a search engine server and a database; the search engine server comprises an acquisition module, an index module and a result module;
the acquisition module is used for acquiring the field to be retrieved and the calculation function;
the index module is used for returning a data table row key according to the field to be retrieved and the calculation function, in combination with a pre-established index relation;
the result module is used for sending the data table row key to the database and receiving search result information returned by the database according to the data table row key;
the database is used for receiving the data table row key sent by the search engine server, retrieving the retrieval result information according to the data table row key, and returning the retrieval result information to the search engine server.
13. A data retrieval device comprising a memory, a processor and a data retrieval program stored on the memory and executable on the processor, the data retrieval program when executed by the processor implementing the steps of the data retrieval method according to any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a data retrieval program which, when executed by a processor, implements the steps of the data retrieval method according to any one of claims 1 to 11.
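The module split recited for the data retrieval system (acquisition, index and result modules on the search engine server, plus a database that resolves row keys) can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: the class names `Database` and `SearchEngineServer`, the request dictionary shape, and the `(field, function)` index key are hypothetical, and real deployments would use a search engine and a row-key store rather than Python dictionaries.

```python
class Database:
    """Stand-in for the database: returns rows for the given row keys."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, row_keys):
        # Sorted only to make the output order deterministic in this sketch.
        return [self.rows[key] for key in sorted(row_keys) if key in self.rows]


class SearchEngineServer:
    """Stand-in for the search engine server and its three modules."""

    def __init__(self, index, database):
        self.index = index  # pre-established index relation
        self.database = database

    def acquire(self, request):
        # Acquisition module: obtain the field to be retrieved and the function.
        return request["field"], request["function"]

    def resolve_row_keys(self, field, function):
        # Index module: map (field, function) to data table row keys.
        return self.index.get((field, function), set())

    def retrieve(self, request):
        # Result module: send row keys to the database and return its answer.
        field, function = self.acquire(request)
        row_keys = self.resolve_row_keys(field, function)
        return self.database.get(row_keys)


database = Database({
    "rk001": {"city": "Beijing", "orders": 12},
    "rk002": {"city": "Beijing", "orders": 7},
})
server = SearchEngineServer({("Beijing", "count"): {"rk001", "rk002"}}, database)
results = server.retrieve({"field": "Beijing", "function": "count"})
```

Here `results` holds the two rows stored under `rk001` and `rk002`; a request with no matching index relation returns an empty list.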
CN202311052071.1A 2023-08-18 2023-08-18 Data retrieval method, system, device and storage medium Pending CN117093598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311052071.1A CN117093598A (en) 2023-08-18 2023-08-18 Data retrieval method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311052071.1A CN117093598A (en) 2023-08-18 2023-08-18 Data retrieval method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN117093598A (en) 2023-11-21

Family

ID=88772933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311052071.1A Pending CN117093598A (en) 2023-08-18 2023-08-18 Data retrieval method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN117093598A (en)

Similar Documents

Publication Publication Date Title
US10311055B2 (en) Global query hint specification
CN107784044B (en) Table data query method and device
CN101276361B (en) Method and system for displaying related key words
CN108959538B (en) Full text retrieval system and method
CN112800287B (en) Full-text indexing method and system based on graph database
WO2007085187A1 (en) Method of data retrieval, method of generating index files and search engine
CN111506621A (en) Data statistical method and device
CN110716952A (en) Multi-source heterogeneous data processing method and device and storage medium
CN105117433A (en) Method and system for statistically querying HBase based on analysis performed by Hive on HFile
CN107526762A (en) Service end, multi-data source searching method and system
CN110659283A (en) Data label processing method and device, computer equipment and storage medium
KR20170035349A (en) Method, device and terminal for data search
CN114297224A (en) RDF-based heterogeneous data integration and query system and method
CN108241709A (en) A kind of data integrating method, device and system
CN108319604B (en) Optimization method for association of large and small tables in hive
CN110515979B (en) Data query method, device, equipment and storage medium
CN111814020A (en) Data acquisition method and device
CN109697234B (en) Multi-attribute information query method, device, server and medium for entity
CN117093598A (en) Data retrieval method, system, device and storage medium
CN112579676A (en) Data processing method and device between heterogeneous systems, storage medium and equipment
CN103020300B (en) Method and device for information retrieval
CN117009430A (en) Data management method, device, storage medium and electronic equipment
CN115905274A (en) Data processing method and device, electronic equipment and medium
CN117539962B (en) Data processing method, device, computer equipment and storage medium
CN114490095B (en) Request result determination method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination