CN114969036A - Data retrieval method and device

Publication number
CN114969036A
Authority
CN
China
Legal status
Pending
Application number
CN202210573346.5A
Other languages
Chinese (zh)
Inventor
黄盖
Current Assignee
WUHAN HONGXU INFORMATION TECHNOLOGY CO LTD
Original Assignee
WUHAN HONGXU INFORMATION TECHNOLOGY CO LTD
Application filed by WUHAN HONGXU INFORMATION TECHNOLOGY CO LTD
Priority to CN202210573346.5A
Publication of CN114969036A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The invention provides a data retrieval method and a data retrieval device. The method comprises: determining a target HBase data table based on acquired target data; determining an Elasticsearch index table of the target HBase data table based on the target HBase data table; and performing data retrieval based on the Elasticsearch index table. According to the data retrieval method and device provided by the invention, target HBase data tables are established for different data sources, a corresponding Elasticsearch index table is established for each target HBase data table, and the two tables are associated by making the row keys of the target HBase data table correspond one-to-one to the index keys of the Elasticsearch index table, so that query efficiency is improved and fast querying of massive data is achieved.

Description

Data retrieval method and device
Technical Field
The invention relates to the technical field of big data, in particular to a data retrieval method and device.
Background
In recent years, as data search has become widely used, the Elasticsearch community has grown increasingly active and Elasticsearch has gained considerable room for development. Its role is no longer that of a pure search engine, nor that of the engine which many users originally adopted only for logs. Elasticsearch has since added data aggregation analysis (aggregation) and visualization features, and its applications are becoming more and more widespread.
HBase is an open-source, non-relational distributed database modeled on Google's BigTable and implemented in Java. It is part of the Hadoop project of the Apache Software Foundation, runs on the HDFS file system, and provides BigTable-like services for Hadoop. It can therefore store massive sparse data with fault tolerance. HBase is a highly reliable, high-performance, column-oriented and scalable distributed database, is an open-source implementation of Google BigTable, and is mainly used for storing unstructured and semi-structured loose data. The aim of HBase is to process very large tables: using inexpensive computer clusters that scale horizontally, it can process data tables consisting of more than 10 million rows of data and millions of columns of elements.
HBase queries can be divided into two categories: rowkey-based queries and filter-based queries. Both query methods have limitations. A rowkey query responds quickly, but its query conditions are not flexible enough. A filter query currently supports filters such as DependentColumnFilter, FamilyFilter, QualifierFilter, RowFilter, and ValueFilter, but its query performance is not high; in particular, for a full-table scan, query efficiency gradually decreases as the data volume increases.
Disclosure of Invention
The invention provides a data retrieval method and a data retrieval device, which address the inability of the prior art to query massive data efficiently and quickly, and which enable fast querying of massive data.
The invention provides a data retrieval method, which comprises the following steps:
determining a target HBase data table based on the acquired target data;
determining an Elasticsearch index table of the target HBase data table based on the target HBase data table;
and performing data retrieval based on the Elasticsearch index table.
In some embodiments, the determining a target HBase data table based on the acquired target data comprises:
determining a row key of the target HBase data table based on identification information of the target data, the identification information including the record time and key fields;
determining a data row of the target HBase data table based on the row key;
and determining the target HBase data table based on the row key and the data row.
In some embodiments, the determining an Elasticsearch index table of the target HBase data table based on the target HBase data table comprises:
determining an index key of the Elasticsearch index table based on the row key of the target HBase data table;
determining a data row of the Elasticsearch index table based on the fields of the target HBase data table;
and determining the Elasticsearch index table based on the index key and the data row of the Elasticsearch index table.
In some embodiments, the determining an index key of the Elasticsearch index table based on the row key of the target HBase data table comprises:
determining the index key of the Elasticsearch index table based on the row key of the target HBase data table, the partition start key, the index ID, the length of the index value, the index value, and the timestamp of the maximum value.
In some embodiments, the performing data retrieval based on the Elasticsearch index table comprises:
traversing the data rows in the Elasticsearch index table based on a query condition, and determining the index keys that meet the query condition;
determining the row keys associated with the index keys that meet the query condition;
and performing data retrieval in the target HBase data table based on the associated row keys.
In some embodiments, before determining the target HBase data table based on the acquired target data, the method further comprises:
extracting data from a Kafka queue in batches and filtering it to obtain a data set in a specified format;
parsing the data in the data set to obtain original data;
cleaning the original data;
and extracting corresponding records from the cleaned data based on a preset HBase data table structure to determine the target data.
The present invention also provides a data retrieval device comprising:
a first determining module, configured to determine a target HBase data table based on the acquired target data;
a second determining module, configured to determine an Elasticsearch index table of the target HBase data table based on the target HBase data table;
and a retrieval module, configured to perform data retrieval based on the Elasticsearch index table.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements any of the data retrieval methods described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data retrieval method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a data retrieval method as described in any one of the above.
According to the data retrieval method and device provided by the invention, target HBase data tables are established for different data sources, a corresponding Elasticsearch index table is established for each target HBase data table, and the two tables are associated by making the row keys of the target HBase data table correspond one-to-one to the index keys of the Elasticsearch index table, so that the API (application programming interface) is enriched, query efficiency is improved, and fast querying of massive data is achieved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a data retrieval method provided by the present invention;
FIG. 2 is a schematic diagram of a data retrieval system provided by the present invention;
FIG. 3 is the first schematic flow chart of the data retrieval system provided by the present invention;
FIG. 4 is the second schematic flow chart of the data retrieval system provided by the present invention;
FIG. 5 is the third schematic flow chart of the data retrieval system provided by the present invention;
FIG. 6 is the fourth schematic flow chart of the data retrieval system provided by the present invention;
FIG. 7 is a schematic structural diagram of a data retrieval device provided by the present invention;
FIG. 8 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a data retrieval method provided by the present invention, and referring to fig. 1, the data retrieval method provided by the present invention may include:
step 110, determining a target HBase data table based on the acquired target data;
step 120, determining an Elasticsearch index table of the target HBase data table based on the target HBase data table;
and step 130, performing data retrieval based on the Elasticsearch index table.
The execution subject of the data retrieval method provided by the invention can be an electronic device, a component in the electronic device, an integrated circuit, or a chip. The electronic device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine or a self-service machine; the present invention is not specifically limited in this respect.
In step 110, a target HBase data table is determined based on the collected target data.
Data may be collected by streaming extraction, and the collected target data may come from different data sources. Corresponding target HBase data tables are established for the different data sources according to the structure of a preset HBase data table.
HBase is a distributed, column-oriented database. Its biggest difference from a general relational database is that HBase uses a column-based rather than row-based model and is well suited to storing unstructured data. HBase stores data as key-value pairs (KeyValue), and the row key (Rowkey) is the key of the KeyValue and identifies a unique row. A Rowkey is a binary stream with a maximum length of 64 KB whose content can be customized by the user. Data is generally loaded in ascending binary order of the Rowkey.
In step 120, an Elasticsearch index table of the target HBase data table is determined based on the target HBase data table.
A corresponding Elasticsearch index table (IndexTable) is established according to the target HBase data table.
For example, the Region of the target HBase data table and the corresponding Elasticsearch index table are designed to share the same start key, and the row key design is controlled so that the Regions of the target HBase data table and of the index table for the same data source correspond one-to-one and are stored on the same Region Server, which saves one RPC call during a query. In addition, by making the row keys of the target HBase data table correspond one-to-one to the index keys of the Elasticsearch index table, the query API is enriched and query efficiency is improved.
In step 130, data retrieval is performed based on the Elasticsearch index table.
When HBase retrieves by Rowkey, the system first finds the Region in which a given Rowkey (or Rowkey range) is located and then routes the query request to that Region to obtain the data. For access by a single Rowkey, a get operation is performed on that Rowkey value and returns a unique record. For a scan over a Rowkey range, startRowKey and endRowKey are set and the scan runs over that range, returning a batch of records that meet the specified conditions.
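The difference between a single-key get and a range scan can be sketched with an in-memory table that keeps its row keys in sorted order. This is only an illustrative stand-in, not the HBase client API: the class name `SortedTable` and its methods are hypothetical, and the sketch assumes the start key is inclusive and the end key is exclusive.

```python
from bisect import bisect_left


class SortedTable:
    """Hypothetical in-memory stand-in for a table ordered by row key."""

    def __init__(self, rows):
        self._rows = dict(rows)
        # Row keys are kept in ascending byte order, as in a Rowkey-ordered store.
        self._keys = sorted(self._rows)

    def get(self, row_key):
        """Point lookup: returns the single record for row_key, or None."""
        return self._rows.get(row_key)

    def scan(self, start_row_key, end_row_key):
        """Range scan over [start_row_key, end_row_key) (assumed bounds)."""
        lo = bisect_left(self._keys, start_row_key)
        hi = bisect_left(self._keys, end_row_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = SortedTable({b"a1": "first", b"a2": "second", b"b1": "third"})
single = table.get(b"a2")        # one record
batch = table.scan(b"a", b"b")   # every row whose key starts with "a"
```

Because the keys are sorted, the scan only has to locate the two boundaries and read the rows between them, which is why a well-designed row key matters for range queries.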
The data rows of the Elasticsearch index table are traversed and screened according to the input query condition to find the index keys that meet the query condition. The row keys in the target HBase data table are then associated according to those index keys, and the data in the target HBase data table is queried by row key to obtain the query data.
According to the data retrieval method provided by the embodiment of the invention, target HBase data tables are established for different data sources, a corresponding Elasticsearch index table is established for each target HBase data table, and the two tables are associated by making the row keys of the target HBase data table correspond one-to-one to the index keys of the Elasticsearch index table, so that the API (application programming interface) is enriched, query efficiency is improved, and fast querying of massive data is achieved.
In some embodiments, determining a target HBase data table based on the collected target data includes:
determining a row key of the target HBase data table based on the identification information of the target data, the identification information including the record time and key fields;
determining a data row of the target HBase data table based on the row key;
and determining the target HBase data table based on the row key and the data row.
Specifically, the StartKey of the preset HBase data table Region is acquired; the time of each record in the target data is acquired and converted into a timestamp; a unique identifier is formed by splicing the key fields of each record; and these pieces of information are spliced together to obtain the row key of the target HBase data table.
A data row of the target HBase data table is created from the row key of the target HBase data table, with its timestamp set to the timestamp obtained in the previous step, and is filled with data according to the preset HBase data table structure to obtain the target HBase data table.
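The row key construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_row_key`, the `"_"` separator, the time format, the use of UTC, and millisecond precision are all hypothetical choices that the source does not specify.

```python
from datetime import datetime, timezone
from typing import Sequence


def build_row_key(region_start_key: str,
                  record_time: str,
                  key_fields: Sequence[str],
                  separator: str = "_") -> str:
    """Splice Region StartKey + record timestamp + key-field identifier."""
    # Convert the record time to a millisecond timestamp
    # (assumed format "%Y-%m-%d %H:%M:%S", assumed to be UTC).
    parsed = datetime.strptime(record_time, "%Y-%m-%d %H:%M:%S")
    timestamp_ms = int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    # Unique identifier formed by splicing the record's key fields.
    unique_id = "".join(key_fields)
    return separator.join([region_start_key, str(timestamp_ms), unique_id])


row_key = build_row_key("0001", "2022-05-24 08:30:00", ["13800000000", "deviceA"])
```

Placing the Region StartKey first keeps every generated key inside the intended Region's key range, which is the property the index key design relies on later.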
According to the data retrieval method provided by the embodiment of the invention, the problem of disk expansion can be solved by using the HBase distributed database with HDFS as the file system.
In some embodiments, determining an Elasticsearch index table of the target HBase data table based on the target HBase data table includes:
determining an index key of the Elasticsearch index table based on the row key of the target HBase data table;
determining a data row of the Elasticsearch index table based on the fields of the target HBase data table;
and determining the Elasticsearch index table based on the index key and the data row of the Elasticsearch index table.
The index keys of the partitioned index table are created according to the row key information of the target HBase data table, so that the index keys of each index table correspond one-to-one to the row keys of the target HBase data table.
In some embodiments, determining an index key of the Elasticsearch index table based on the row key of the target HBase data table comprises:
determining the index key of the Elasticsearch index table based on the row key of the target HBase data table, the partition start key, the index ID, the length of the index value, the index value, and the timestamp of the maximum value.
In the case of a single-column index, the index key consists, in order, of the partition start key (Region StartKey), the index ID (Index_ID), the length of the index value (IndexValue_length), the index value (IndexValue), the timestamp of the maximum value (MAX_VALUE-timestamp), and the row key of the target HBase data table (DataTable RowKey).
In the case of a multi-column index, the index value in the Elasticsearch index table is obtained by splicing the column values.
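The component order of the index key can be illustrated with a small byte-level sketch. The field widths (2-byte index ID, 4-byte length, 8-byte timestamp field), the big-endian encoding, and the reading of MAX_VALUE-timestamp as "largest signed 64-bit integer minus the timestamp" are assumptions made for this example; the source only gives the order of the components.

```python
import struct
from typing import Sequence

# Assumption: MAX_VALUE is the largest signed 64-bit integer.
MAX_VALUE = 2**63 - 1


def build_index_key(region_start_key: bytes,
                    index_id: int,
                    column_values: Sequence[bytes],
                    timestamp_ms: int,
                    data_row_key: bytes) -> bytes:
    """Splice the index key components in the order given in the text."""
    # Single-column index: one value. Multi-column index: splice the column values.
    index_value = b"".join(column_values)
    return b"".join([
        region_start_key,                             # Region StartKey
        struct.pack(">H", index_id),                  # Index_ID (assumed 2 bytes)
        struct.pack(">I", len(index_value)),          # IndexValue_length (assumed 4 bytes)
        index_value,                                  # IndexValue
        struct.pack(">q", MAX_VALUE - timestamp_ms),  # MAX_VALUE-timestamp (assumed 8 bytes)
        data_row_key,                                 # DataTable RowKey
    ])


index_key = build_index_key(b"0001", 7, [b"Wuhan"], 1653381000000, b"0001_row")
```

Under these assumptions the key begins with the Region start key and ends with the data table row key, so the row key can be recovered from the index key, and newer records sort before older ones for the same index value.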
The fields that need to be indexed are extracted from the fields of the target HBase data table to create the data rows of the Elasticsearch index table, thereby constructing the Elasticsearch index table.
According to the data retrieval method provided by the embodiment of the invention, the index key of the Elasticsearch index table is determined based on the row key of the target HBase data table, and is obtained from the row key according to a specific combination. This combination ensures that the index key begins with the start key of the Region of the target HBase data table and that index keys and row keys correspond one-to-one, which establishes the basis for associating the target HBase data table with the Elasticsearch index table and enables fast querying of massive data.
In some embodiments, performing data retrieval based on the Elasticsearch index table includes:
traversing the data rows in the Elasticsearch index table based on the query condition, and determining the index keys that meet the query condition;
determining the row keys associated with the index keys that meet the query condition;
and performing data retrieval in the target HBase data table based on the associated row keys.
First, the data rows of the Elasticsearch index table are traversed and screened according to the input query condition: all data rows that meet the query condition are traversed, and all index keys that meet the query condition are found.
The row keys in the target HBase data table are then associated according to the index keys that meet the query condition.
Finally, the data in the target HBase data table is queried by row key to obtain the query result, and the data is returned.
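The three retrieval steps can be sketched with plain dictionaries standing in for the index table and the data table. This is an illustrative model only: the function name `retrieve`, the dictionary layout, and the explicit `index_to_row_key` mapping are hypothetical, and no Elasticsearch or HBase client calls are used.

```python
from typing import Any, Callable, Dict, List


def retrieve(index_rows: Dict[str, Dict[str, Any]],
             index_to_row_key: Dict[str, str],
             data_table: Dict[str, Dict[str, Any]],
             condition: Callable[[Dict[str, Any]], bool]) -> List[Dict[str, Any]]:
    """Index lookup followed by row-key lookup in the data table."""
    # Step 1: traverse the index rows and keep the index keys that meet the condition.
    matching_index_keys = [key for key, fields in index_rows.items() if condition(fields)]
    # Step 2: associate each matching index key with its row key (one-to-one).
    row_keys = [index_to_row_key[key] for key in matching_index_keys]
    # Step 3: fetch the full records from the data table by row key.
    return [data_table[row_key] for row_key in row_keys if row_key in data_table]


index_rows = {"idx1": {"city": "Wuhan"}, "idx2": {"city": "Beijing"}, "idx3": {"city": "Wuhan"}}
index_to_row_key = {"idx1": "row1", "idx2": "row2", "idx3": "row3"}
data_table = {"row1": {"name": "a"}, "row2": {"name": "b"}, "row3": {"name": "c"}}
results = retrieve(index_rows, index_to_row_key, data_table,
                   lambda fields: fields["city"] == "Wuhan")
```

The condition is evaluated only against the small index rows; the data table is touched only through direct row-key lookups, which is the access pattern that HBase serves quickly.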
According to the data retrieval method provided by the embodiment of the invention, a corresponding Elasticsearch index table is established for the target HBase data table, with row keys corresponding one-to-one to index keys. The data rows of the Elasticsearch index table are traversed according to the query condition to find all index keys that meet it, the row keys are associated according to those index keys, and the target HBase data table is queried by row key to obtain the query data. This provides technical support for HBase-based massive data processing and enables fast querying of massive data.
In some embodiments, before determining the target HBase data table based on the collected target data, the method further includes:
extracting data from a Kafka queue in batches and filtering it to obtain a data set in a specified format;
parsing the data in the data set to obtain original data;
cleaning the original data;
and extracting corresponding records from the cleaned data based on a preset HBase data table structure to determine the target data.
First, data in the Kafka queue is extracted in real time as a stream using Spark Streaming and filtered to obtain a data set in a specified format. Spark Streaming is an extension of the Spark core API that supports high-throughput, fault-tolerant processing of real-time streaming data. Spark Streaming supports retrieving data from a variety of data sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP sockets. After data is obtained from a data source, high-level functions such as map, reduce, join, and window can be used to express complex processing, and the results can finally be stored in file systems, databases, and live dashboards.
The data set is traversed according to preset data set parsing rules: the specified directory is traversed and the files matching the rules are retained. The data format is then determined; the data is either structured data whose fields are separated by a specified delimiter, or JSON-format data. Fixed-format data is parsed using the fixed delimiter to obtain the original data, and data in a specific format is parsed in the corresponding specific manner to obtain the original data.
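The format decision in the parsing step can be sketched as below. This is a simplified illustration: the function name `parse_record`, the `"|"` delimiter, and the rule "a line starting with `{` is JSON" are assumptions for the example, not the parsing rules of the source.

```python
import json
from typing import Any, Dict, Sequence


def parse_record(line: str,
                 field_names: Sequence[str],
                 delimiter: str = "|") -> Dict[str, Any]:
    """Parse one line as JSON or as delimiter-separated structured data."""
    text = line.strip()
    # Assumed rule: JSON-format records start with an opening brace.
    if text.startswith("{"):
        return json.loads(text)
    # Otherwise treat the line as fixed-format data split by the delimiter.
    values = text.split(delimiter)
    return dict(zip(field_names, values))


fields = ["phone", "device", "time"]
structured = parse_record("13800000000|deviceA|2022-05-24 08:30:00", fields)
as_json = parse_record('{"phone": "13800000000", "device": "deviceA"}', fields)
```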
The original data is then cleaned by filtering out records with missing key fields or with timestamps later than the current time.
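The cleaning rule can be sketched as a simple filter. The function name `clean_records`, the field name `timestamp`, and the millisecond unit are hypothetical; the sketch also assumes that a record with no timestamp at all is dropped, which the source does not state.

```python
import time
from typing import Any, Dict, Iterable, List, Optional, Sequence


def clean_records(records: Iterable[Dict[str, Any]],
                  key_fields: Sequence[str],
                  now_ms: Optional[int] = None,
                  timestamp_field: str = "timestamp") -> List[Dict[str, Any]]:
    """Drop records with missing key fields or timestamps in the future."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cleaned = []
    for record in records:
        # A key field counts as missing when it is absent, None, or empty.
        if any(record.get(field) in (None, "") for field in key_fields):
            continue
        timestamp = record.get(timestamp_field)
        # Assumption: a record without a timestamp is also discarded.
        if timestamp is None or timestamp > now_ms:
            continue
        cleaned.append(record)
    return cleaned


raw = [
    {"id": "a", "timestamp": 500},
    {"id": "", "timestamp": 500},    # missing key field
    {"id": "b", "timestamp": 2000},  # timestamp later than "now"
]
kept = clean_records(raw, key_fields=["id"], now_ms=1000)
```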
Whether the number of fields in each cleaned record meets the requirement is then checked, and the corresponding records are extracted according to the target HBase table to obtain the target data.
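The field-count check and extraction step can be sketched as follows. The function name `extract_target_data` is hypothetical, and the sketch assumes that "meets the requirement" means the record has exactly the expected number of fields and that extraction means projecting each record onto the columns of the preset table structure.

```python
from typing import Any, Dict, Iterable, List, Sequence


def extract_target_data(records: Iterable[Dict[str, Any]],
                        table_columns: Sequence[str],
                        expected_field_count: int) -> List[Dict[str, Any]]:
    """Keep records with the expected field count, projected onto the table columns."""
    target = []
    for record in records:
        # Assumed requirement: the record has exactly the expected number of fields.
        if len(record) != expected_field_count:
            continue
        # Keep only the columns defined by the preset table structure.
        target.append({column: record.get(column) for column in table_columns})
    return target


cleaned = [
    {"phone": "13800000000", "device": "deviceA", "time": 1000},
    {"phone": "13900000000", "device": "deviceB"},  # too few fields
]
target_data = extract_target_data(cleaned, ["phone", "time"], expected_field_count=3)
```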
According to the data retrieval method provided by the embodiment of the invention, the extracted streaming data undergoes data parsing, data cleaning, data extraction, data conversion and the like, so that inaccurate or damaged records in a data set, table or database are screened, detected and corrected, and accurate target data is obtained.
Fig. 2 shows a data retrieval system according to an embodiment of the present invention, and figs. 3 to 6 are the first to fourth schematic flow charts of the data retrieval system provided by the present invention.
Referring to fig. 2, a data retrieval system provided by an embodiment of the present invention may include: a streaming data extraction module, a data parsing module, a data cleaning module, an ETL module, an HBase DataTable Row creation module, an Elasticsearch IndexTable creation module, an HBase writing module, an Elasticsearch writing module, an HBase cluster, an Elasticsearch cluster and a data retrieval module.
The streaming data extraction module, the data parsing module, the data cleaning module, the ETL module, the HBase DataTable Row creation module, the HBase writing module and the HBase cluster are connected in sequence;
the ETL module, the Elasticsearch IndexTable creation module, the Elasticsearch writing module and the Elasticsearch cluster are connected in sequence;
the HBase cluster and the Elasticsearch cluster are both connected to the data retrieval module.
The streaming data extraction module is used for extracting data from the Kafka queue in batches and filtering it to obtain a data set in a specified format.
The data parsing module is used for parsing the data in the data set to obtain the original data.
The specific workflow of the data analysis module is shown in fig. 3:
S1, start: the module is initialized and the data set parsing rules are provided;
S2, traverse the data set: the specified directory is traversed and the files matching the rules are retained;
S3, determine the data format: the data is either structured data whose fields are separated by a specified delimiter, or JSON-format data;
S4.1, parse fixed-format data: the fixed-format data is parsed using the fixed delimiter to obtain the original data;
and S4.2, parse data in a specific format: the data in the specific format is parsed in the corresponding specific manner to obtain the original data.
The data cleaning module is used for cleaning the data by filtering out records with missing key fields or with timestamps later than the current time.
The ETL module is used for checking whether the number of record fields meets the requirement and then extracting the corresponding records according to a preset HBase table.
The HBase DataTable Row creation module is used for creating the HBase DataTable.
The specific workflow of the HBase DataTable Row creation module is shown in fig. 4:
S1, start: the module is started and the table structure information of the corresponding DataTable is provided;
S2, create the DataTable Rowkey: the StartKey of the table Region is acquired, the record time is acquired and converted into a timestamp, a unique identifier is formed by splicing the key fields of the record, and these pieces of information are spliced together to obtain the DataTable Rowkey;
and S3, create the DataTable Row: the DataTable Row is created using the DataTable Rowkey obtained in S2, its timestamp is set to the timestamp obtained in S2, and it is filled with data according to the DataTable table structure.
The Elasticsearch IndexTable creation module is used for creating the Elasticsearch IndexTable.
The specific workflow of the Elasticsearch IndexTable creation module is shown in fig. 5:
S1, start: the module is started and the table structure information of the corresponding IndexTable is acquired;
S2, acquire the Rowkey of the DataTable Region: the Rowkey information of the DataTable is acquired;
S3, create the IndexTable keys: the IndexTable of the partitioned table is created according to the Rowkey information of the DataTable Region, so that the key of each IndexTable corresponds to the Rowkey of the DataTable. For a single-column index, the key components are, in order, Region StartKey, Index_ID, IndexValue_length, IndexValue, MAX_VALUE-timestamp, and DataTable RowKey. For a multi-column index, IndexValue is obtained by splicing the column values;
S4, create the IndexTable Rows: the fields that need to be indexed are extracted from the fields of the DataTable to create the IndexTable Rows.
The HBase writing module is used for writing to HBase in batches, with a configurable batch size.
The Elasticsearch writing module is used for writing to Elasticsearch in batches, with a configurable batch size.
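Batched writing with a configurable batch size can be sketched independently of any particular client. The names `batched` and `write_in_batches` and the `write_batch` callback are hypothetical; in a real system the callback would wrap whatever bulk-write call the HBase or Elasticsearch client provides.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(items: Iterable[T],
                     batch_size: int,
                     write_batch: Callable[[List[T]], None]) -> int:
    """Send items to write_batch in groups; returns the number of batches written."""
    count = 0
    for batch in batched(items, batch_size):
        write_batch(batch)  # hypothetical sink, e.g. a bulk write to a cluster
        count += 1
    return count


written: List[List[int]] = []
batch_count = write_in_batches(range(5), 2, written.append)
```

Grouping rows this way trades a little latency for fewer round trips to the cluster; the right batch size depends on record size and cluster capacity.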
The data retrieval module is used for fast retrieval of data in the HBase cluster and the Elasticsearch cluster. The data in the two clusters satisfies the one-to-one correspondence between the IndexTable key and the RowKey of the DataTable.
The specific work flow of the data retrieval module is shown in fig. 6:
S1, start: the module is initialized and the query condition is input;
S2, traverse and screen the Rows in the IndexTable: the query condition is determined, all Rows in the IndexTable that meet the query condition are traversed, and all keys that meet the query condition are found;
S3, search the Rows in the DataTable: the Rowkeys in the DataTable are associated according to the keys obtained in S2, and the data in the DataTable is obtained;
and S4, return the data: the data entity to be returned is constructed and the data is returned.
According to the data retrieval system provided by the embodiment of the invention, the problem of disk expansion is solved by using the HBase distributed database with HDFS as the file system; retrieval is fast because a secondary index is used; and because the DataTable RowKeys correspond one-to-one to the IndexTable keys, the query API is enriched and query efficiency is improved. The system can thus provide effective technical support for processing and querying massive mobile internet data and achieves fast querying of massive data.
The data retrieval device provided by the present invention is described below, and the data retrieval device described below and the data retrieval method described above may be referred to correspondingly.
Fig. 7 is a schematic structural diagram of a data retrieval device according to the present invention, and referring to fig. 7, the data retrieval device according to the present invention may include:
a first determining module 710, configured to determine a target HBase data table based on the acquired target data;
a second determining module 720, configured to determine an Elasticsearch index table of the target HBase data table based on the target HBase data table;
and a retrieval module 730, configured to perform data retrieval based on the Elasticsearch index table.
According to the data retrieval device provided by the embodiment of the invention, target HBase data tables are established for different data sources, a corresponding Elasticsearch index table is established for each target HBase data table, and the two tables are associated by making the row keys of the target HBase data table correspond one-to-one to the index keys of the Elasticsearch index table, so that the API (application programming interface) is enriched, query efficiency is improved, and fast querying of massive data is achieved.
In some embodiments, the first determining module 710 is specifically configured to:
determining a row key of a target HBase data table based on the identification information of the target data; the identification information includes: time of record and key fields;
determining a data row of a target HBase data table based on a row key;
and determining a target HBase data table based on the row key and the data row.
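The row-key step above can be illustrated with a minimal Python sketch. The embodiment only states that the identification information includes the time of record and key fields; the `|` separator, the zero-padded millisecond timestamp, and the names `build_row_key` and `build_data_row` are assumptions made for illustration, and a plain dictionary stands in for an actual HBase write.

```python
from datetime import datetime, timezone

SEPARATOR = "|"  # assumed delimiter; the source does not specify one


def build_row_key(record_time: datetime, key_fields: list[str]) -> bytes:
    """Compose a row key from the record time and the key fields."""
    # Zero-padded epoch milliseconds make byte order follow time order.
    millis = int(record_time.timestamp() * 1000)
    parts = [f"{millis:013d}", *key_fields]
    return SEPARATOR.join(parts).encode("utf-8")


def build_data_row(row_key: bytes, record: dict[str, str]) -> dict:
    """Pair the row key with the record's columns (stand-in for an HBase row)."""
    return {"row_key": row_key, "columns": dict(record)}


if __name__ == "__main__":
    t = datetime(2022, 5, 24, tzinfo=timezone.utc)
    key = build_row_key(t, ["13800000000", "login"])
    print(build_data_row(key, {"phone": "13800000000", "action": "login"}))
```

Placing the time first keeps rows from the same period adjacent, which suits time-range scans; a real deployment might instead salt or reverse the prefix to avoid write hot spots, a choice the source does not address.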
In some embodiments, the second determining module 720 is specifically configured to:
determining an index key of the elasticsearch index table based on a row key of the target HBase data table;
determining a data row of the elasticsearch index table based on a field of the target HBase data table;
and determining the elasticsearch index table based on the index key and the data row of the elasticsearch index table.
In some embodiments, determining an index key of the elasticsearch index table based on the row key of the target HBase data table comprises:
determining the index key of the elasticsearch index table based on the following items of the target HBase data table: the row key, the partition start key, the index ID, the length of the index value, the index value, and the timestamp of the maximum value.
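One way to picture such a composite index key is the following Python sketch. The source lists the components but not their order, widths, or encoding, so the byte layout here (start key, 2-byte index ID, 4-byte value length, value, 8-byte timestamp, row key) is an assumption, as are the function names. The timestamp is stored as a plain 8-byte integer because the source does not say how "the timestamp of the maximum value" is encoded.

```python
import struct


def build_index_key(start_key: bytes, index_id: int, index_value: bytes,
                    timestamp_ms: int, row_key: bytes) -> bytes:
    """Concatenate the components into a single index key (assumed layout)."""
    header = struct.pack(">HI", index_id, len(index_value))  # ID + value length
    return start_key + header + index_value + struct.pack(">Q", timestamp_ms) + row_key


def parse_index_key(index_key: bytes, start_key_len: int) -> dict:
    """Recover the components, given the length of the partition start key."""
    offset = start_key_len
    index_id, value_len = struct.unpack_from(">HI", index_key, offset)
    offset += struct.calcsize(">HI")
    index_value = index_key[offset:offset + value_len]
    offset += value_len
    (timestamp_ms,) = struct.unpack_from(">Q", index_key, offset)
    offset += struct.calcsize(">Q")
    return {
        "index_id": index_id,
        "index_value": index_value,
        "timestamp": timestamp_ms,
        "row_key": index_key[offset:],  # the row key is whatever remains
    }
```

Storing the length of the index value ahead of the value itself lets a reader split the key unambiguously even when the value contains arbitrary bytes, which is presumably why the length appears among the listed components.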
In some embodiments, the retrieving module 730 is specifically configured to:
traversing data rows in the elasticsearch index table based on the query condition, and determining an index key meeting the query condition;
determining, based on the index key meeting the query condition, the row key associated with that index key;
and performing data retrieval in the target HBase data table based on the associated row key.
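The three retrieval steps above can be sketched as follows. This is a simplified illustration that uses in-memory dictionaries in place of real elasticsearch and HBase clients; the table shapes, the `retrieve` function, and the sample field names are all hypothetical.

```python
from typing import Callable


def retrieve(index_table: dict, data_table: dict,
             condition: Callable[[dict], bool]) -> list:
    """Look up rows in the data table via the index table."""
    # Step 1: traverse the index rows and keep the index keys that match.
    matching_keys = [k for k, doc in index_table.items() if condition(doc["fields"])]
    # Step 2: map each matching index key to its associated row key.
    row_keys = [index_table[k]["row_key"] for k in matching_keys]
    # Step 3: fetch the full rows from the data table by row key.
    return [data_table[rk] for rk in row_keys if rk in data_table]


if __name__ == "__main__":
    data_table = {
        b"r1": {"phone": "13800000000", "action": "login"},
        b"r2": {"phone": "13900000000", "action": "logout"},
    }
    index_table = {
        b"i1": {"fields": {"action": "login"}, "row_key": b"r1"},
        b"i2": {"fields": {"action": "logout"}, "row_key": b"r2"},
    }
    print(retrieve(index_table, data_table, lambda f: f["action"] == "login"))
```

Because each index key maps to exactly one row key, the final step is a direct key lookup rather than a scan of the data table.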
In some embodiments, before determining the target HBase data table based on the collected target data, the method further includes:
extracting data in batches from a Kafka queue and filtering the data to obtain a data set in a specified format;
parsing the data in the data set to obtain original data;
cleaning the original data;
and extracting corresponding records from the cleaned data based on a preset HBase data table structure, and determining target data.
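The preprocessing steps above can be sketched as a small pipeline. This is only an illustration under several assumptions: a `collections.deque` stands in for the Kafka queue, the "specified format" is taken to be JSON objects, cleaning is reduced to trimming whitespace and dropping null values, and the column list `TABLE_COLUMNS` is a hypothetical HBase table structure.

```python
import json
from collections import deque

TABLE_COLUMNS = ("record_time", "phone", "action")  # hypothetical table structure


def pull_batch(queue: deque, batch_size: int) -> list:
    """Take up to batch_size messages off the queue."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch


def filter_format(messages: list) -> list:
    """Keep only messages that look like JSON objects (assumed format)."""
    return [m for m in messages if m.strip().startswith("{") and m.strip().endswith("}")]


def parse(messages: list) -> list:
    """Parse each message, skipping any that fail to decode."""
    records = []
    for message in messages:
        try:
            obj = json.loads(message)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            records.append(obj)
    return records


def clean(records: list) -> list:
    """Drop null values and trim whitespace from the rest."""
    return [{k: str(v).strip() for k, v in r.items() if v is not None} for r in records]


def extract_target(records: list, columns=TABLE_COLUMNS) -> list:
    """Keep records that have every table column, projected onto those columns."""
    return [{c: r[c] for c in columns} for r in records if all(r.get(c) for c in columns)]


def preprocess(queue: deque, batch_size: int = 100) -> list:
    return extract_target(clean(parse(filter_format(pull_batch(queue, batch_size)))))
```

Each stage takes and returns a plain list, so any single stage can be replaced (for example, with a real Kafka consumer in `pull_batch`) without touching the others.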
Fig. 8 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform a data retrieval method comprising:
determining a target HBase data table based on the acquired target data;
determining an elasticsearch index table of the target HBase data table based on the target HBase data table;
and performing data retrieval based on the elasticsearch index table.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the data retrieval method provided by the above methods, the method comprising:
determining a target HBase data table based on the acquired target data;
determining an elasticsearch index table of the target HBase data table based on the target HBase data table;
and performing data retrieval based on the elasticsearch index table.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for data retrieval provided by the above methods, the method comprising:
determining a target HBase data table based on the acquired target data;
determining an elasticsearch index table of the target HBase data table based on the target HBase data table;
and performing data retrieval based on the elasticsearch index table.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the various embodiments or of some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of data retrieval, comprising:
determining a target HBase data table based on the acquired target data;
determining an elasticsearch index table of the target HBase data table based on the target HBase data table;
and performing data retrieval based on the elasticsearch index table.
2. The data retrieval method of claim 1, wherein determining a target HBase data table based on the collected target data comprises:
determining a row key of the target HBase data table based on identification information of target data; the identification information includes: time of record and key fields;
determining a data row of the target HBase data table based on the row key;
and determining a target HBase data table based on the row key and the data row.
3. The data retrieval method of claim 2, wherein the determining an elasticsearch index table of the target HBase data table based on the target HBase data table comprises:
determining an index key of the elasticsearch index table based on a row key of the target HBase data table;
determining a data row of the elasticsearch index table based on a field of the target HBase data table;
and determining the elasticsearch index table based on the index key and the data row of the elasticsearch index table.
4. The data retrieval method of claim 3, wherein the determining the index key of the elasticsearch index table based on the row key of the target HBase data table comprises:
determining the index key of the elasticsearch index table based on the following items of the target HBase data table: the row key, the partition start key, the index ID, the length of the index value, the index value, and the timestamp of the maximum value.
5. The data retrieval method of claim 3, wherein the performing data retrieval based on the elasticsearch index table comprises:
traversing data rows in the elasticsearch index table based on a query condition, and determining an index key meeting the query condition;
determining, based on the index key meeting the query condition, the row key associated with that index key;
and performing data retrieval in the target HBase data table based on the associated row key.
6. The data retrieval method of claim 1, wherein, before determining the target HBase data table based on the collected target data, the method further comprises:
extracting data in batches from a Kafka queue and filtering the data to obtain a data set in a specified format;
parsing the data in the data set to obtain original data;
cleaning the original data;
and extracting corresponding records from the cleaned data based on a preset HBase data table structure, and determining target data.
7. A data retrieval device, comprising:
a first determining module, configured to determine a target HBase data table based on the acquired target data;
a second determining module, configured to determine an elasticsearch index table of the target HBase data table based on the target HBase data table;
and a retrieval module, configured to perform data retrieval based on the elasticsearch index table.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data retrieval method of any one of claims 1 to 6 when executing the program.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the data retrieval method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a data retrieval method as claimed in any one of claims 1 to 6.
CN202210573346.5A 2022-05-24 2022-05-24 Data retrieval method and device Pending CN114969036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210573346.5A CN114969036A (en) 2022-05-24 2022-05-24 Data retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210573346.5A CN114969036A (en) 2022-05-24 2022-05-24 Data retrieval method and device

Publications (1)

Publication Number Publication Date
CN114969036A true CN114969036A (en) 2022-08-30

Family

ID=82956424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210573346.5A Pending CN114969036A (en) 2022-05-24 2022-05-24 Data retrieval method and device

Country Status (1)

Country Link
CN (1) CN114969036A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658730A (en) * 2022-09-20 2023-01-31 中国科学院自动化研究所 Sparse data query method, device, equipment and computer readable storage medium
CN115658730B (en) * 2022-09-20 2024-02-13 中国科学院自动化研究所 Sparse data query method, apparatus, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN110109923B (en) Time sequence data storage method, time sequence data analysis method and time sequence data analysis device
US10452691B2 (en) Method and apparatus for generating search results using inverted index
CN110362544B (en) Log processing system, log processing method, terminal and storage medium
AU2013329525C1 (en) System and method for recursively traversing the internet and other sources to identify, gather, curate, adjudicate, and qualify business identity and related data
US10216848B2 (en) Method and system for recommending cloud websites based on terminal access statistics
US9710468B2 (en) Topic profile query creation
CN107729399B (en) Data processing method and device
CN106407360B (en) Data processing method and device
WO2017096892A1 (en) Index construction method, search method, and corresponding device, apparatus, and computer storage medium
WO2018036549A1 (en) Distributed database query method and device, and management system
CN112988863A (en) Elasticissearch-based efficient search engine method for heterogeneous multiple data sources
US20210294865A1 (en) System and method for updating a search index
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
CN110727663A (en) Data cleaning method, device, equipment and medium
KR101892067B1 (en) Method for storing and searching of text logdata based relational database
RU2568276C2 (en) Method of extracting useful content from mobile application setup files for further computer data processing, particularly search
CN112800287A (en) Full-text indexing method and system based on graph database
CN112100138A (en) Log query method and device, storage medium and electronic equipment
CN116881287A (en) Data query method and related equipment
CN114969036A (en) Data retrieval method and device
CN111026709A (en) Data processing method and device based on cluster access
JP2013041385A (en) Document retrieval method, document retrieval device, and document retrieval program
CN106844553B (en) Data detection and expansion method and device based on sample data
US20230153455A1 (en) Query-based database redaction
CN111460257A (en) Thematic generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination