CN107506464A - A method for implementing HBase secondary indexes based on ES - Google Patents

A method for implementing HBase secondary indexes based on ES

Info

Publication number
CN107506464A
Authority
CN
China
Prior art keywords
data
secondary index
index table
line unit
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710763058.5A
Other languages
Chinese (zh)
Inventor
雷万钧
于起超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Fiberhome Digtal Technology Co Ltd
Original Assignee
Wuhan Fiberhome Digtal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Fiberhome Digtal Technology Co Ltd
Priority to CN201710763058.5A
Publication of CN107506464A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for implementing HBase secondary indexes based on ES, and relates to the field of big data technology. The method is: 1. according to the query business, build secondary index tables in ES on the relevant data columns, where one basic query service corresponds to one secondary index table and one complex query service corresponds to multiple secondary index tables; 2. when querying, first query the index table to obtain the row keys of the matching data, then query the data table by row key to obtain the data; 3. when a relevant column of the data table is updated, update the corresponding secondary index table at the same time. By introducing the ES distributed search engine, each data operation involves only a very small number of Regions, which fundamentally reduces the pressure on the cluster, lightens the network communication load, reduces the dependence on high-performance servers, and improves working efficiency and stability. The method also scales well and has good practical application value.

Description

A method for implementing HBase secondary indexes based on ES
Technical field
The present invention relates to the field of big data technology, and in particular to a method for implementing HBase secondary indexes based on ES.
Background
With the arrival of the big data era, the volume of public security system data is growing geometrically. Massive data challenges the storage and retrieval performance of traditional database technology, and computing statistics over each dimension of the data becomes harder as well. The traditional approach today is to write MapReduce jobs or to use tools such as Hive and Pig. These approaches scan the full table, consume considerable cluster resources and network bandwidth, and are not suitable for ultra-large data volumes. Upgrading physical hardware or optimizing code alone cannot keep pace with the growth of information and the demand for processing efficiency, so researchers have begun to explore new methods of data statistics. How to solve this problem has become a difficult point.
The HBase database running on the Hadoop platform is a highly reliable, high-performance, column-oriented and scalable distributed storage system. HBase is an open-source NoSQL database suited to storing and managing various kinds of unstructured and semi-structured loose data. With HBase, a large-scale storage cluster can be built on low-cost servers, which can meet the storage needs of public security big data. However, a big data storage scheme based on HBase does not fully solve the problem of efficient data retrieval. In practice, retrieval is often needed on a specific field or on a combination of several fields. Especially given the complex and flexible query requirements of public security big data, a single row key cannot always satisfy business queries, so a big data retrieval method that meets these needs is urgently required.
ES, in full ElasticSearch, makes it easy to build indexes over data. An index can be divided into multiple index shards (the number of shards can be specified by the user and defaults to 5), and the shards are distributed evenly across all available nodes of the cluster, forming a distributed structure that lightens the load on any single node. Replicas can also be set for each index shard in an ElasticSearch cluster (the number of replicas can likewise be specified by the user and defaults to 1); when an index shard fails, its replica can be used to recover the data promptly. ElasticSearch also has automatic node discovery and fast data recovery mechanisms: when a new node joins the cluster, ElasticSearch discovers it in time, rebalances the load automatically and assigns data to the new node; when a node fails, it likewise redistributes data among the available nodes automatically.
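The shard and replica counts described above are per-index settings. The following is a minimal sketch, not part of the patent, of a settings body that uses the Elasticsearch setting names number_of_shards and number_of_replicas with the default values quoted in the text. The helper names index_settings and total_shard_copies are hypothetical and exist only for this illustration.

```python
import json


def index_settings(shards: int = 5, replicas: int = 1) -> dict:
    """Build an index-creation settings body.

    The defaults mirror the values quoted in the text (5 shards, 1 replica).
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(body: dict) -> int:
    """Primary shards plus all replica shards that the cluster has to host."""
    idx = body["settings"]["index"]
    return idx["number_of_shards"] * (1 + idx["number_of_replicas"])


body = index_settings()
print(json.dumps(body, indent=2))
print("shard copies to distribute:", total_shard_copies(body))
```

With the quoted defaults, the cluster spreads five primary shards and five replica shards across its available nodes.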
Summary of the invention
The purpose of the present invention is to solve the above problems of the prior art by providing a method for implementing HBase secondary indexes based on ES.
The object of the present invention is achieved as follows:
Specifically, the method comprises the following steps:
1. According to the query business, build secondary index tables in ES on the relevant data columns; one basic query service corresponds to one secondary index table, and one complex query service corresponds to multiple secondary index tables;
A. Create the secondary index table in ES according to the operation type
For selection query operations, the M data columns involved in the selection query are stored in M secondary index tables respectively, where M is greater than or equal to 1. The row key R of each secondary index table consists of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;
B. Generate secondary index entries from the data columns and insert them into the secondary index table
For join query operations, the N data columns involved in the join query are stored in one secondary index table, where N is greater than or equal to 2. The row key R of the secondary index table consists of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;
The value stored in the data column of the secondary index table is the ROWKEY of the corresponding data table row; this value and the row key R of the secondary index table together form one entry of the secondary index table. The secondary index table is created in ES, and the association between a data column and its secondary index table is stored in a metadata table. The row key of the metadata table consists of, in order: the table name of the data table, the column family name, the column name, the operation type of the secondary index table, and the timestamp. The value corresponding to the row key of the metadata table is: the operation type of the secondary index table and the name of the secondary index table;
The operation types of a secondary index table are: selection query operation and join query operation;
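The key layouts in step 1 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the source does not specify how the parts of a key are joined or which hash function produces PREFIX, so the separator SEP, the use of MD5 truncated to 8 hex characters, and all function names are hypothetical.

```python
import hashlib

SEP = "\x00"  # assumed separator between key parts; the source does not name one


def selection_index_key(qualifier: str, value: str, rowkey: str) -> str:
    """Row key R of a selection index table: QUALIFIER, VALUE, ROWKEY."""
    return SEP.join((qualifier, value, rowkey))


def join_prefix(group: str) -> str:
    """PREFIX for a join query group (MD5 is an assumption; any hash would do)."""
    return hashlib.md5(group.encode("utf-8")).hexdigest()[:8]


def join_index_key(group: str, value: str, qualifier: str) -> str:
    """Row key R of a join index table: PREFIX, VALUE, QUALIFIER."""
    return SEP.join((join_prefix(group), value, qualifier))


def metadata_entry(table, family, column, op_type, timestamp, index_table):
    """Metadata row: key = table, column family, column, operation type, timestamp;
    value = (operation type, secondary index table name)."""
    key = SEP.join((table, family, column, op_type, str(timestamp)))
    return key, (op_type, index_table)


# Each index entry maps its row key R to the ROWKEY of the data table row.
entry = (selection_index_key("city", "Wuhan", "row-001"), "row-001")
print(entry)
print(metadata_entry("person", "info", "city", "select", 1504051200, "idx_person_city"))
```

Because VALUE directly follows QUALIFIER (or PREFIX) in the key, entries with equal values sort next to each other, which is what the scans in step 2 rely on.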
2. When querying, first query the index table to obtain the row keys of the matching data, then query the data table by row key to obtain the data;
A. Scan the secondary index table to obtain the row keys of the data to be queried;
For each of the M data columns involved in the selection query, query the metadata table by operation type to obtain the name of the corresponding secondary index table, then query that secondary index table. The specific query process is: using the condition value in the selection query, position directly at the first entry that satisfies the condition and continue scanning until an entry that does not satisfy the condition is found; the scanned entries that satisfy the condition yield the set of ROWKEYs satisfying the query condition on the current data column. If M equals 1, this ROWKEY set is the ROWKEY set of the data to be queried. If M is greater than 1, apply to the ROWKEY sets of the different columns the set operation that corresponds to the logical relationship among the M data columns in the query: logical AND corresponds to set intersection and logical OR corresponds to set union. The result of the operation is the ROWKEY set of the data to be queried;
B. Query the data table using the ROWKEY set of the data to be queried
For the N data columns involved in a join query, query the metadata table by operation type to obtain the name of the corresponding secondary index table; the N columns correspond to the same secondary index table. Query that secondary index table as follows: it follows from the row key format of the secondary index table that the entries of the N data columns holding the same value are arranged consecutively in the secondary index table. If the number of consecutively arranged index entries with the same data column value is N, the ROWKEYs of these N entries form an N-tuple <R1, R2, ..., RN> that satisfies the query condition. Scanning the whole secondary index table yields the set of all N-tuples <R1, R2, ..., RN> that satisfy the condition, and this set is the ROWKEY set of the data to be queried. The corresponding data values are then obtained from the data table for this ROWKEY set through the Get interface method provided by HBase;
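The join scan described above can be sketched as a single pass over the key-ordered index entries. This is a simplified illustration, not the patent's code: entries are modeled as in-memory tuples ((PREFIX, VALUE, QUALIFIER), ROWKEY), the function name scan_join_index is hypothetical, and the sketch follows the source's counting rule, which implicitly assumes each column contributes at most one entry per value.

```python
from itertools import groupby


def scan_join_index(entries, n):
    """Return the N-tuples of ROWKEYs formed by runs of exactly n consecutive
    entries that share the same PREFIX and VALUE.

    entries: iterable of ((prefix, value, qualifier), rowkey)
    n:       number of data columns taking part in the join
    """
    # An index table keeps entries in key order; sorting emulates that here.
    ordered = sorted(entries, key=lambda e: e[0])
    tuples = []
    for _, run in groupby(ordered, key=lambda e: (e[0][0], e[0][1])):
        run = list(run)
        if len(run) == n:  # all n columns hold this value
            tuples.append(tuple(rowkey for _, rowkey in run))
    return tuples


entries = [
    (("p1", "42", "users.id"), "u3"),
    (("p1", "42", "orders.uid"), "o7"),
    (("p1", "99", "users.id"), "u9"),  # no matching order, so no tuple
]
print(scan_join_index(entries, n=2))
```

Each resulting tuple lists the data table row keys in QUALIFIER order; the rows themselves would then be fetched from the data table by row key.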
3. When a relevant column of the data table is updated, the corresponding secondary index table must be updated at the same time
Determine whether the data table has been updated; if so, update the secondary index table; if not, do not update the secondary index table;
Updating the secondary index table comprises the following steps:
I. Update the data table: through the Put method interface provided by HBase on the Hadoop platform, submit the value of the data column, the row key, the column family and the column identifier to complete the update of the data table;
II. Generate secondary index entries: for the column of the data currently being updated, query the metadata table to obtain the secondary index tables that need updating and their operation types; select the format of the corresponding secondary index table according to the operation type, and use the data updated in the data table to generate entries that conform to the format of that secondary index table;
III. Update the secondary index table: through the interface methods provided by the Coprocessor in HBase on the Hadoop platform, submit the value, row key, column family and column identifier of the secondary index table in the format of the secondary index entries generated in step II, completing the update of the secondary index table.
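Steps I to III can be sketched with in-memory dictionaries standing in for the data table, the metadata table and the index tables. This is an assumption-laden illustration: it calls no HBase or ES API, the separator SEP, the shape of the metadata mapping and the function names are hypothetical, and it does not remove index entries for a value that is overwritten, because the source does not describe that case.

```python
SEP = "|"  # assumed separator between key parts


def index_entry(op_type, qualifier, value, rowkey, prefix=""):
    """Step II: build one index entry in the format chosen by operation type."""
    if op_type == "select":
        return SEP.join((qualifier, value, rowkey)), rowkey
    if op_type == "join":
        return SEP.join((prefix, value, qualifier)), rowkey
    raise ValueError(f"unknown operation type: {op_type}")


def update(data_table, index_tables, metadata, rowkey, family, qualifier, value):
    """Write a cell and every index entry registered for its column.

    metadata maps (family, qualifier) to a list of
    (operation type, index table name, join prefix) triples.
    """
    # Step I: update the data table.
    data_table.setdefault(rowkey, {})[(family, qualifier)] = value
    written = []
    # Step II: look up which index tables cover this column.
    for op_type, index_name, prefix in metadata.get((family, qualifier), []):
        key, val = index_entry(op_type, qualifier, value, rowkey, prefix)
        # Step III: write the generated entry into the index table.
        index_tables.setdefault(index_name, {})[key] = val
        written.append((index_name, key))
    return written


data_table, index_tables = {}, {}
metadata = {("info", "city"): [("select", "idx_city", ""), ("join", "idx_join", "ab12")]}
print(update(data_table, index_tables, metadata, "r1", "info", "city", "Wuhan"))
```

A column with no metadata entry is written to the data table only, which matches the rule that only relevant columns trigger an index update.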
This method implements basic update operations without putting heavy pressure on the Hadoop cluster, and implements join queries and selection queries between data tables relatively efficiently for each specific business, thereby supporting complex business requirements as well as statistics on daily incremental data and on the total data volume.
This method has the following features:
1) secondary index tables are simple to create;
2) index files are written at the same time as data files, which ensures consistency;
3) the time needed for data statistics is greatly reduced.
The present invention has the following advantages and positive effects:
By introducing the ES distributed search engine, each data operation involves only a very small number of Regions, which fundamentally reduces the pressure on the cluster, lightens the network communication load, reduces the dependence on high-performance servers, and improves working efficiency and stability. The method also scales well and has good practical application value.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method;
Fig. 2 is the selection query flow chart of step 2 of the method;
Fig. 3 is the join query flow chart of step 2 of the method.
Glossary of English terms:
1. ES: in full ElasticSearch, an open-source, distributed, RESTful search engine built on Lucene. It is designed for cloud computing and provides real-time search that is stable, reliable, fast, and easy to install. It supports indexing data with JSON over HTTP.
When we build a website or an application and want to add search, we run into a problem: search is hard. We want our search solution to be fast; we want zero configuration and a completely free search schema; we want to be able to index data simply, using JSON over HTTP; we want our search server to be always available; we want to be able to start with one machine and scale to hundreds; we want real-time search; we want simple multi-tenancy; and we want a solution built for the cloud. Elasticsearch aims to solve all of these problems and more.
2. HBase: an open-source, non-relational distributed database (NoSQL), modeled after Google's BigTable and implemented in Java. It is part of the Apache Software Foundation's Hadoop project, runs on top of the HDFS file system, and provides BigTable-like capabilities for Hadoop. HBase implements, on a per-column basis, the compression algorithms, in-memory operation and Bloom filters mentioned in the BigTable paper. HBase tables can serve as the input and output of MapReduce jobs, and data can be accessed through the Java API or through REST, Avro or Thrift APIs. Although HBase's performance has improved significantly, it cannot directly replace an SQL database. It is now used by several data-driven websites.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and embodiments:
1. Method (overall)
As shown in Fig. 1, the overall flow is:
First, secondary index tables are established on the indexed columns; then, on each call, it is first determined whether the operation updates data or queries data;
If the operation is an update, the secondary index table is updated at the same time as the data table;
If the operation is a query, the secondary index data is queried first, and the relevant rows of the data table are then obtained using the key values retrieved from the secondary index.
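The overall flow can be sketched as a small dispatcher. This is a deliberately simplified stand-in, not the patent's design: both tables are plain dictionaries, the index maps a (column, value) pair straight to a set of row keys instead of using the row key layouts of step 1, and the request format and the name handle are hypothetical.

```python
def handle(request, data_table, index_table):
    """Dispatch one call: updates write both tables, queries read the index first."""
    kind = request["kind"]
    if kind == "update":
        rowkey, column, value = request["rowkey"], request["column"], request["value"]
        data_table.setdefault(rowkey, {})[column] = value            # data table
        index_table.setdefault((column, value), set()).add(rowkey)   # secondary index
        return None
    if kind == "query":
        # Phase 1: the secondary index yields the row keys.
        rowkeys = index_table.get((request["column"], request["value"]), set())
        # Phase 2: the data table is read by row key only, with no full scan.
        return [data_table[k] for k in sorted(rowkeys)]
    raise ValueError(f"unknown request kind: {kind}")


data_table, index_table = {}, {}
handle({"kind": "update", "rowkey": "r1", "column": "city", "value": "Wuhan"}, data_table, index_table)
handle({"kind": "update", "rowkey": "r2", "column": "city", "value": "Beijing"}, data_table, index_table)
print(handle({"kind": "query", "column": "city", "value": "Wuhan"}, data_table, index_table))
```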
2. Step 2
1) Selection query
As shown in Fig. 2, the workflow of a selection query is:
For a compound selection query, the compound selection condition is first split into single query conditions; the set of entries satisfying each single condition is then obtained through the row keys of the index table; finally, set operations are applied to the entry sets of the single conditions to obtain all secondary index entries that satisfy the compound query condition, and the row keys of all qualifying data table rows are extracted from these entries. When obtaining the set of secondary index entries that satisfy a single condition, the row key of the index table is used to position directly at the first qualifying entry, and the scan proceeds downward until a non-qualifying entry is found; the scanned entries are then merged into the set of secondary index entries satisfying that single condition.
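The selection workflow can be sketched with sorted lists standing in for the index tables. The sketch assumes keys of the form QUALIFIER|VALUE|ROWKEY with a "|" separator that never occurs inside a value; the separator and the names scan_condition and select are hypothetical and not taken from the source.

```python
from bisect import bisect_left

SEP = "|"  # assumed separator; values are assumed not to contain it


def scan_condition(sorted_keys, qualifier, value):
    """Seek to the first key starting with QUALIFIER|VALUE| and scan forward
    until a key no longer matches; return the ROWKEYs seen."""
    prefix = f"{qualifier}{SEP}{value}{SEP}"
    i = bisect_left(sorted_keys, prefix)  # position at the first candidate entry
    rowkeys = set()
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        rowkeys.add(sorted_keys[i][len(prefix):])
        i += 1
    return rowkeys


def select(index_tables, conditions, logic="and"):
    """Split a compound condition into single conditions and combine their
    ROWKEY sets: intersection for AND, union for OR."""
    sets = [scan_condition(index_tables[q], q, v) for q, v in conditions]
    if not sets:
        return set()
    return set.intersection(*sets) if logic == "and" else set.union(*sets)


index_tables = {  # one sorted index table per indexed column
    "city": sorted(["city|Wuhan|r1", "city|Wuhan|r2", "city|Beijing|r3"]),
    "age": sorted(["age|30|r2", "age|30|r3", "age|41|r1"]),
}
print(select(index_tables, [("city", "Wuhan"), ("age", "30")], "and"))  # {'r2'}
```

The binary search plays the role of positioning directly at the first qualifying entry, so each single condition reads only the contiguous run of matching entries instead of the whole index table.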
2) Join query
As shown in Fig. 3, the workflow of a join query is:
For a compound join query, the query can be divided into two join query groups. When the data columns of the same join query group are inserted into the index table, the hash function produces the same PREFIX value for them; the value corresponding to row key R is the row key of that column in the data table. At query time, the secondary index table is scanned in full and the qualifying tuple sets are recorded; set operations are then applied to these tuple sets to obtain the row key values of the qualifying data. While the qualifying tuple sets are being recorded, a tuple is added to the tuple set only when the tuple formed by consecutive entries satisfies the condition of the join query group.

Claims (3)

  1. A method for implementing HBase secondary indexes based on ES, characterized in that:
    1. According to the query business, build secondary index tables in ES on the relevant data columns; one basic query service corresponds to one secondary index table, and one complex query service corresponds to multiple secondary index tables;
    A. Create the secondary index table in ES according to the operation type
    For selection query operations, the M data columns involved in the selection query are stored in M secondary index tables respectively, where M is greater than or equal to 1. The row key R of each secondary index table consists of three parts, in order: QUALIFIER, VALUE and ROWKEY, where QUALIFIER is the identifier of the data column in the data table, VALUE is the value of the data column in the data table, and ROWKEY is the row key of the data table;
    B. Generate secondary index entries from the data columns and insert them into the secondary index table
    For join query operations, the N data columns involved in the join query are stored in one secondary index table, where N is greater than or equal to 2. The row key R of the secondary index table consists of three parts, in order: PREFIX, VALUE and QUALIFIER, where PREFIX is generated by a hash function and distinguishes the join query groups, VALUE is the value of the data column in the data table, and QUALIFIER is the identifier of the data column in the data table;
    The value stored in the data column of the secondary index table is the ROWKEY of the corresponding data table row; this value and the row key R of the secondary index table together form one entry of the secondary index table. The secondary index table is created in ES, and the association between a data column and its secondary index table is stored in a metadata table. The row key of the metadata table consists of, in order: the table name of the data table, the column family name, the column name, the operation type of the secondary index table, and the timestamp. The value corresponding to the row key of the metadata table is: the operation type of the secondary index table and the name of the secondary index table;
    The operation types of a secondary index table are: selection query operation and join query operation;
    2. When querying, first query the index table to obtain the row keys of the matching data, then query the data table by row key to obtain the data;
    A. Scan the secondary index table to obtain the row keys of the data to be queried;
    For each of the M data columns involved in the selection query, query the metadata table by operation type to obtain the name of the corresponding secondary index table, then query that secondary index table. The specific query process is: using the condition value in the selection query, position directly at the first entry that satisfies the condition and continue scanning until an entry that does not satisfy the condition is found; the scanned entries that satisfy the condition yield the set of ROWKEYs satisfying the query condition on the current data column. If M equals 1, this ROWKEY set is the ROWKEY set of the data to be queried. If M is greater than 1, apply to the ROWKEY sets of the different columns the set operation that corresponds to the logical relationship among the M data columns in the query: logical AND corresponds to set intersection and logical OR corresponds to set union. The result of the operation is the ROWKEY set of the data to be queried;
    B. Query the data table using the ROWKEY set of the data to be queried
    For the N data columns involved in a join query, query the metadata table by operation type to obtain the name of the corresponding secondary index table; the N columns correspond to the same secondary index table. Query that secondary index table as follows: it follows from the row key format of the secondary index table that the entries of the N data columns holding the same value are arranged consecutively in the secondary index table. If the number of consecutively arranged index entries with the same data column value is N, the ROWKEYs of these N entries form an N-tuple <R1, R2, ..., RN> that satisfies the query condition. Scanning the whole secondary index table yields the set of all N-tuples <R1, R2, ..., RN> that satisfy the condition, and this set is the ROWKEY set of the data to be queried. The corresponding data values are then obtained from the data table for this ROWKEY set through the Get interface method provided by HBase;
    3. When a relevant column of the data table is updated, the corresponding secondary index table must be updated at the same time
    Determine whether the data table has been updated; if so, update the secondary index table; if not, do not update the secondary index table;
    Updating the secondary index table comprises the following steps:
    I. Update the data table: through the Put method interface provided by HBase on the Hadoop platform, submit the value of the data column, the row key, the column family and the column identifier to complete the update of the data table;
    II. Generate secondary index entries: for the column of the data currently being updated, query the metadata table to obtain the secondary index tables that need updating and their operation types; select the format of the corresponding secondary index table according to the operation type, and use the data updated in the data table to generate entries that conform to the format of that secondary index table;
    III. Update the secondary index table: through the interface methods provided by the Coprocessor in HBase on the Hadoop platform, submit the value, row key, column family and column identifier of the secondary index table in the format of the secondary index entries generated in step II, completing the update of the secondary index table.
  2. The method for implementing HBase secondary indexes based on ES according to claim 1, characterized in that the selection query workflow of step 2 is:
    For a compound selection query, the compound selection condition is first split into single query conditions; the set of entries satisfying each single condition is then obtained through the row keys of the index table; finally, set operations are applied to the entry sets of the single conditions to obtain all secondary index entries that satisfy the compound query condition, and the row keys of all qualifying data table rows are extracted from these entries. When obtaining the set of secondary index entries that satisfy a single condition, the row key of the index table is used to position directly at the first qualifying entry, and the scan proceeds downward until a non-qualifying entry is found; the scanned entries are then merged into the set of secondary index entries satisfying that single condition.
  3. The method for implementing HBase secondary indexes based on ES according to claim 1, characterized in that the join query workflow of step 2 is:
    For a compound join query, the query can be divided into two join query groups. When the data columns of the same join query group are inserted into the index table, the hash function produces the same PREFIX value for them; the value corresponding to row key R is the row key of that column in the data table. At query time, the secondary index table is scanned in full and the qualifying tuple sets are recorded; set operations are then applied to these tuple sets to obtain the row key values of the qualifying data. While the qualifying tuple sets are being recorded, a tuple is added to the tuple set only when the tuple formed by consecutive entries satisfies the condition of the join query group.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710763058.5A CN107506464A (en) 2017-08-30 2017-08-30 A method for implementing HBase secondary indexes based on ES

Publications (1)

Publication Number Publication Date
CN107506464A (en) 2017-12-22

Family

ID=60694149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710763058.5A Withdrawn CN107506464A (en) 2017-08-30 2017-08-30 A method for implementing HBase secondary indexes based on ES

Country Status (1)

Country Link
CN (1) CN107506464A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271097A (en) * 2017-12-28 2019-01-25 新华三大数据技术有限公司 Data processing method, data processing equipment and server
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch
CN109299110A (en) * 2018-11-09 2019-02-01 东软集团股份有限公司 Data query method, apparatus, storage medium and electronic equipment
CN109800222A (en) * 2018-12-11 2019-05-24 中国科学院信息工程研究所 A kind of HBase secondary index adaptive optimization method and system
CN110502524A (en) * 2019-08-15 2019-11-26 济南浪潮数据技术有限公司 Phoenix index data asynchronous updating method and device
CN110737692A (en) * 2018-07-19 2020-01-31 杭州海康威视数字技术股份有限公司 data retrieval method, index database establishment method and device
CN111159185A (en) * 2019-12-27 2020-05-15 紫光云(南京)数字技术有限公司 Hive index method based on conditional push-down elastic search
CN111753045A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Hive secondary full-text index technical method and system based on elastic search
CN112597191A (en) * 2020-12-29 2021-04-02 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN112805695A (en) * 2019-03-20 2021-05-14 谷歌有限责任公司 Co-sharding and randomized co-sharding
CN114372064A (en) * 2022-03-22 2022-04-19 飞狐信息技术(天津)有限公司 Data processing apparatus, method, computer readable medium and processor
WO2024022180A1 (en) * 2022-07-28 2024-02-01 天津联想协同科技有限公司 Network disk document indexing method and apparatus, and network disk and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834688A (en) * 2015-04-20 2015-08-12 北京奇艺世纪科技有限公司 Secondary index establishment method and device
CN106503243A (en) * 2016-11-08 2017-03-15 国网山东省电力公司电力科学研究院 Electric power big data querying method and system based on HBase secondary indexs
CN106682073A (en) * 2016-11-14 2017-05-17 上海轻维软件有限公司 HBase fuzzy retrieval system based on Elastic Search

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271097A (en) * 2017-12-28 2019-01-25 新华三大数据技术有限公司 Data processing method, data processing equipment and server
CN110737692A (en) * 2018-07-19 2020-01-31 杭州海康威视数字技术股份有限公司 data retrieval method, index database establishment method and device
CN109299102B (en) * 2018-10-23 2020-11-13 中国电子科技集团公司第二十八研究所 HBase secondary index system and method based on Elastcissearch
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch
CN109299110A (en) * 2018-11-09 2019-02-01 东软集团股份有限公司 Data query method, apparatus, storage medium and electronic equipment
CN109800222A (en) * 2018-12-11 2019-05-24 中国科学院信息工程研究所 A kind of HBase secondary index adaptive optimization method and system
CN109800222B (en) * 2018-12-11 2021-06-01 中国科学院信息工程研究所 HBase secondary index self-adaptive optimization method and system
CN112805695A (en) * 2019-03-20 2021-05-14 谷歌有限责任公司 Co-sharding and randomized co-sharding
CN110502524A (en) * 2019-08-15 2019-11-26 济南浪潮数据技术有限公司 Phoenix index data asynchronous updating method and device
CN111159185A (en) * 2019-12-27 2020-05-15 紫光云(南京)数字技术有限公司 Hive index method based on conditional push-down elastic search
CN111753045A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Hive secondary full-text index technical method and system based on elastic search
CN111753045B (en) * 2020-07-01 2024-09-10 浪潮云信息技术股份公司 Hive two-level full-text index technical method and system based on elastic search
CN112597191A (en) * 2020-12-29 2021-04-02 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN112597191B (en) * 2020-12-29 2024-06-11 拉卡拉支付股份有限公司 Data processing method, device, electronic equipment, storage medium and program product
CN114372064A (en) * 2022-03-22 2022-04-19 飞狐信息技术(天津)有限公司 Data processing apparatus, method, computer readable medium and processor
CN114372064B (en) * 2022-03-22 2022-07-12 飞狐信息技术(天津)有限公司 Data processing apparatus, method, computer readable medium and processor
WO2024022180A1 (en) * 2022-07-28 2024-02-01 天津联想协同科技有限公司 Network disk document indexing method and apparatus, and network disk and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20171222