CN108509437B - ElasticSearch query acceleration method - Google Patents

Publication number
CN108509437B
Authority
CN
China
Prior art keywords
cluster
shard
data
index
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710102541.9A
Other languages
Chinese (zh)
Other versions
CN108509437A (en)
Inventor
王磊
王胤然
徐寅
穆宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fiberhome Telecommunication Technologies Co ltd
Original Assignee
Nanjing Fiberhome Telecommunication Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Fiberhome Telecommunication Technologies Co ltd
Priority to CN201710102541.9A
Publication of CN108509437A
Application granted
Publication of CN108509437B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an ElasticSearch query acceleration method in the technical field of computer big data indexing. A Payload field is added to each field, and filtering is then performed through the Payload field on the result of each single sub-query condition. This solves the problem that, in the original ES data query, computing intersections and unions takes a large amount of time when each result set is large, and it improves indexing efficiency.

Description

ElasticSearch query acceleration method
Technical Field
The invention belongs to the technical field of computer big data indexing.
Background
We are entering an era in which data is produced, shared and applied on a massive scale; data volumes are expanding rapidly, and humanity has entered the Internet era. Social networks, e-commerce and mobile communications in particular have brought a new era of massive structured and unstructured data. The sheer volume makes these data highly complex, highly variable and very difficult to process. How to analyze and process massive data and provide simple, convenient services to the outside has become a problem that many IT enterprises and institutions must face.
Massive data is divided into structured data and unstructured data. Structured data refers to data such as enterprise financial accounts, production data, student score data and statistical report data; unstructured data refers to data such as text, images and sound. Unstructured data accounts for about 80% of massive data. Structured data can be processed by traditional relational databases and by the distributed NoSQL databases developed later, while unstructured data can be served for queries through full-text retrieval technology.
Among current full-text retrieval tools, Lucene is the simplest and most convenient. It is a full-text information retrieval toolkit that uses an inverted file index structure. It is not a complete search application; rather, it provides full-text indexing and search functionality that can be conveniently embedded into various applications. Currently, the cluster technologies based on Lucene are mainly Solr and Elasticsearch (hereinafter abbreviated as ES). Elasticsearch is a Lucene-based search server: a distributed, multi-user full-text search engine that supports RESTful web and Java interfaces, supports real-time search, and is stable, reliable, fast, and easy to install and use.
In the original ES data query, a combined condition is split into sub-conditions that are issued as separate queries, and intersection or union operations are then performed on the result sets. If each result set is large, these intersection and union operations take a lot of time.
Disclosure of Invention
The invention aims to provide an ElasticSearch query acceleration method that solves the problem that, in the original ES data query, computing intersections and unions takes a large amount of time when each result set is large, and that improves indexing efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
an ElasticSearch query acceleration method comprises the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit, and deploying an ES cluster in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit;
step 3: the data acquisition terminal feeds stream data or text data to the data import server, and the data import server imports the data and sends it to the Hadoop storage server cluster for storage;
step 4: through Lucene, the ES cluster builds an index data table with an inverted file index structure for the data stored in the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas comprise a plurality of document number storage field areas;
step 5: based on the underlying storage structure provided by Lucene, the ES cluster adds a plurality of Payload fields to the linked list of the inverted list, all Payload fields being placed after the document number storage field area;
step 6: a user enters query conditions through the WEB interface server, which passes them to the ES cluster; the query conditions comprise an exact query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: the ES cluster first performs retrieval through Lucene according to the exact, range and prefix query conditions, obtaining an exact query result, a range query result and a prefix query result respectively;
step 8: the ES cluster filters the exact, range and prefix query results separately according to the Payload range query condition, obtaining an exact query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the exact query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
The ES cluster is an Elasticsearch server cluster.
The Payload field is a storage area that stores a range query field, which includes a time field.
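As an illustration of step 5, the sketch below packs each posting as a document number immediately followed by a payload value (for example a timestamp) and filters by a payload range while scanning the list. The 12-byte layout, the function names and the sample values are assumptions made for this example; this is not Lucene's actual on-disk format or API.

```python
import struct
from typing import Iterator, List, Tuple

# Assumed layout of one posting: 4-byte unsigned document number,
# immediately followed by an 8-byte signed payload (e.g. a timestamp).
ENTRY = struct.Struct(">Iq")


def encode_postings(postings: List[Tuple[int, int]]) -> bytes:
    """Serialize (doc_number, payload) pairs, payload placed after the doc number."""
    return b"".join(ENTRY.pack(doc, payload) for doc, payload in postings)


def iter_postings(blob: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (doc_number, payload) pairs back from the packed list."""
    return ENTRY.iter_unpack(blob)


def docs_in_payload_range(blob: bytes, low: int, high: int) -> List[int]:
    """Return document numbers whose payload lies in the closed range [low, high]."""
    return [doc for doc, payload in iter_postings(blob) if low <= payload <= high]


blob = encode_postings([(1, 100), (2, 250), (3, 400)])
matching = docs_in_payload_range(blob, 200, 500)
```

Because the payload sits next to the document number, the range check happens during the same scan that reads the posting, without a second lookup.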
In step 4, the ES cluster provides the storage field areas for the index data table according to the following steps:
step S1: setting the shard as the basic storage unit of each index data table, wherein each index data table comprises a plurality of shards, and the ES cluster distributes the shards of each index data table across different storage media in the ES cluster;
step S2: denoting an index data table in the ES cluster as an index table, a shard being a fragment of the index table; the index table comprises a plurality of shards; setting a shard threshold;
step S3: the ES cluster establishes an extended index table for the index table, reads the largest shard in the index table, and judges whether that shard reaches the shard threshold: if yes, go to step S4; otherwise, go to step S5;
the ES cluster establishes the extended index table for the index table as follows:
step A: the ES cluster obtains the index table, traverses each shard in it, and judges as follows: if the shard exceeds the shard threshold, execute step C; if the shard does not exceed the shard threshold, execute step B;
step B: querying whether the shard of the extended table under this shard exceeds the shard threshold: if yes, execute step C; otherwise, go to step S4;
step C: the ES cluster counts the shards that exceed the shard threshold, and checks whether an extended index table exists and whether the shards of the extended index table are full: if no extended index table exists or its shards are full, a new extended index table is created with twice the number of existing shards, and the information of the new extended index table is written to the routing table; if any shards exceed the shard threshold, all such shards are listed, sorted in descending order, and added to the ZooKeeper task queue; the ZooKeeper task queue generates a plurality of job tasks from the shard list;
step S4: splitting the shard according to the following steps:
step D: after obtaining a job task from the ZooKeeper task queue, the ES cluster notifies the Ares ingestion program to stop ingestion for the listed shards, and judges whether the Ares ingestion program has returned a message: if yes, execute step E; if not, wait for the response of the Ares ingestion program;
step E: the ES cluster starts to split the shard according to the following rules:
step E1: the ES cluster obtains the storage size of the shard;
step E2: dividing the storage size by 2 to obtain a shard calculation result and comparing it with the shard threshold: if the result is greater than the shard threshold, recording the number N of times the storage size has been divided by 2 and executing step E2 again; if the result is smaller than the shard threshold, recording 2 × N as the number of split shards;
step E3: obtaining the total data amount total of the shard and computing the data amount K of each split shard: K = total ÷ (2 × N);
step E4: specifying a time T, in seconds, through the ES cluster query interface; the amount of data retrieved within T seconds is recorded as m, and the coefficient s is given by s = K/m; the ES cluster splits the shard according to the number of split shards, the data amount K and the coefficient s;
step F: after the split, the ES cluster numbers the new shards, starting from shards[0];
step G: deleting the data in the original shard, replacing it with the data of shards[0], and adding the information of shards[0] to the index table; writing the shards other than shards[0] to an NFS shared directory, expanding the shards of the index table, recovering the shards of the index table from the shards in the NFS shared directory, and adding any shards that still exceed the shard threshold to the ZooKeeper task queue in descending order according to the method of step C;
step H: recording the data-flow trace of the split operation and writing it to the routing table; the routing table generates new routing rules from the new trace, and the ES cluster ingests or queries data according to the new routing rules;
step S5: ending the shard expansion, and repeating steps S1 to S4 until the ES cluster has provided storage field areas for all index tables.
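The arithmetic of steps E2 to E4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and field names are hypothetical, every halving (including the last one) is assumed to increment N, and a result equal to the threshold is assumed to stop the loop, since the text does not cover that case. The split count follows the text literally as 2 × N.

```python
from dataclasses import dataclass


@dataclass
class SplitPlan:
    halvings: int          # N: how many times the storage size was divided by 2
    split_count: int       # 2 x N, the number of split shards (step E2)
    data_per_split: float  # K = total / (2 x N) (step E3)
    coefficient: float     # s = K / m (step E4)


def plan_split(storage_size: float, threshold: float, total: int, m: int) -> SplitPlan:
    """Compute N, the split count, K and s for one oversized shard."""
    if storage_size <= 0 or threshold <= 0 or m <= 0:
        raise ValueError("storage_size, threshold and m must be positive")
    size = storage_size
    n = 0
    while True:
        size /= 2          # step E2: divide the storage size by 2
        n += 1             # assumption: count every halving
        if size <= threshold:
            break          # result no longer exceeds the shard threshold
    split_count = 2 * n
    k = total / split_count
    return SplitPlan(n, split_count, k, k / m)


# A shard of size 100 with threshold 30 holding 1000 records,
# where 50 records are retrieved within T seconds.
plan = plan_split(storage_size=100, threshold=30, total=1000, m=50)
```

For this input the size is halved twice (100 to 50 to 25), giving N = 2, four split shards, K = 250 and s = 5.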
The ElasticSearch query acceleration method of the invention solves the problem that, in the original ES data query, computing intersections and unions takes a large amount of time when each result set is large, and improves indexing efficiency; the invention performs efficient filtering on the result of each single sub-condition and improves concurrent query efficiency.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of step 4 of the present invention;
FIG. 3 is a flowchart of step S3 of the present invention;
FIG. 4 is a flowchart of step S4 of the present invention.
Detailed Description
As shown in FIG. 1 to FIG. 4, an ElasticSearch query acceleration method includes the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit, and deploying an ES cluster in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit;
step 3: the data acquisition terminal feeds stream data or text data to the data import server, and the data import server imports the data and sends it to the Hadoop storage server cluster for storage;
step 4: through Lucene, the ES cluster builds an index data table with an inverted file index structure for the data stored in the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas comprise a plurality of document number storage field areas;
step 5: based on the underlying storage structure provided by Lucene, the ES cluster adds a plurality of Payload fields to the linked list of the inverted list, all Payload fields being placed after the document number storage field area;
step 6: a user enters query conditions through the WEB interface server, which passes them to the ES cluster; the query conditions comprise an exact query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: the ES cluster first performs retrieval through Lucene according to the exact, range and prefix query conditions, obtaining an exact query result, a range query result and a prefix query result respectively;
step 8: the ES cluster filters the exact, range and prefix query results separately according to the Payload range query condition, obtaining an exact query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the exact query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
The ES cluster is an Elasticsearch server cluster.
The Payload field is a storage area that stores a range query field, which includes a time field.
ES shard expansion adopts a Master-Slave structure: relying on ZooKeeper, it generates a plurality of jobs from the shard list of a data table (Index), and each shard-splitting module schedules and executes the jobs to complete the shard (Shard) operations.
In step 4, the ES cluster provides the storage field areas for the index data table according to the following steps:
step S1: setting the shard as the basic storage unit of each index data table, wherein each index data table comprises a plurality of shards, and the ES cluster distributes the shards of each index data table across different storage media in the ES cluster;
step S2: denoting an index data table in the ES cluster as an index table, a shard being a fragment of the index table; the index table comprises a plurality of shards; setting a shard threshold; the ES cluster establishes an extended index table for the index table only if the index table has an alias, and the extended index table has the same alias and the same number of shards as the index table;
step S3: the ES cluster establishes an extended index table for the index table, reads the largest shard in the index table, and judges whether that shard reaches the shard threshold: if yes, go to step S4; otherwise, go to step S5;
the ES cluster establishes the extended index table for the index table as follows:
step A: the ES cluster obtains the index table, traverses each shard in it, and judges as follows: if the shard exceeds the shard threshold, execute step C; if the shard does not exceed the shard threshold, execute step B;
step B: querying whether the shard of the extended table under this shard exceeds the shard threshold: if yes, execute step C; otherwise, go to step S4;
step C: the ES cluster counts the shards that exceed the shard threshold, and checks whether an extended index table exists and whether the shards of the extended index table are full: if no extended index table exists or its shards are full, a new extended index table is created with twice the number of existing shards, and the information of the new extended index table is written to the routing table; if any shards exceed the shard threshold, all such shards are listed, sorted in descending order, and added to the ZooKeeper task queue; the ZooKeeper task queue generates a plurality of job tasks from the shard list. ZooKeeper is an open-source distributed application coordination service, an open-source implementation of Google's Chubby, and an important component of Hadoop and HBase.
step S4: splitting the shard according to the following steps:
step D: after obtaining a job task from the ZooKeeper task queue, the ES cluster notifies the Ares ingestion program to stop ingestion for the listed shards, and judges whether the Ares ingestion program has returned a message: if yes, execute step E; if not, wait for the response of the Ares ingestion program;
step E: the ES cluster starts to split the shard according to the following rules:
step E1: the ES cluster obtains the storage size of the shard;
step E2: dividing the storage size by 2 to obtain a shard calculation result and comparing it with the shard threshold: if the result is greater than the shard threshold, recording the number N of times the storage size has been divided by 2 and executing step E2 again; if the result is smaller than the shard threshold, recording 2 × N as the number of split shards;
step E3: obtaining the total data amount total of the shard and computing the data amount K of each split shard: K = total ÷ (2 × N);
step E4: specifying a time T, in seconds, through the ES cluster query interface; the amount of data retrieved within T seconds is recorded as m, and the coefficient s is given by s = K/m; the ES cluster splits the shard according to the number of split shards, the data amount K and the coefficient s;
step F: after the split, the ES cluster numbers the new shards, starting from shards[0];
step G: deleting the data in the original shard, replacing it with the data of shards[0], and adding the information of shards[0] to the index table; writing the shards other than shards[0] to an NFS shared directory, expanding the shards of the index table, recovering the shards of the index table from the shards in the NFS shared directory, and adding any shards that still exceed the shard threshold to the ZooKeeper task queue in descending order according to the method of step C;
step H: recording the data-flow trace of the split operation and writing it to the routing table; the routing table generates new routing rules from the new trace, and the ES cluster ingests or queries data according to the new routing rules;
step S5: ending the shard expansion, and repeating steps S1 to S4 until the ES cluster has provided storage field areas for all index tables.
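The selection and ordering of split jobs in steps A to C can be sketched as below. This is a minimal illustration under stated assumptions: shard sizes are given as a plain mapping, a Python deque stands in for the ZooKeeper task queue (no ZooKeeper client calls are made), and the function and shard names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def build_split_jobs(shard_sizes: Dict[str, int], threshold: int) -> Deque[Tuple[str, int]]:
    """List the shards exceeding the threshold, largest first, as queued jobs."""
    oversized = [(name, size) for name, size in shard_sizes.items() if size > threshold]
    oversized.sort(key=lambda item: item[1], reverse=True)  # descending order by size
    return deque(oversized)  # stand-in for the task queue


jobs = build_split_jobs(
    {"shard0": 40, "shard1": 120, "shard2": 75, "shard3": 10},
    threshold=50,
)
first_job = jobs[0]  # the largest oversized shard is handled first
```

Sorting in descending order means the largest shard is split first, which is the ordering the step C text calls for before jobs are handed to the splitting modules.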
In use, as shown in FIG. 1, a Payload field is added to each field and the data query mode is changed accordingly. For example, consider the query condition A AND C AND D AND B, where A is an exact query condition, C is a range query condition, D is a prefix query condition, and B is a Payload range query condition. According to the method of the invention, the sub-conditions A, C and D are issued separately and their respective query results are obtained; each result is then filtered by the Payload condition B, which reduces the size of each sub-result set; finally, the intersection of the three filtered result sets is taken to obtain the final result.
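The A AND C AND D AND B example can be traced with a small in-memory sketch. The documents, field meanings and helper names below are invented for illustration, and the `time` value plays the role of the payload; this is not an Elasticsearch or Lucene API.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical document: (name, age, phone, time); "time" acts as the payload.
Doc = Tuple[str, int, str, int]

DOCS: Dict[int, Doc] = {
    1: ("alice", 30, "13800001", 10),
    2: ("alice", 35, "13800002", 55),
    3: ("bob", 32, "13900003", 60),
    4: ("alice", 31, "13800004", 70),
    5: ("alice", 50, "13800005", 65),
}


def sub_query(pred: Callable[[Doc], bool]) -> List[Tuple[int, int]]:
    """Run one sub-condition and return (doc_id, payload) pairs."""
    return [(doc_id, doc[3]) for doc_id, doc in DOCS.items() if pred(doc)]


def payload_filter(hits: List[Tuple[int, int]], low: int, high: int) -> Set[int]:
    """Condition B: keep hits whose payload lies in [low, high]."""
    return {doc_id for doc_id, payload in hits if low <= payload <= high}


def query(low: int, high: int) -> Set[int]:
    a = sub_query(lambda d: d[0] == "alice")        # A: exact condition
    c = sub_query(lambda d: 30 <= d[1] <= 40)       # C: range condition
    d = sub_query(lambda d: d[2].startswith("138")) # D: prefix condition
    filtered = [payload_filter(hits, low, high) for hits in (a, c, d)]
    return set.intersection(*filtered)              # final intersection


result = query(50, 80)
```

Here each sub-result shrinks before the intersection: A and D drop from four hits to three, and C from four to three, so the final intersection works on smaller sets.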
A series of interfaces supporting Payload field queries are added to the Lucene full-text information retrieval toolkit (Lucene for short) and to the ES cluster, so that a user can call the ES Payload interfaces directly, in the same way as other Elasticsearch interfaces, without needing to be aware of the Payload field storage structure or the underlying Lucene interfaces, and the Payload field is used effectively. Payload encapsulation is provided for five kinds of query conditions: "single equality condition + range", "prefix condition + range", "fuzzy condition + range", "IN condition + range" and "range + range".
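One way to picture the five encapsulated combinations is as a small request type that pairs a primary condition with a payload range. The class and function names below are hypothetical and only model the shape of such a request; the patent does not disclose the actual interface signatures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Tuple


class ConditionKind(Enum):
    """The five primary condition kinds that are combined with a payload range."""
    EQUALS = "single equality condition"
    PREFIX = "prefix condition"
    FUZZY = "fuzzy condition"
    IN = "IN condition"
    RANGE = "range"


@dataclass(frozen=True)
class PayloadQuery:
    kind: ConditionKind
    field: str
    value: Any
    payload_range: Tuple[int, int]  # closed range applied to the payload


def with_payload_range(kind: ConditionKind, field: str, value: Any,
                       low: int, high: int) -> PayloadQuery:
    """Bundle a primary condition with its payload range (hypothetical helper)."""
    if low > high:
        raise ValueError("payload range must satisfy low <= high")
    return PayloadQuery(kind, field, value, (low, high))


request = with_payload_range(ConditionKind.PREFIX, "phone", "138", 50, 80)
```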
The data acquisition terminal is a 10-gigabit switch, which can collect a large number of data sources from the Internet in the form of data files and streaming data;
the ES cluster provides data storage, query analysis and management monitoring interfaces for the Hadoop storage server cluster; the storage medium is a local disk, and the ES cluster supports various Spark components; the WEB interface server connects to the ES cluster through Zues-client and Loki;
Zues-client is the encapsulated ES interface called by the upper layer; Loki is the unified-index query middleware, responsible for receiving upper-layer users' query requests for structured, unstructured and mixed data, parsing and splitting the requests and forwarding them to the ES, and fetching data from the structured and unstructured data systems according to the returned data ids.
The ElasticSearch query acceleration method of the invention solves the problem that, in the original ES data query, computing intersections and unions takes a large amount of time when each result set is large, and improves indexing efficiency; the invention performs efficient filtering on the result of each single sub-condition and improves concurrent query efficiency.

Claims (4)

1. An ElasticSearch query acceleration method, characterized by comprising the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit, and deploying an ES cluster in the Hadoop storage server cluster through the Lucene full-text information retrieval toolkit;
step 3: the data acquisition terminal feeds stream data or text data to the data import server, and the data import server imports the data and sends it to the Hadoop storage server cluster for storage;
step 4: through Lucene, the ES cluster builds an index data table with an inverted file index structure for the data stored in the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas comprise a plurality of document number storage field areas;
step 5: based on the underlying storage structure provided by Lucene, the ES cluster adds a plurality of Payload fields to the linked list of the inverted list, all Payload fields being placed after the document number storage field area;
step 6: a user enters query conditions through the WEB interface server, which passes them to the ES cluster; the query conditions comprise an exact query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: the ES cluster first performs retrieval through Lucene according to the exact, range and prefix query conditions, obtaining an exact query result, a range query result and a prefix query result respectively;
step 8: the ES cluster filters the exact, range and prefix query results separately according to the Payload range query condition, obtaining an exact query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the exact query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
2. The method of claim 1, wherein the ES cluster is an Elasticsearch server cluster.
3. The method of claim 1, wherein the Payload field is a storage area that stores a range query field, which includes a time field.
4. The method of claim 1, wherein the method comprises: in step 4, the ES cluster provides a field area for storage for the index data table according to the following steps:
step S1: setting the fragments as a basic storage unit of each index data table, wherein each index data table comprises a plurality of fragments, and the ES cluster distributes and stores the index data tables into different storage media in the ES cluster according to the fragments of the index data table;
step S2: setting an index form as an index data table in the ES cluster, wherein the shard is a fragment of the index form; the index form comprises a plurality of shards; setting a fragmentation threshold value;
step S3: the ES cluster establishes an extended index form for the index form, reads the largest shard in the index form, and judges whether the shard reaches a shard threshold value: if yes, go to step S4, otherwise go to step S5;
the specific steps of establishing an extended index form for the index form by the ES cluster are as follows:
step A: the ES cluster acquires the index form, traverses each shard in the index form, and judges the following: if the shard exceeds the shard threshold value, executing the step C; if the shard does not exceed the shard threshold, executing the step B;
and B: inquiring whether the fragments of the expansion table under the shard exceed the fragment threshold value: if yes, executing step C; otherwise, go to step S4;
and C: the ES cluster calculates the number of the shards exceeding the shard threshold according to the size of the shard threshold, and checks whether the extended index form exists or whether the shards of the extended index form are full: if the new extended index form does not exist or the shard is full, continuing to extend the new extended index form, wherein the number of the shard is twice of the number of the existing shard, and updating the information of the newly added extended index form into the routing table; if the shard lists exceed the shard threshold, listing all the shard lists exceeding the shard threshold, and adding the shard lists into a task queue of the Zookeeper after descending the order; generating a plurality of job tasks by a task queue of the Zookeeper according to the shard list;
step S4: splitting the shard according to the following steps:
step D: after the ES cluster acquires a job task from the task queue of the Zookeeper, the ES cluster informs the Ares warehousing program to stop warehousing the list, and judges whether the Ares warehousing program returns a message: if yes, executing step E; if not, waiting for the response of the Ares warehousing program;
step E: the ES cluster starts to split the shards according to the following rules:
step E1: the ES cluster acquires the storage size of the shard;
step E2: dividing the storage size by 2 to obtain a fragmentation calculation result, and comparing the fragmentation calculation result with a fragmentation threshold value: if the number of times of dividing the storage size by 2 is greater than the slicing threshold, recording the number N of times of dividing the storage size by 2, and executing the step E2; if the number of the split pieces is smaller than the splitting threshold value, recording 2 multiplied by N as the number of the split pieces;
step E3: acquiring the total data amount total of the shard, wherein the divided data amount K: k total ÷ (2 × N);
step E4: giving a time T through an ES cluster query interface, wherein the unit of T is second, when data acquired in T seconds is recorded as m, the coefficient value is s, and the size of s is equal to K/m; the ES cluster splits the shards according to the split number, the data size K and the coefficient s;
step F: after the shard is split, the ES cluster numbers the new shards, with the numbering starting at shards[0];
step G: deleting the data in the original shard, replacing it with the data of shards[0], and adding the information of shards[0] to the index form; writing the new shards other than shards[0] into an NFS shared directory, extending the shards of the index form, recovering the shards of the index form from the shards in the NFS shared directory, and, following the method in step C, sorting the shards that exceed the shard threshold in descending order and adding them to the Zookeeper task queue;
step H: recording the flow track of the split operation and writing the flow track into the routing table; the routing table generates a new routing rule according to the new flow track, and the ES cluster warehouses or queries data according to the new routing rule;
step S5: ending the shard expansion, and repeating step S1 to step S4 until the ES cluster provides field sections for storage for all index forms.
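The selection in step C and the arithmetic in steps E2 to E4 can be sketched as follows. This is only an illustrative reading of the claim text, not the patented implementation: the names oversized_shards, plan_split and SplitPlan are hypothetical, shards are assumed to be sorted by storage size, a halving result equal to the threshold is assumed to stop the loop, and N is assumed to count every halving including the last one. The split count follows the 2 × N wording of step E2 literally.

```python
from dataclasses import dataclass


@dataclass
class SplitPlan:
    halvings: int          # N in step E2
    split_count: int       # 2 x N in step E2
    data_per_shard: float  # K in step E3
    coefficient: float     # s in step E4


def oversized_shards(shard_sizes, threshold):
    """Step C: names of shards above the threshold, largest first.

    Sorting by storage size is an assumption; the claim only says
    'descending order'.
    """
    over = [(name, size) for name, size in shard_sizes.items() if size > threshold]
    over.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in over]


def plan_split(storage_size, threshold, total, m):
    """Steps E2 to E4 for one shard that exceeds the threshold.

    storage_size and threshold share one unit; total is the shard's total
    data amount; m is the amount of data acquired in T seconds.
    """
    if threshold <= 0 or storage_size <= threshold:
        raise ValueError("expected storage_size > threshold > 0")
    if m <= 0:
        raise ValueError("m must be positive")

    # Step E2: halve repeatedly, counting the halvings in n.
    n = 0
    result = storage_size
    while True:
        result /= 2
        n += 1
        if result <= threshold:  # assumption: equality also stops the loop
            break

    split_count = 2 * n      # step E2, as worded in the claim
    k = total / split_count  # step E3: K = total / (2 x N)
    s = k / m                # step E4: s = K / m
    return SplitPlan(n, split_count, k, s)
```

For example, plan_split(100, 30, 1_000_000, 50_000) halves 100 to 50 and then to 25, so N = 2, the split count is 4, K = 250000.0 and s = 5.0.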
CN201710102541.9A 2017-02-24 2017-02-24 ElasticSearch query acceleration method Active CN108509437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710102541.9A CN108509437B (en) 2017-02-24 2017-02-24 ElasticSearch query acceleration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710102541.9A CN108509437B (en) 2017-02-24 2017-02-24 ElasticSearch query acceleration method

Publications (2)

Publication Number Publication Date
CN108509437A CN108509437A (en) 2018-09-07
CN108509437B true CN108509437B (en) 2021-09-17

Family

ID=63373643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710102541.9A Active CN108509437B (en) 2017-02-24 2017-02-24 ElasticSearch query acceleration method

Country Status (1)

Country Link
CN (1) CN108509437B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909186B (en) * 2018-09-14 2023-08-22 中国科学院上海高等研究院 Hyperspectral remote sensing data storage and retrieval method and system, storage medium and terminal
CN109542930A (en) * 2018-11-16 2019-03-29 重庆邮电大学 A kind of data efficient search method based on ElasticSearch
CN109885642B (en) * 2019-02-18 2021-11-02 国家计算机网络与信息安全管理中心 Hierarchical storage method and device for full-text retrieval
CN109885536B (en) * 2019-02-26 2023-06-16 深圳众享互联科技有限公司 Distributed data fragment storage and fuzzy search method
CN110909737A (en) * 2019-11-14 2020-03-24 武汉虹旭信息技术有限责任公司 Picture character recognition method and system
CN111538747B (en) * 2020-05-27 2023-04-14 支付宝(杭州)信息技术有限公司 Data query method, device and equipment and auxiliary data query method, device and equipment
CN111698239A (en) * 2020-06-08 2020-09-22 星辰天合(北京)数据科技有限公司 Application control method, device and system based on network file system
CN112181993A (en) * 2020-10-27 2021-01-05 广州市网星信息技术有限公司 Service data query method, device, server and storage medium
CN112364189B (en) * 2020-11-16 2023-02-21 浪潮云信息技术股份公司 Electronic certificate application method based on ES service
CN113360706A (en) * 2021-06-20 2021-09-07 杭州登虹科技有限公司 Video Timeline storage method based on object storage and elastic search

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5119550B2 (en) * 2007-12-28 2013-01-16 株式会社メガチップス Data processing system and data processing method
CN103324642A (en) * 2012-03-23 2013-09-25 日电(中国)有限公司 Data index establishing system and method as well as data query method
CN103605704A (en) * 2013-11-08 2014-02-26 深圳大学 Mass url (uniform resource locator) data any field indexing and retrieving method
CN105189314A (en) * 2013-03-15 2015-12-23 西姆伯蒂克有限责任公司 Automated storage and retrieval system
CN106202207A (en) * 2016-06-28 2016-12-07 中国电子科技集团公司第二十八研究所 A kind of index based on HBase ORM and searching system
CN106446273A (en) * 2016-10-21 2017-02-22 天津海量信息技术股份有限公司 ES (Elastic Search) global data deduplication method based on rpc

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061294A1 (en) * 2005-09-09 2007-03-15 Microsoft Corporation Source code file search

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5119550B2 (en) * 2007-12-28 2013-01-16 株式会社メガチップス Data processing system and data processing method
CN103324642A (en) * 2012-03-23 2013-09-25 日电(中国)有限公司 Data index establishing system and method as well as data query method
CN105189314A (en) * 2013-03-15 2015-12-23 西姆伯蒂克有限责任公司 Automated storage and retrieval system
CN103605704A (en) * 2013-11-08 2014-02-26 深圳大学 Mass url (uniform resource locator) data any field indexing and retrieving method
CN106202207A (en) * 2016-06-28 2016-12-07 中国电子科技集团公司第二十八研究所 A kind of index based on HBase ORM and searching system
CN106446273A (en) * 2016-10-21 2017-02-22 天津海量信息技术股份有限公司 ES (Elastic Search) global data deduplication method based on rpc

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
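Research on the application of the ElasticSearch distributed search engine in astronomical big data retrieval; Chen Yajie et al.; Acta Astronomica Sinica (《天文学报》); 20160311; full text *
Design of a real-time access scheme for massive traffic data based on HBase + ElasticSearch; Dong Changqing et al.; Big Data (《大数据》); 20170120; full text *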

Also Published As

Publication number Publication date
CN108509437A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108509437B (en) ElasticSearch query acceleration method
US11423041B2 (en) Maintaining data lineage to detect data events
CN110362544B (en) Log processing system, log processing method, terminal and storage medium
CN109614402B (en) Multidimensional data query method and device
CN111258978B (en) Data storage method
CN104424229A (en) Calculating method and system for multi-dimensional division
CN106156088B (en) Index data processing method, data query method and device
CN109063196A (en) Data processing method and device, electronic equipment and computer readable storage medium
WO2017174013A1 (en) Data storage management method and apparatus, and data storage system
CN111414361A (en) Label data storage method, device, equipment and readable storage medium
CN113051460A (en) Elasticissearch-based data retrieval method and system, electronic device and storage medium
CN116166191A (en) Integrated system of lake and storehouse
CN108509438B (en) ElasticSearch fragment expansion method
CN104063456B (en) Based on vector query from broadcasting media atlas analysis method and apparatus
Hurst et al. Social streams blog crawler
CN111104408A (en) Data exchange method and device based on map data and storage medium
CN102521379A (en) Internet information collection method and internet information collection device based on active push technology
CN111723063A (en) Method and device for processing offline log data
WO2023028517A1 (en) Updating records in a real-time storage system
EL-SAYED et al. Impact of small files on hadoop performance: literature survey and open points
CN113656469B (en) Big data processing method and device
CN111061719B (en) Data collection method, device, equipment and storage medium
CN114741467A (en) Full-text retrieval method and system
US10185729B2 (en) Index creation method and system
CN111427910A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant