CN108509437B - ElasticSearch query acceleration method - Google Patents
- Publication number
- CN108509437B
- Authority
- CN
- China
- Prior art keywords
- cluster
- shard
- data
- index
- storage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an ElasticSearch query acceleration method in the technical field of computer big-data indexing. A Payload field is added to each field, and each individual sub-query result is then filtered through the Payload field. This addresses the problem that, in native ES queries, computing intersections and unions takes a large amount of time when each result set is large, and thereby improves indexing efficiency.
Description
Technical Field
The invention belongs to the technical field of computer big data indexing.
Background
We are entering an era in which data is produced, shared and applied at massive scale; data volumes are expanding rapidly, and humanity has entered the Internet era. Social networks, e-commerce and mobile communications in particular have brought a new era of massive structured and unstructured data. The sheer volume makes this data highly complex, highly variable and difficult to process. How to analyze and process massive data and offer simple, convenient services on top of it has become a problem that many IT enterprises and institutions must face.
Massive data is divided into structured data and unstructured data. Structured data includes enterprise financial accounts and production data, student score data, statistical report data and the like; unstructured data includes text data and multimedia data such as images and sound. Unstructured data accounts for about 80% of massive data. Structured data can be processed by traditional relational databases and the later distributed NoSQL databases, while unstructured data can be served through full-text retrieval technology.
Among current full-text retrieval technologies, Lucene is the simplest and most convenient. It is a full-text information retrieval toolkit that uses an inverted file index structure. It is not a complete search application; rather, it provides indexing and search functionality that can be conveniently embedded into various applications. The main Lucene-based cluster technologies are Solr and Elasticsearch (hereinafter ES). Elasticsearch is a Lucene-based search server: a distributed, multi-user full-text search engine that supports RESTful web and Java interfaces, supports real-time search, and is stable, reliable, fast, and easy to install and use.
In a native ES query, a compound condition is split into sub-conditions, each sub-query is issued separately, and intersection or union operations are then performed on the resulting sets. When each result set is large, these intersection and union operations take a lot of time.
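To make the cost concrete, the following Python sketch intersects sub-query result sets represented as ascending document-ID lists with a two-pointer merge, a textbook way to intersect postings lists. It is an illustrative assumption, not ES's internal code (Lucene's own conjunction logic can also skip ahead); the point is that the work grows with the combined length of the result sets, which is what becomes expensive when each set is large.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending doc-ID lists with a linear two-pointer merge."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(result_sets: List[List[int]]) -> List[int]:
    """Fold the pairwise merge over every sub-query result set."""
    if not result_sets:
        return []
    # Start from the shortest list so intermediate results stay small.
    ordered = sorted(result_sets, key=len)
    acc = ordered[0]
    for ids in ordered[1:]:
        acc = intersect_sorted(acc, ids)
    return acc
```

For example, `intersect_all([[1, 3, 5, 7], [3, 4, 5], [0, 3, 5, 9]])` returns `[3, 5]`.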
Disclosure of Invention
The invention aims to provide an elastic search query acceleration method, which solves the problem that in the process of ES original data query, if the data volume of each result set is large, the calculation of intersection and union takes a large amount of time, and improves the index efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
an ElasticSearch query acceleration method comprises the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster by means of the Lucene full-text information retrieval tool, and deploying an ES cluster in the Hadoop storage server cluster on the basis of the Lucene full-text information retrieval tool;
step 3: the data acquisition terminal feeds stream data or text data into the data import server, and the data import server imports the stream data or text data and sends it to the Hadoop storage server cluster for storage;
step 4: using the Lucene full-text information retrieval tool, the ES cluster builds an index data table with an inverted file index structure for the data stored by the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas include a plurality of document-number storage field areas;
step 5: based on the underlying storage structure provided by the Lucene full-text information retrieval tool, the ES cluster adds a plurality of Payload fields to the inverted-list (postings) linked list, all of the Payload fields being placed after the document-number storage field area;
step 6: a user inputs a query condition through a WEB interface server, and the WEB interface server transmits the query condition to the ES cluster; the query conditions comprise an accurate query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: using the Lucene full-text information retrieval tool, the ES cluster first performs retrieval according to the accurate query condition, the range query condition and the prefix query condition, obtaining an accurate query result, a range query result and a prefix query result, respectively;
step 8: the ES cluster filters the accurate query result, the range query result and the prefix query result, each according to the Payload range query condition, to obtain an accurate query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the accurate query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
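A minimal Python sketch of steps 7 to 9 follows. It assumes that every hit returned by a sub-query can be read together with the numeric value stored in its Payload field (for example a timestamp); the names `Hit`, `filter_by_payload` and `accelerated_and` are hypothetical and are not part of the Elasticsearch or Lucene APIs. Each sub-result is shrunk by the Payload range before the intersection, so the intersection runs on smaller sets.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical hit representation: (doc_id, payload_value), where payload_value
# is the value read from the Payload field stored after the document number.
Hit = Tuple[int, int]


def filter_by_payload(hits: Iterable[Hit], lo: int, hi: int) -> Set[int]:
    """Step 8: keep only the doc IDs whose payload lies in [lo, hi]."""
    return {doc for doc, payload in hits if lo <= payload <= hi}


def accelerated_and(sub_results: Dict[str, List[Hit]], lo: int, hi: int) -> List[int]:
    """Steps 8-9: filter every sub-result by the Payload range, then intersect."""
    filtered = [filter_by_payload(hits, lo, hi) for hits in sub_results.values()]
    if not filtered:
        return []
    # The sets are already reduced by step 8; intersect starting from the smallest.
    filtered.sort(key=len)
    result = set(filtered[0])
    for ids in filtered[1:]:
        result &= ids
    return sorted(result)
```

Calling `accelerated_and({"exact": [(1, 100), (2, 250)], "range": [(2, 250), (4, 120)]}, 200, 280)` returns `[2]`.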
The ES cluster is an Elasticsearch server cluster.
The Payload field is a storage area that stores a range query field; the range query field includes a time field.
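Lucene stores a payload as an arbitrary byte array attached to a term position, so a time field has to be serialized before it can live in the Payload field. The sketch below assumes (this is not stated in the text) that the time is stored as 8-byte big-endian epoch seconds; the helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone


def encode_time_payload(ts: datetime) -> bytes:
    """Pack a timezone-aware datetime as 8-byte big-endian epoch seconds."""
    return struct.pack(">q", int(ts.timestamp()))


def decode_time_payload(payload: bytes) -> int:
    """Unpack the epoch seconds written by encode_time_payload."""
    (seconds,) = struct.unpack(">q", payload)
    return seconds


def payload_in_range(payload: bytes, start: datetime, end: datetime) -> bool:
    """Evaluate a Payload range condition on the time field (inclusive bounds)."""
    value = decode_time_payload(payload)
    return int(start.timestamp()) <= value <= int(end.timestamp())
```

Fixed-width integers keep each payload the same size, which makes decoding trivial; a variable-length encoding would save space at the cost of extra parsing.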
In step 4, the ES cluster provides storage field areas for the index data table through the following steps:
step S1: taking the shard as the basic storage unit of each index data table, wherein each index data table comprises a plurality of shards, and the ES cluster distributes the index data tables across different storage media in the ES cluster according to their shards;
step S2: treating each index data table in the ES cluster as an index form, a shard being a fragment of the index form, wherein the index form comprises a plurality of shards; and setting a shard threshold;
step S3: the ES cluster establishes an extended index form for the index form, reads the largest shard in the index form, and judges whether that shard has reached the shard threshold: if so, go to step S4; otherwise, go to step S5;
the ES cluster establishes the extended index form for the index form through the following specific steps:
step A: the ES cluster acquires the index form and traverses each shard in it: if a shard exceeds the shard threshold, step C is executed; if it does not, step B is executed;
step B: querying whether the shards of the extended table under that shard exceed the shard threshold: if so, step C is executed; otherwise, go to step S4;
step C: according to the shard threshold, the ES cluster determines the number of shards exceeding it and checks whether the extended index form exists or whether its shards are full: if the extended index form does not exist or its shards are full, a new extended index form is created whose number of shards is twice the existing number of shards, and the information of the newly added extended index form is updated into the routing table; all shards exceeding the shard threshold are listed, sorted in descending order and added to the ZooKeeper task queue, and the ZooKeeper task queue generates a plurality of job tasks from this shard list;
step S4: splitting the shard according to the following steps:
step D: after the ES cluster acquires a job task from the ZooKeeper task queue, it notifies the Ares warehousing program to stop writing data into the table and judges whether the Ares warehousing program has returned a message: if so, step E is executed; if not, the ES cluster waits for the Ares warehousing program to respond;
step E: the ES cluster starts to split the shards according to the following rules:
step E1: the ES cluster acquires the storage size of the shard;
step E2: dividing the storage size by 2 and comparing the result with the shard threshold: if the result is greater than the shard threshold, recording the number N of times the storage size has been divided by 2 and executing step E2 again; if the result is smaller than the shard threshold, recording 2 × N as the number of split shards;
step E3: acquiring the total data amount total of the shard; the data amount K of each split shard is K = total ÷ (2 × N);
step E4: specifying a time T, in seconds, through the ES cluster query interface; denoting the amount of data acquired in T seconds as m, the coefficient s is s = K/m; the ES cluster splits the shard according to the number of split shards, the data amount K and the coefficient s;
step F: after the split, the ES cluster numbers the new shards, the numbering starting from shard[0];
step G: deleting the data in the original shard, replacing it with the data in shard[0], and adding the information of shard[0] to the index form; writing the new shards other than shard[0] into an NFS shared directory, expanding the shards of the index form, recovering the expanded shards of the index form from the shards in the NFS shared directory, and, following the method of step C, sorting the shards that exceed the shard threshold in descending order and adding them to the ZooKeeper task queue;
step H: recording the flow trace of the split operation and updating it into the routing table; the routing table generates new routing rules from the new flow trace, and the ES cluster ingests or queries data according to the new routing rules;
step S5: ending the shard expansion, and repeating steps S1 to S4 until the ES cluster has provided storage field areas for all index forms.
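Steps A and C can be sketched in Python as below. A local `deque` stands in for the ZooKeeper task queue and `SplitJob` is a hypothetical job record; both are assumptions for illustration, not the patent's implementation. Shards above the threshold are queued largest first, and a new extended index form is sized at twice the existing shard count, as step C describes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass(frozen=True)
class SplitJob:
    """One job task: split a single oversized shard (hypothetical structure)."""
    index: str
    shard_id: int
    size_bytes: int


def new_extended_shard_count(existing_shards: int, extended_exists: bool,
                             extended_full: bool) -> Optional[int]:
    """Step C: a new extended index form gets twice the existing shard count."""
    if not extended_exists or extended_full:
        return 2 * existing_shards
    return None  # the current extended index form still has room


def enqueue_oversized(index: str, shard_sizes: Dict[int, int], threshold: int,
                      queue: Deque[SplitJob]) -> int:
    """Steps A and C: queue every shard above the threshold, largest first."""
    oversized = [(sid, size) for sid, size in shard_sizes.items() if size > threshold]
    oversized.sort(key=lambda item: item[1], reverse=True)
    for sid, size in oversized:
        queue.append(SplitJob(index, sid, size))
    return len(oversized)
```

Sorting in descending order means the largest shards, which are the most likely to slow queries, are split first.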
The ElasticSearch query acceleration method solves the problem that, in native ES queries, computing intersections and unions takes a large amount of time when each result set is large, and improves indexing efficiency; the invention filters efficiently on the basis of a single sub-condition and improves concurrent query efficiency.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of step 4 of the present invention;
FIG. 3 is a flowchart of step S3 of the present invention;
FIG. 4 is a flowchart of step S4 of the present invention.
Detailed Description
As shown in FIGS. 1 to 4, an ElasticSearch query acceleration method includes the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster by means of the Lucene full-text information retrieval tool, and deploying an ES cluster in the Hadoop storage server cluster on the basis of the Lucene full-text information retrieval tool;
step 3: the data acquisition terminal feeds stream data or text data into the data import server, and the data import server imports the stream data or text data and sends it to the Hadoop storage server cluster for storage;
step 4: using the Lucene full-text information retrieval tool, the ES cluster builds an index data table with an inverted file index structure for the data stored by the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas include a plurality of document-number storage field areas;
step 5: based on the underlying storage structure provided by the Lucene full-text information retrieval tool, the ES cluster adds a plurality of Payload fields to the inverted-list (postings) linked list, all of the Payload fields being placed after the document-number storage field area;
step 6: a user inputs a query condition through a WEB interface server, and the WEB interface server transmits the query condition to the ES cluster; the query conditions comprise an accurate query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: using the Lucene full-text information retrieval tool, the ES cluster first performs retrieval according to the accurate query condition, the range query condition and the prefix query condition, obtaining an accurate query result, a range query result and a prefix query result, respectively;
step 8: the ES cluster filters the accurate query result, the range query result and the prefix query result, each according to the Payload range query condition, to obtain an accurate query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the accurate query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
The ES cluster is an Elasticsearch server cluster.
The Payload field is a storage area that stores a range query field; the range query field includes a time field.
ES shard expansion uses a Master-Slave structure: relying on ZooKeeper, it generates a plurality of jobs from the shard list of a data table (Index), and each splitting module schedules and executes the jobs to complete the shard (Shard) splitting operation.
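The Master-Slave scheduling described above can be sketched with Python's standard `queue` and `threading` modules: the master publishes one job per shard, and worker threads take jobs until they receive a stop marker. The in-process `queue.Queue` is a stand-in for the ZooKeeper-backed job list, and `split_shard` is a hypothetical callback supplied by the caller.

```python
import queue
import threading
from typing import Callable, List, Optional


def run_master_slave(shard_ids: List[int], split_shard: Callable[[int], None],
                     workers: int = 3) -> None:
    """Master enqueues one job per shard; worker ("slave") threads execute them."""
    jobs: "queue.Queue[Optional[int]]" = queue.Queue()

    def worker() -> None:
        while True:
            shard_id = jobs.get()
            if shard_id is None:  # stop marker sent by the master
                break
            split_shard(shard_id)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for shard_id in shard_ids:  # master: publish the jobs
        jobs.put(shard_id)
    for _ in threads:  # one stop marker per worker, queued after all jobs
        jobs.put(None)
    for t in threads:
        t.join()
```

Because the queue is FIFO and the stop markers are enqueued after every job, each job is taken by some worker before any worker shuts down.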
In step 4, the ES cluster provides storage field areas for the index data table through the following steps:
step S1: taking the shard as the basic storage unit of each index data table, wherein each index data table comprises a plurality of shards, and the ES cluster distributes the index data tables across different storage media in the ES cluster according to their shards;
step S2: treating each index data table in the ES cluster as an index form, a shard being a fragment of the index form, wherein the index form comprises a plurality of shards; and setting a shard threshold; the ES cluster establishes an extended index form for the index form provided that the index form has an alias; the extended index form is given the same alias and has the same number of shards as the index form;
step S3: the ES cluster establishes an extended index form for the index form, reads the largest shard in the index form, and judges whether that shard has reached the shard threshold: if so, go to step S4; otherwise, go to step S5;
the ES cluster establishes the extended index form for the index form through the following specific steps:
step A: the ES cluster acquires the index form and traverses each shard in it: if a shard exceeds the shard threshold, step C is executed; if it does not, step B is executed;
step B: querying whether the shards of the extended table under that shard exceed the shard threshold: if so, step C is executed; otherwise, go to step S4;
step C: according to the shard threshold, the ES cluster determines the number of shards exceeding it and checks whether the extended index form exists or whether its shards are full: if the extended index form does not exist or its shards are full, a new extended index form is created whose number of shards is twice the existing number of shards, and the information of the newly added extended index form is updated into the routing table; all shards exceeding the shard threshold are listed, sorted in descending order and added to the ZooKeeper task queue, and the ZooKeeper task queue generates a plurality of job tasks from this shard list. ZooKeeper is a distributed, open-source coordination service for distributed applications; it is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase.
step S4: splitting the shard according to the following steps:
step D: after the ES cluster acquires a job task from the ZooKeeper task queue, it notifies the Ares warehousing program to stop writing data into the table and judges whether the Ares warehousing program has returned a message: if so, step E is executed; if not, the ES cluster waits for the Ares warehousing program to respond;
step E: the ES cluster starts to split the shards according to the following rules:
step E1: the ES cluster acquires the storage size of the shard;
step E2: dividing the storage size by 2 and comparing the result with the shard threshold: if the result is greater than the shard threshold, recording the number N of times the storage size has been divided by 2 and executing step E2 again; if the result is smaller than the shard threshold, recording 2 × N as the number of split shards;
step E3: acquiring the total data amount total of the shard; the data amount K of each split shard is K = total ÷ (2 × N);
step E4: specifying a time T, in seconds, through the ES cluster query interface; denoting the amount of data acquired in T seconds as m, the coefficient s is s = K/m; the ES cluster splits the shard according to the number of split shards, the data amount K and the coefficient s;
step F: after the split, the ES cluster numbers the new shards, the numbering starting from shard[0];
step G: deleting the data in the original shard, replacing it with the data in shard[0], and adding the information of shard[0] to the index form; writing the new shards other than shard[0] into an NFS shared directory, expanding the shards of the index form, recovering the expanded shards of the index form from the shards in the NFS shared directory, and, following the method of step C, sorting the shards that exceed the shard threshold in descending order and adding them to the ZooKeeper task queue;
step H: recording the flow trace of the split operation and updating it into the routing table; the routing table generates new routing rules from the new flow trace, and the ES cluster ingests or queries data according to the new routing rules;
step S5: ending the shard expansion, and repeating steps S1 to S4 until the ES cluster has provided storage field areas for all index forms.
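The split arithmetic of steps E2 to E4 can be worked through with the sketch below. It reads N as the number of halvings needed to bring the storage size to or below the shard threshold and, following the text literally, takes 2 × N as the number of split shards; both the reading of N and the function name `split_plan` are assumptions.

```python
from typing import Tuple


def split_plan(storage_size: float, threshold: float, total_docs: int,
               docs_in_t_seconds: int) -> Tuple[int, float, float]:
    """Steps E2-E4 as read from the text: returns (split count, K, s)."""
    if threshold <= 0:
        raise ValueError("the shard threshold must be positive")
    if storage_size <= threshold:
        raise ValueError("shard does not exceed the threshold; nothing to split")
    if docs_in_t_seconds <= 0:
        raise ValueError("m (data acquired in T seconds) must be positive")
    n = 0
    result = storage_size
    while result > threshold:  # step E2: keep halving while above the threshold
        result /= 2
        n += 1
    pieces = 2 * n  # split count recorded as 2 x N
    k = total_docs / pieces  # step E3: K = total / (2 x N)
    s = k / docs_in_t_seconds  # step E4: s = K / m
    return pieces, k, s
```

For a shard of 100 units with a threshold of 30, two halvings are needed (100 → 50 → 25), so the shard is split into 4 pieces; with 1000 documents and m = 50, this gives K = 250.0 and s = 5.0.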
In use, as shown in FIG. 1, a Payload field is added to each field and the query mode changes accordingly. For example, given the query condition A AND C AND D AND B, where A is an accurate query condition, C is a range query condition, D is a prefix query condition and B is a Payload range query condition, the method issues sub-conditions A, C and D and obtains their respective query results; each result is then filtered by Payload condition B, which reduces the size of each sub-result set; finally, the intersection of the three filtered batches is taken to obtain the final result.
A series of interfaces supporting Payload field queries are added to the Lucene full-text information retrieval tool (Lucene for short) and to the ES cluster, so that a user can call the ES Payload interface directly, just like any other Elasticsearch interface, without needing to know the Payload storage structure or the underlying Lucene interfaces, and the Payload field is used effectively. Payload encapsulation is provided for five query conditions: "single-condition equality + range", "prefix condition + range", "fuzzy condition + range", "IN condition + range" and "range + range".
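The five encapsulated conditions can be pictured as one primary matcher on the field value combined with a range test on the Payload value. The Python sketch below is a hypothetical illustration: `term`, `prefix`, `fuzzy` (Levenshtein distance, an assumed definition of fuzzy matching), `in_set` and `value_range` are invented helpers, not the ES Payload interface the text describes.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical document view: (doc_id, field_value, payload_value).
Doc = Tuple[int, str, int]
Matcher = Callable[[str], bool]


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, used here as the 'fuzzy' criterion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def term(value: str) -> Matcher:  # single-condition equality
    return lambda v: v == value


def prefix(p: str) -> Matcher:  # prefix condition
    return lambda v: v.startswith(p)


def fuzzy(value: str, max_edits: int = 1) -> Matcher:  # fuzzy condition
    return lambda v: edit_distance(v, value) <= max_edits


def in_set(values: Iterable[str]) -> Matcher:  # IN condition
    allowed = frozenset(values)
    return lambda v: v in allowed


def value_range(lo: str, hi: str) -> Matcher:  # range condition on the field
    return lambda v: lo <= v <= hi


def with_payload_range(match: Matcher, lo: int, hi: int, docs: Iterable[Doc]) -> List[int]:
    """'<condition> + range': primary condition on the field, range on the payload."""
    return [doc for doc, value, payload in docs if match(value) and lo <= payload <= hi]
```

Keeping the primary matcher and the Payload range separate lets all five combinations share one evaluation routine.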
The data acquisition terminal is a ten-gigabit switch that can acquire a large number of data sources from the Internet, the data sources being in the form of data files and streaming data;
the ES cluster provides data storage, query analysis and management/monitoring interfaces for the Hadoop storage server cluster, its storage medium is local disk, and the ES cluster supports various Spark components; the WEB interface server connects to the ES cluster through Zues-client and Loki;
Zues-client is the encapsulated ES interface called by the upper layer; Loki is the unified-index query middleware, responsible for receiving upper-layer users' query requests over structured, unstructured and mixed data, parsing and splitting those requests and forwarding them to ES, and then fetching data from the structured and unstructured data systems according to the returned data ids.
The ElasticSearch query acceleration method solves the problem that, in native ES queries, computing intersections and unions takes a large amount of time when each result set is large, and improves indexing efficiency; the invention filters efficiently on the basis of a single sub-condition and improves concurrent query efficiency.
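Loki's id-based retrieval flow can be sketched as follows: the query goes to the index, which returns ids, and each id is then used to fetch the record from the structured and unstructured data systems. `search_ids` and the two dictionaries are hypothetical stand-ins for ES and those backing systems; request parsing and splitting are omitted.

```python
from typing import Callable, Dict, List


def loki_style_query(search_ids: Callable[[str], List[str]],
                     structured: Dict[str, dict],
                     unstructured: Dict[str, bytes],
                     query: str) -> List[dict]:
    """Forward the query to the index, then hydrate each returned id from both stores."""
    rows: List[dict] = []
    for doc_id in search_ids(query):  # the index returns only ids
        row: dict = {"id": doc_id}
        if doc_id in structured:  # structured data system
            row["fields"] = structured[doc_id]
        if doc_id in unstructured:  # unstructured data system
            row["content"] = unstructured[doc_id]
        rows.append(row)
    return rows
```

Returning only ids from the index keeps ES responses small, while the full records stay in the systems that own them.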
Claims (4)
1. An ElasticSearch query acceleration method, characterized by comprising the following steps:
step 1: establishing a full-text index system, wherein the full-text index system comprises a Hadoop storage server cluster, a WEB interface server, a data import server and a data acquisition terminal, the data acquisition terminal is connected with the data import server through the Internet, and the WEB interface server and the data import server are both connected with the Hadoop storage server cluster through the Internet;
step 2: establishing a full-text retrieval platform in the Hadoop storage server cluster by means of the Lucene full-text information retrieval tool, and deploying an ES cluster in the Hadoop storage server cluster on the basis of the Lucene full-text information retrieval tool;
step 3: the data acquisition terminal feeds stream data or text data into the data import server, and the data import server imports the stream data or text data and sends it to the Hadoop storage server cluster for storage;
step 4: using the Lucene full-text information retrieval tool, the ES cluster builds an index data table with an inverted file index structure for the data stored by the Hadoop storage server cluster, and provides storage field areas for the index data table; the storage field areas include a plurality of document-number storage field areas;
step 5: based on the underlying storage structure provided by the Lucene full-text information retrieval tool, the ES cluster adds a plurality of Payload fields to the inverted-list (postings) linked list, all of the Payload fields being placed after the document-number storage field area;
step 6: a user inputs a query condition through a WEB interface server, and the WEB interface server transmits the query condition to the ES cluster; the query conditions comprise an accurate query condition, a range query condition, a prefix query condition and a Payload range query condition;
step 7: using the Lucene full-text information retrieval tool, the ES cluster first performs retrieval according to the accurate query condition, the range query condition and the prefix query condition, obtaining an accurate query result, a range query result and a prefix query result, respectively;
step 8: the ES cluster filters the accurate query result, the range query result and the prefix query result, each according to the Payload range query condition, to obtain an accurate query result set, a range query result set and a prefix query result set;
step 9: the ES cluster computes the intersection of the accurate query result set, the range query result set and the prefix query result set to obtain the final retrieval result.
2. The method of claim 1, wherein the ES cluster is an Elasticsearch server cluster.
3. The method of claim 1, wherein the Payload field is a storage area that stores a range query field, the range query field including a time field.
4. The method of claim 1, wherein, in step 4, the ES cluster provides storage field areas for the index data table through the following steps:
step S1: taking the shard as the basic storage unit of each index data table, wherein each index data table comprises a plurality of shards, and the ES cluster distributes the index data tables across different storage media in the ES cluster according to their shards;
step S2: treating each index data table in the ES cluster as an index form, a shard being a fragment of the index form, wherein the index form comprises a plurality of shards; and setting a shard threshold;
step S3: the ES cluster establishes an extended index form for the index form, reads the largest shard in the index form, and judges whether that shard has reached the shard threshold: if so, go to step S4; otherwise, go to step S5;
the ES cluster establishes the extended index form for the index form through the following specific steps:
step A: the ES cluster acquires the index form and traverses each shard in it: if a shard exceeds the shard threshold, step C is executed; if it does not, step B is executed;
step B: querying whether the shards of the extended table under that shard exceed the shard threshold: if so, step C is executed; otherwise, go to step S4;
step C: according to the shard threshold, the ES cluster determines the number of shards exceeding it and checks whether the extended index form exists or whether its shards are full: if the extended index form does not exist or its shards are full, a new extended index form is created whose number of shards is twice the existing number of shards, and the information of the newly added extended index form is updated into the routing table; all shards exceeding the shard threshold are listed, sorted in descending order and added to the ZooKeeper task queue, and the ZooKeeper task queue generates a plurality of job tasks from this shard list;
step S4: splitting the shard according to the following steps:
step D: after the ES cluster acquires a job task from the ZooKeeper task queue, it notifies the Ares warehousing program to stop writing data into the table and judges whether the Ares warehousing program has returned a message: if so, step E is executed; if not, the ES cluster waits for the Ares warehousing program to respond;
step E: the ES cluster starts to split the shards according to the following rules:
step E1: the ES cluster acquires the storage size of the shard;
step E2: dividing the storage size by 2 and comparing the result with the shard threshold: if the result is greater than the shard threshold, recording the number N of times the storage size has been divided by 2 and executing step E2 again; if the result is smaller than the shard threshold, recording 2 × N as the number of split shards;
step E3: acquiring the total data amount total of the shard; the data amount K of each split shard is K = total ÷ (2 × N);
step E4: specifying a time T, in seconds, through the ES cluster query interface; denoting the amount of data acquired in T seconds as m, the coefficient s is s = K/m; the ES cluster splits the shard according to the number of split shards, the data amount K and the coefficient s;
step F: after the split, the ES cluster numbers the new shards, the numbering starting from shard[0];
step G: deleting the data in the original shard, replacing it with the data in shard[0], and adding the information of shard[0] to the index form; writing the new shards other than shard[0] into an NFS shared directory, expanding the shards of the index form, recovering the expanded shards of the index form from the shards in the NFS shared directory, and, following the method of step C, sorting the shards that exceed the shard threshold in descending order and adding them to the ZooKeeper task queue;
step H: recording the flow trace of the split operation and updating it into the routing table; the routing table generates new routing rules from the new flow trace, and the ES cluster ingests or queries data according to the new routing rules;
step S5: ending the shard expansion, and repeating steps S1 to S4 until the ES cluster has provided storage field areas for all index forms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710102541.9A (CN108509437B) | 2017-02-24 | 2017-02-24 | ElasticSearch query acceleration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509437A | 2018-09-07 |
CN108509437B | 2021-09-17 |
Family ID: 63373643
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710102541.9A (CN108509437B, Active) | 2017-02-24 | 2017-02-24 | ElasticSearch query acceleration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509437B |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909186B * | 2018-09-14 | 2023-08-22 | 中国科学院上海高等研究院 | Hyperspectral remote sensing data storage and retrieval method and system, storage medium and terminal |
CN109542930A * | 2018-11-16 | 2019-03-29 | 重庆邮电大学 | Efficient data search method based on ElasticSearch |
CN109885642B * | 2019-02-18 | 2021-11-02 | 国家计算机网络与信息安全管理中心 | Hierarchical storage method and device for full-text retrieval |
CN109885536B * | 2019-02-26 | 2023-06-16 | 深圳众享互联科技有限公司 | Distributed data fragment storage and fuzzy search method |
CN110909737A * | 2019-11-14 | 2020-03-24 | 武汉虹旭信息技术有限责任公司 | Picture character recognition method and system |
CN111538747B * | 2020-05-27 | 2023-04-14 | 支付宝(杭州)信息技术有限公司 | Data query method, device and equipment and auxiliary data query method, device and equipment |
CN111698239A * | 2020-06-08 | 2020-09-22 | 星辰天合(北京)数据科技有限公司 | Application control method, device and system based on network file system |
CN112181993A * | 2020-10-27 | 2021-01-05 | 广州市网星信息技术有限公司 | Service data query method, device, server and storage medium |
CN112364189B * | 2020-11-16 | 2023-02-21 | 浪潮云信息技术股份公司 | Electronic certificate application method based on ES service |
CN113360706A * | 2021-06-20 | 2021-09-07 | 杭州登虹科技有限公司 | Video Timeline storage method based on object storage and elastic search |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5119550B2 * | 2007-12-28 | 2013-01-16 | 株式会社メガチップス | Data processing system and data processing method |
CN103324642A * | 2012-03-23 | 2013-09-25 | 日电(中国)有限公司 | Data index establishing system and method as well as data query method |
CN103605704A * | 2013-11-08 | 2014-02-26 | 深圳大学 | Method for indexing and retrieving massive URL (uniform resource locator) data on any field |
CN105189314A * | 2013-03-15 | 2015-12-23 | 西姆伯蒂克有限责任公司 | Automated storage and retrieval system |
CN106202207A * | 2016-06-28 | 2016-12-07 | 中国电子科技集团公司第二十八研究所 | Index and search system based on HBase ORM |
CN106446273A * | 2016-10-21 | 2017-02-22 | 天津海量信息技术股份有限公司 | ES (Elastic Search) global data deduplication method based on rpc |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070061294A1 * | 2005-09-09 | 2007-03-15 | Microsoft Corporation | Source code file search |
Application events:
- 2017-02-24: application CN201710102541.9A filed in CN (CN108509437B, Active)
Non-Patent Citations (2)
Title |
---|
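Research on the application of the ElasticSearch distributed search engine to astronomical big data retrieval; Chen Yajie (陈亚杰) et al.; 《天文学报》 (Acta Astronomica Sinica); 2016-03-11; full text * |
Design of a real-time storage and access scheme for massive traffic data based on HBase + ElasticSearch; Dong Changqing (董长青) et al.; 《大数据》 (Big Data Research); 2017-01-20; full text * |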
Also Published As
Publication number | Publication date |
---|---|
CN108509437A | 2018-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |