CN108509437A - ElasticSearch query acceleration method - Google Patents
ElasticSearch query acceleration method
- Publication number
- CN108509437A
- Authority
- CN
- China
- Prior art keywords
- clusters
- data
- fragment
- shard
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an ElasticSearch query acceleration method in the technical field of computer big-data indexing. The method first adds a Payload domain to each field and then uses the Payload domain to filter the result of each single sub-query condition. This solves the problem that, when ES queries raw data and each result set is very large, computing intersections and unions takes a great deal of time, and it improves indexing efficiency.
Description
Technical field
The invention belongs to the technical field of computer big-data indexing.
Background art
Today is an era in which data is produced, shared, and applied on a massive scale. Data volumes are expanding rapidly, and humanity has entered the Internet age. Social networks, e-commerce, and mobile communications in particular have brought people into a new era of massive structured and unstructured data. The sheer volume makes this data highly complex and constantly changing, and very difficult to process. How to analyze and process massive data and expose it through simple, convenient services has become a problem that many IT enterprises and institutions must face.
Massive data divides into structured data and unstructured data. Structured data refers to, for example, enterprise financial accounts and production data, student score data, and statistical report data; unstructured data consists of text data and multimedia data such as video and audio. Unstructured data accounts for about 80% of massive data. Structured data can be handled by traditional relational databases and by the distributed NoSQL databases developed later, while unstructured data can be served for querying through full-text retrieval technology.
Among current full-text retrieval tools, Lucene is the simplest and most convenient. Lucene is a full-text information retrieval toolkit that uses an inverted-file index structure. It is not a complete search application; rather, it provides indexing and search functions for an application and can easily be embedded in various applications to implement full-text indexing and search. At present, the main cluster technologies built on Lucene are Solr and Elasticsearch (hereinafter ES). ElasticSearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine with a RESTful web interface and a Java interface, supports real-time search, and is stable, reliable, fast, and easy to install and use.
When querying raw data, ES subdivides a combined condition into individual sub-conditions, issues each one as a query, and then applies operations such as intersection or union to the result sets. If each result set is very large, computing the intersection and union takes a great deal of time.
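The cost described above can be illustrated with a minimal sketch. It assumes each sub-condition returns an ascending list of document numbers and that the lists are combined with a two-pointer merge; the function name and the sample lists are hypothetical and are not taken from ES or Lucene. The work is proportional to the combined length of the lists, so large unfiltered result sets make this step expensive.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending lists of document numbers with a two-pointer merge."""
    i, j = 0, 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical result sets of three sub-conditions.
exact_hits = [1, 4, 7, 9, 12]
range_hits = [4, 5, 9, 12, 20]
prefix_hits = [4, 9, 30]

# Every element of every list may be visited, which is what becomes
# slow when each result set is very large.
final = intersect_sorted(intersect_sorted(exact_hits, range_hits), prefix_hits)
```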
Summary of the invention
The object of the present invention is to provide an ElasticSearch query acceleration method that solves the problem that, when ES queries raw data and each result set is very large, computing intersections and unions takes a great deal of time, and that improves indexing efficiency.
To achieve the above object, the present invention adopts the following technical solution:
An ElasticSearch query acceleration method, comprising the following steps:
Step 1: Establish a full-text index system comprising a Hadoop storage server cluster, a WEB interface server, a data import server, and a data collection terminal. The data collection terminal connects to the data import server over the Internet, and the WEB interface server and the data import server both connect to the Hadoop storage server cluster over the Internet;
Step 2: Build a full-text retrieval platform in the Hadoop storage server cluster using the Lucene full-text information retrieval toolkit, and deploy an ES cluster in the Hadoop storage server cluster using the same toolkit;
Step 3: The data collection terminal inputs streaming data or text data to the data import server, and the data import server sends the streaming data or text data to the Hadoop storage server cluster for storage;
Step 4: Using the Lucene full-text information retrieval toolkit, the ES cluster builds index data tables with an inverted-file index structure for the data stored in the Hadoop storage server cluster, and the ES cluster provides storage field areas for the index data tables; each storage field area contains multiple document-number storage field areas;
Step 5: Based on the underlying storage structure provided by the Lucene full-text information retrieval toolkit, the ES cluster adds a Payload domain to the inverted linked list; every Payload domain is placed after a document-number storage field area;
Step 6: The user enters a query condition through the WEB interface server, and the WEB interface server sends the query condition to the ES cluster; the query condition comprises an exact query condition, a range query condition, a prefix query condition, and a Payload range query condition;
Step 7: Using the Lucene full-text information retrieval toolkit, the ES cluster first retrieves according to the exact query condition, the range query condition, and the prefix query condition, obtaining an exact query result, a range query result, and a prefix query result respectively;
Step 8: The ES cluster filters the exact query result, the range query result, and the prefix query result separately according to the Payload range query condition, obtaining an exact query result set, a range query result set, and a prefix query result set;
Step 9: The ES cluster computes the intersection of the exact query result set, the range query result set, and the prefix query result set to obtain the final retrieval result.
The ES cluster is an Elasticsearch server cluster.
The Payload domain is a storage area that stores range query fields, and the range query fields include a time field.
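One way to picture a Payload domain stored next to each document number is sketched below. This is an illustrative in-memory model only: the `Posting` class, the `inverted_index` dictionary, and the choice of an integer timestamp as the payload are assumptions made for the example and do not reproduce the Lucene storage format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Posting:
    doc_id: int   # document number stored in the inverted list
    payload: int  # assumed payload: a time field as an integer timestamp


# Hypothetical inverted index: term -> postings, each carrying its payload.
inverted_index: Dict[str, List[Posting]] = {
    "error": [Posting(1, 100), Posting(2, 250), Posting(5, 400)],
}


def lookup_with_payload_range(term: str, low: int, high: int) -> List[int]:
    """Return document numbers for `term` whose payload lies in [low, high]."""
    return [p.doc_id for p in inverted_index.get(term, []) if low <= p.payload <= high]


hits = lookup_with_payload_range("error", 200, 400)
```

Because the range value travels with the document number, the range check can be applied while reading the postings of a single sub-condition, without a separate range result set.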
In step 4, the ES cluster provides storage field areas for the index data tables according to the following steps:
Step S1: Set the shard as the basic storage unit of each index data table; each index data table contains several shards, and the ES cluster distributes the index data table across different storage media in the ES cluster according to the shards of the index data table;
Step S2: Set an Index table as one index data table in the ES cluster, and a shard as one shard of the Index table; an Index table contains multiple shards; set a shard threshold;
Step S3: The ES cluster establishes an extension Index table for the Index table; the ES cluster reads the largest shard in the Index table and judges whether that shard reaches the shard threshold: if yes, execute step S4; if no, execute step S5;
The ES cluster establishes an extension Index table for the Index table as follows:
Step A: The ES cluster obtains the Index table, traverses each shard in the Index table, and makes the following judgment: if a shard exceeds the shard threshold, execute step C; if a shard does not exceed the shard threshold, execute step B;
Step B: Query whether any shard of the extension table under the shard exceeds the shard threshold: if yes, execute step C; if no, execute step S4;
Step C: According to the size of the shard threshold, the ES cluster calculates the number of pieces into which each shard exceeding the shard threshold should be cut, and checks whether an extension Index table exists or whether the shards of the extension Index table are full. If no extension Index table exists or its shards are full, the ES cluster continues to create a new extension Index table whose shard count is twice the existing shard count, and updates the information of the newly added extension Index table into the routing table. If one exists, the ES cluster lists all shards that exceed the shard threshold, sorts them in descending order, and adds them to the Zookeeper task queue; the Zookeeper task queue generates multiple job tasks from the shard list;
Step S4: Split the shard according to the following steps:
Step D: After the ES cluster obtains a job task from the Zookeeper task queue, it notifies the Ares ingestion program to stop ingesting into the table, and judges whether the Ares ingestion program has returned a message: if yes, execute step E; if no, wait for the Ares ingestion program to respond;
Step E: The ES cluster begins splitting the shard according to the following rules:
Step E1: The ES cluster obtains the storage size of the shard;
Step E2: Divide the storage size by 2 to obtain a shard calculation result, and compare the shard calculation result with the shard threshold: if it is greater than the shard threshold, record N, divide the storage size by 2 × N, and execute step E2 again; if it is less than the shard threshold, record 2 × N as the number of pieces to split into;
Step E3: Obtain the total data volume total of the shard; the data volume K of each piece after the split is K = total ÷ (2 × N);
Step E4: A time T, in seconds, is given through the ES cluster query interface; the amount of data obtained within T seconds is denoted m, and the coefficient s equals K ÷ m; the ES cluster splits the shard according to the number of pieces, the data volume K, and the coefficient s;
Step F: The ES cluster numbers the new shards produced by splitting the shard, designating a new shard as the shard[0] shard;
Step G: Delete the data in the shard, replace it with the data in the shard[0] shard, and add the information of the shard[0] shard to the Index table; at the same time, write the new shards other than the shard[0] shard into the NFS shared directory and extend the shards of the Index table; after recovery has been performed on the shards of the Index table from the shards in the NFS shared directory, list the shards that exceed the shard threshold again according to the method of step C, sort them in descending order, and add them to the Zookeeper task queue;
Step H: Record the process trace of the split operation and update it into the routing table; the routing table generates a new routing rule from the new process trace, and the ES cluster ingests or queries data according to the new routing rule;
Step S5: Shard extension ends; repeat step S1 to step S4 until the ES cluster has provided storage field areas for all Index tables.
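The arithmetic of steps E2 to E4 can be sketched as follows. The text does not say how N advances between repetitions of step E2, so this sketch assumes N starts at 1 and increases by 1 until the storage size divided by 2 × N is no longer greater than the shard threshold; the case where the two are equal is also treated as a stop. The function name `plan_split` and all sample numbers are hypothetical.

```python
from typing import Tuple


def plan_split(storage_size: float, threshold: float,
               total: float, m: float) -> Tuple[int, float, float]:
    """Return (number of pieces, data volume K per piece, coefficient s).

    Assumption: N starts at 1 and grows by 1 per repetition of step E2.
    """
    if threshold <= 0 or m <= 0:
        raise ValueError("threshold and m must be positive")
    n = 1
    # Step E2: repeat while the per-piece storage size still exceeds the threshold.
    while storage_size / (2 * n) > threshold:
        n += 1
    pieces = 2 * n
    # Step E3: K = total / (2 * N)
    k = total / pieces
    # Step E4: s = K / m, where m is the data obtained within T seconds.
    s = k / m
    return pieces, k, s


pieces, k, s = plan_split(storage_size=100, threshold=30, total=1_000_000, m=50_000)
```

With these sample values, 100 ÷ 2 = 50 is above the threshold and 100 ÷ 4 = 25 is not, so the shard is split into 4 pieces of 250,000 records each, and s = 5 can be read as five T-second query windows per piece.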
The ElasticSearch query acceleration method of the present invention solves the problem that, when ES queries raw data and each result set is very large, computing intersections and unions takes a great deal of time, and it improves indexing efficiency. The invention filters efficiently on the basis of each single sub-condition, which improves concurrent query efficiency.
Description of the drawings
Fig. 1 is the overview flow chart of the present invention;
Fig. 2 is the flow chart of the step 4 of the present invention;
Fig. 3 is the flow chart of the step S3 of the present invention;
Fig. 4 is the flow chart of the step S4 of the present invention.
Detailed description of the embodiments
An ElasticSearch query acceleration method, as shown in Fig. 1 to Fig. 4, comprises the following steps:
Step 1: Establish a full-text index system comprising a Hadoop storage server cluster, a WEB interface server, a data import server, and a data collection terminal. The data collection terminal connects to the data import server over the Internet, and the WEB interface server and the data import server both connect to the Hadoop storage server cluster over the Internet;
Step 2: Build a full-text retrieval platform in the Hadoop storage server cluster using the Lucene full-text information retrieval toolkit, and deploy an ES cluster in the Hadoop storage server cluster using the same toolkit;
Step 3: The data collection terminal inputs streaming data or text data to the data import server, and the data import server sends the streaming data or text data to the Hadoop storage server cluster for storage;
Step 4: Using the Lucene full-text information retrieval toolkit, the ES cluster builds index data tables with an inverted-file index structure for the data stored in the Hadoop storage server cluster, and the ES cluster provides storage field areas for the index data tables; each storage field area contains multiple document-number storage field areas;
Step 5: Based on the underlying storage structure provided by the Lucene full-text information retrieval toolkit, the ES cluster adds a Payload domain to the inverted linked list; every Payload domain is placed after a document-number storage field area;
Step 6: The user enters a query condition through the WEB interface server, and the WEB interface server sends the query condition to the ES cluster; the query condition comprises an exact query condition, a range query condition, a prefix query condition, and a Payload range query condition;
Step 7: Using the Lucene full-text information retrieval toolkit, the ES cluster first retrieves according to the exact query condition, the range query condition, and the prefix query condition, obtaining an exact query result, a range query result, and a prefix query result respectively;
Step 8: The ES cluster filters the exact query result, the range query result, and the prefix query result separately according to the Payload range query condition, obtaining an exact query result set, a range query result set, and a prefix query result set;
Step 9: The ES cluster computes the intersection of the exact query result set, the range query result set, and the prefix query result set to obtain the final retrieval result.
The ES cluster is an Elasticsearch server cluster.
The Payload domain is a storage area that stores range query fields, and the range query fields include a time field.
Shard extension in ES uses a Master-Slave structure and relies on ZooKeeper: multiple jobs are generated from the shard list of a data table (Index), and each split module schedules and executes these jobs to complete the shard-splitting operation.
In step 4, the ES cluster provides storage field areas for the index data tables according to the following steps:
Step S1: Set the shard as the basic storage unit of each index data table; each index data table contains several shards, and the ES cluster distributes the index data table across different storage media in the ES cluster according to the shards of the index data table;
Step S2: Set an Index table as one index data table in the ES cluster, and a shard as one shard of the Index table; an Index table contains multiple shards; set a shard threshold. The precondition for the ES cluster to establish an extension Index table for an Index table is that the Index table has an alias; the extension Index table that is created has the same alias, and its shard count is the same as that of the Index table;
Step S3: The ES cluster establishes an extension Index table for the Index table; the ES cluster reads the largest shard in the Index table and judges whether that shard reaches the shard threshold: if yes, execute step S4; if no, execute step S5;
The ES cluster establishes an extension Index table for the Index table as follows:
Step A: The ES cluster obtains the Index table, traverses each shard in the Index table, and makes the following judgment: if a shard exceeds the shard threshold, execute step C; if a shard does not exceed the shard threshold, execute step B;
Step B: Query whether any shard of the extension table under the shard exceeds the shard threshold: if yes, execute step C; if no, execute step S4;
Step C: According to the size of the shard threshold, the ES cluster calculates the number of pieces into which each shard exceeding the shard threshold should be cut, and checks whether an extension Index table exists or whether the shards of the extension Index table are full. If no extension Index table exists or its shards are full, the ES cluster continues to create a new extension Index table whose shard count is twice the existing shard count, and updates the information of the newly added extension Index table into the routing table. If one exists, the ES cluster lists all shards that exceed the shard threshold, sorts them in descending order, and adds them to the Zookeeper task queue; the Zookeeper task queue generates multiple job tasks from the shard list. ZooKeeper is a distributed, open-source coordination service for distributed applications; it is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase.
Step S4: Split the shard according to the following steps:
Step D: After the ES cluster obtains a job task from the Zookeeper task queue, it notifies the Ares ingestion program to stop ingesting into the table, and judges whether the Ares ingestion program has returned a message: if yes, execute step E; if no, wait for the Ares ingestion program to respond;
Step E: The ES cluster begins splitting the shard according to the following rules:
Step E1: The ES cluster obtains the storage size of the shard;
Step E2: Divide the storage size by 2 to obtain a shard calculation result, and compare the shard calculation result with the shard threshold: if it is greater than the shard threshold, record N, divide the storage size by 2 × N, and execute step E2 again; if it is less than the shard threshold, record 2 × N as the number of pieces to split into;
Step E3: Obtain the total data volume total of the shard; the data volume K of each piece after the split is K = total ÷ (2 × N);
Step E4: A time T, in seconds, is given through the ES cluster query interface; the amount of data obtained within T seconds is denoted m, and the coefficient s equals K ÷ m; the ES cluster splits the shard according to the number of pieces, the data volume K, and the coefficient s;
Step F: The ES cluster numbers the new shards produced by splitting the shard, designating a new shard as the shard[0] shard;
Step G: Delete the data in the shard, replace it with the data in the shard[0] shard, and add the information of the shard[0] shard to the Index table; at the same time, write the new shards other than the shard[0] shard into the NFS shared directory and extend the shards of the Index table; after recovery has been performed on the shards of the Index table from the shards in the NFS shared directory, list the shards that exceed the shard threshold again according to the method of step C, sort them in descending order, and add them to the Zookeeper task queue;
Step H: Record the process trace of the split operation and update it into the routing table; the routing table generates a new routing rule from the new process trace, and the ES cluster ingests or queries data according to the new routing rule;
Step S5: Shard extension ends; repeat step S1 to step S4 until the ES cluster has provided storage field areas for all Index tables.
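The scan in steps A to C, which lists the shards that exceed the shard threshold, sorts them in descending order, and queues them as job tasks, might look like the sketch below. A plain in-memory `deque` stands in for the Zookeeper task queue purely for illustration; no ZooKeeper client API is used, and the shard names and sizes are invented.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def oversized_shards(shard_sizes: Dict[str, int], threshold: int) -> List[Tuple[str, int]]:
    """List shards whose size exceeds the threshold, largest first."""
    over = [(name, size) for name, size in shard_sizes.items() if size > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Stand-in for the Zookeeper task queue (assumption: a local FIFO queue).
task_queue: Deque[str] = deque()

# Hypothetical shard sizes for one Index table, in arbitrary units.
sizes = {"shard0": 40, "shard1": 95, "shard2": 10, "shard3": 60}

for name, _size in oversized_shards(sizes, threshold=50):
    task_queue.append(name)  # one job task per oversized shard
```

Queuing the largest shard first means the shard that is furthest over the threshold is split first.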
In use, as shown in Fig. 1, a Payload domain is added for each field, and the data query method changes accordingly. For example, take the query condition A and C and D and B, where A is an exact query condition, C is a range query condition, D is a prefix query condition, and B is a Payload range query condition. According to the method provided by the invention, sub-conditions A, C, and D are issued and each finds its own query result; once each result has been found, it is filtered by Payload condition B, which reduces the data volume of each sub-result set; finally, the intersection of the three filtered results gives the final result.
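The example with conditions A, C, D, and B can be traced with a small sketch. Everything here is illustrative: the document numbers, the `payload_of` mapping, and the range chosen for B are made up, and Python sets stand in for the result sets that the ES cluster would return.

```python
from typing import Dict, Set, Tuple

# Hypothetical payload (for example a time field) stored per document number.
payload_of: Dict[int, int] = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50}

# Hypothetical results of the three sub-conditions before filtering.
hits_a: Set[int] = {1, 2, 3, 4}  # A: exact query condition
hits_c: Set[int] = {2, 3, 4, 5}  # C: range query condition
hits_d: Set[int] = {1, 3, 4, 5}  # D: prefix query condition


def payload_filter(hits: Set[int], low: int, high: int) -> Set[int]:
    """Keep only documents whose payload lies in [low, high] (condition B)."""
    return {doc for doc in hits if low <= payload_of[doc] <= high}


b_range: Tuple[int, int] = (25, 45)  # B: Payload range query condition

# Filter each sub-result by B first, then intersect the smaller sets.
filtered = [payload_filter(h, *b_range) for h in (hits_a, hits_c, hits_d)]
final = set.intersection(*filtered)
```

Each filtered set here shrinks from four documents to two before the intersection runs, which is the reduction in data volume that the paragraph above describes.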
A series of interfaces supporting Payload-domain queries are added to the Lucene full-text information retrieval toolkit (hereinafter Lucene) and to the ES cluster, so that users can call the Payload interface of ES directly, just as they call other Elasticsearch interfaces, without needing to know the Payload-domain storage structure and interfaces of the underlying Lucene; this makes efficient use of the Payload domain. Payload encapsulation is provided for five kinds of query request: "single-condition equality + range", "prefix condition + range", "fuzzy condition + range", "IN condition + range", and "range + range".
The data collection terminal is a 10-gigabit switch, which can obtain a large number of data sources from the Internet; the data sources take the form of data files and streaming data;
The ES cluster provides data ingestion, query analysis, and management and monitoring interfaces to the Hadoop storage server cluster; the storage medium is local disk, and the ES cluster supports various Spark components. The WEB interface server connects to the ES cluster through Zues-client and Loki. Zues-client is an encapsulated ES interface called by upper layers. Loki is the query middleware for the unified index: it receives query requests from upper-layer users for structured data, unstructured data, and mixed data, parses and splits them, forwards the requests to ES, and uses the returned data ids to fetch the data from the structured data system and the unstructured data system.
The ElasticSearch query acceleration method of the present invention solves the problem that, when ES queries raw data and each result set is very large, computing intersections and unions takes a great deal of time, and it improves indexing efficiency. The invention filters efficiently on the basis of each single sub-condition, which improves concurrent query efficiency.
Claims (4)
1. An ElasticSearch query acceleration method, characterized by comprising the following steps:
Step 1: Establish a full-text index system comprising a Hadoop storage server cluster, a WEB interface server, a data import server, and a data collection terminal. The data collection terminal connects to the data import server over the Internet, and the WEB interface server and the data import server both connect to the Hadoop storage server cluster over the Internet;
Step 2: Build a full-text retrieval platform in the Hadoop storage server cluster using the Lucene full-text information retrieval toolkit, and deploy an ES cluster in the Hadoop storage server cluster using the same toolkit;
Step 3: The data collection terminal inputs streaming data or text data to the data import server, and the data import server sends the streaming data or text data to the Hadoop storage server cluster for storage;
Step 4: Using the Lucene full-text information retrieval toolkit, the ES cluster builds index data tables with an inverted-file index structure for the data stored in the Hadoop storage server cluster, and the ES cluster provides storage field areas for the index data tables; each storage field area contains multiple document-number storage field areas;
Step 5: Based on the underlying storage structure provided by the Lucene full-text information retrieval toolkit, the ES cluster adds a Payload domain to the inverted linked list; every Payload domain is placed after a document-number storage field area;
Step 6: The user enters a query condition through the WEB interface server, and the WEB interface server sends the query condition to the ES cluster; the query condition comprises an exact query condition, a range query condition, a prefix query condition, and a Payload range query condition;
Step 7: Using the Lucene full-text information retrieval toolkit, the ES cluster first retrieves according to the exact query condition, the range query condition, and the prefix query condition, obtaining an exact query result, a range query result, and a prefix query result respectively;
Step 8: The ES cluster filters the exact query result, the range query result, and the prefix query result separately according to the Payload range query condition, obtaining an exact query result set, a range query result set, and a prefix query result set;
Step 9: The ES cluster computes the intersection of the exact query result set, the range query result set, and the prefix query result set to obtain the final retrieval result.
2. The ElasticSearch query acceleration method according to claim 1, characterized in that the ES cluster is an Elasticsearch server cluster.
3. The ElasticSearch query acceleration method according to claim 1, characterized in that the Payload domain is a storage area that stores range query fields, and the range query fields include a time field.
4. The ElasticSearch query acceleration method according to claim 1, characterized in that, in step 4, the ES cluster provides storage field areas for the index data tables according to the following steps:
Step S1: Set the shard as the basic storage unit of each index data table; each index data table contains several shards, and the ES cluster distributes the index data table across different storage media in the ES cluster according to the shards of the index data table;
Step S2: Set an Index table as one index data table in the ES cluster, and a shard as one shard of the Index table; an Index table contains multiple shards; set a shard threshold;
Step S3: The ES cluster establishes an extension Index table for the Index table; the ES cluster reads the largest shard in the Index table and judges whether that shard reaches the shard threshold: if yes, execute step S4; if no, execute step S5;
The ES cluster establishes an extension Index table for the Index table as follows:
Step A: The ES cluster obtains the Index table, traverses each shard in the Index table, and makes the following judgment: if a shard exceeds the shard threshold, execute step C; if a shard does not exceed the shard threshold, execute step B;
Step B: Query whether any shard of the extension table under the shard exceeds the shard threshold: if yes, execute step C; if no, execute step S4;
Step C: According to the size of the shard threshold, the ES cluster calculates the number of pieces into which each shard exceeding the shard threshold should be cut, and checks whether an extension Index table exists or whether the shards of the extension Index table are full. If no extension Index table exists or its shards are full, the ES cluster continues to create a new extension Index table whose shard count is twice the existing shard count, and updates the information of the newly added extension Index table into the routing table. If one exists, the ES cluster lists all shards that exceed the shard threshold, sorts them in descending order, and adds them to the Zookeeper task queue; the Zookeeper task queue generates multiple job tasks from the shard list;
Step S4:Shard fragments are divided according to following steps:
Step D:After obtaining a job task in the task queue of Zookeeper, notice Ares enters library and stops ES clusters
In-stockroom operation only is carried out to the table, judges that Ares enters whether library returns to message:It is to then follow the steps E;It is no, then it waits for
Ares enters library response;
Step E: The ES cluster starts splitting the shard according to the following rules:
Step E1: The ES cluster obtains the storage size of the shard;
Step E2: Divide the storage size by 2 to obtain the shard calculation result, and compare the result with the shard threshold: if it is greater than the shard threshold, record N, divide the storage size by 2 × N, and repeat step E2; if it is less than the shard threshold, record 2 × N as the number of shards to split into;
Step E3: Obtain the total data volume total of the shard; the data volume size K after splitting is: K = total ÷ (2 × N);
Step E4: Specify a time T, in seconds, through the ES cluster query interface; the amount of data obtained within T seconds is denoted m, and the coefficient is s, where s = K ÷ m; the ES cluster splits the shard according to the number of splits, the data volume size K, and the coefficient s;
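The arithmetic of steps E2 to E4 can be sketched as below. The text does not state how N is initialised or incremented, so this sketch assumes N starts at 1 and grows by 1 on each repetition of step E2, and that a result equal to the threshold stops the loop; the function name `plan_split` and its parameters are hypothetical.

```python
def plan_split(storage_size, threshold, total, m):
    """Compute the split count 2*N, per-shard data volume K, and coefficient s.

    Assumption: N starts at 1 and increases by 1 each time step E2 repeats.
    """
    if threshold <= 0 or m <= 0:
        raise ValueError("threshold and m must be positive")

    n = 1
    # Step E2: keep dividing by 2*N until the result no longer exceeds the threshold.
    while storage_size / (2 * n) > threshold:
        n += 1

    pieces = 2 * n          # number of shards to split into
    k = total / pieces      # step E3: K = total / (2 * N)
    s = k / m               # step E4: s = K / m, m = data obtained within T seconds
    return pieces, k, s
```

For example, under these assumptions a shard of storage size 120 with threshold 20 gives N = 3, so the shard would be split into 6 pieces.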
Step F: The ES cluster numbers the new shards produced by splitting the shard, and designates one of the new shards as shard[0];
Step G: Delete the data in the shard, replace it with the data in shard[0], and add the shard[0] information to the index table; at the same time, write the new shards other than shard[0] to the NFS shared directory and extend the shards of the index table; after the shards of the index table have been recovered from the shards in the NFS shared directory, again list the shards that exceed the shard threshold according to the method of step C, sort the list in descending order, and add it to the Zookeeper task queue;
Step H: Record the process trace of the split operation and write the trace into the routing table; the routing table generates new routing rules from the new trace, and the ES cluster ingests or queries data according to the new routing rules;
Step S5: Shard extension ends; repeat step S1 to step S4 until the ES cluster provides storage field space for all index tables.
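One way to picture step H is a routing table that stores each split as a parent-to-children record and resolves a shard to the shards that currently hold its data. The text does not specify the format of the trace or of the routing rules, so the `RoutingTable` class, its method names, and the shard identifiers below are purely illustrative assumptions.

```python
class RoutingTable:
    """Toy routing table: records split traces and resolves shards to leaves."""

    def __init__(self):
        # parent shard id -> list of shard ids produced by splitting it
        self._children = {}

    def record_split(self, parent, children):
        """Store the trace of one split operation."""
        children = list(children)
        if parent in children:
            raise ValueError("a shard cannot be its own child")
        self._children[parent] = children

    def resolve(self, shard):
        """Return the shards to query or ingest into for the given shard."""
        children = self._children.get(shard)
        if not children:
            return [shard]  # never split: route to the shard itself
        leaves = []
        for child in children:
            leaves.extend(self.resolve(child))
        return leaves
```

A shard that was split twice resolves to the shards left after the second split, so callers do not need to know the split history.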
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710102541.9A CN108509437B (en) | 2017-02-24 | 2017-02-24 | ElasticSearch query acceleration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509437A (en) | 2018-09-07 |
CN108509437B (en) | 2021-09-17 |
Family
ID=63373643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710102541.9A Active CN108509437B (en) | 2017-02-24 | 2017-02-24 | ElasticSearch query acceleration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509437B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070061294A1 (en) * | 2005-09-09 | 2007-03-15 | Microsoft Corporation | Source code file search |
JP5119550B2 (en) * | 2007-12-28 | 2013-01-16 | 株式会社メガチップス | Data processing system and data processing method |
CN103324642A (en) * | 2012-03-23 | 2013-09-25 | 日电(中国)有限公司 | Data index establishing system and method as well as data query method |
CN105189314A (en) * | 2013-03-15 | 2015-12-23 | 西姆伯蒂克有限责任公司 | Automated storage and retrieval system |
CN103605704A (en) * | 2013-11-08 | 2014-02-26 | 深圳大学 | Mass url (uniform resource locator) data any field indexing and retrieving method |
CN106202207A (en) * | 2016-06-28 | 2016-12-07 | 中国电子科技集团公司第二十八研究所 | A kind of index based on HBase ORM and searching system |
CN106446273A (en) * | 2016-10-21 | 2017-02-22 | 天津海量信息技术股份有限公司 | ES (Elastic Search) global data deduplication method based on rpc |
Non-Patent Citations (2)
Title |
---|
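Dong Changqing et al.: "Design of a real-time access scheme for massive traffic data based on HBase + ElasticSearch", Big Data Research * |
Chen Yajie et al.: "Research on the application of the ElasticSearch distributed search engine in astronomical big data retrieval", Acta Astronomica Sinica * |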
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909186A (en) * | 2018-09-14 | 2020-03-24 | 中国科学院上海高等研究院 | Hyperspectral remote sensing data storage and retrieval method and system, storage medium and terminal |
CN110909186B (en) * | 2018-09-14 | 2023-08-22 | 中国科学院上海高等研究院 | Hyperspectral remote sensing data storage and retrieval method and system, storage medium and terminal |
CN109542930A (en) * | 2018-11-16 | 2019-03-29 | 重庆邮电大学 | A kind of data efficient search method based on ElasticSearch |
CN109885642A (en) * | 2019-02-18 | 2019-06-14 | 国家计算机网络与信息安全管理中心 | Classification storage method and device towards full-text search |
CN109885536A (en) * | 2019-02-26 | 2019-06-14 | 深圳众享互联科技有限公司 | One kind is based on the storage of distributed data fragment and fuzzy search method |
CN109885536B (en) * | 2019-02-26 | 2023-06-16 | 深圳众享互联科技有限公司 | Distributed data fragment storage and fuzzy search method |
CN110909737A (en) * | 2019-11-14 | 2020-03-24 | 武汉虹旭信息技术有限责任公司 | Picture character recognition method and system |
CN111538747B (en) * | 2020-05-27 | 2023-04-14 | 支付宝(杭州)信息技术有限公司 | Data query method, device and equipment and auxiliary data query method, device and equipment |
CN111538747A (en) * | 2020-05-27 | 2020-08-14 | 支付宝(杭州)信息技术有限公司 | Data query method, device and equipment and auxiliary data query method, device and equipment |
CN111698239A (en) * | 2020-06-08 | 2020-09-22 | 星辰天合(北京)数据科技有限公司 | Application control method, device and system based on network file system |
CN112181993A (en) * | 2020-10-27 | 2021-01-05 | 广州市网星信息技术有限公司 | Service data query method, device, server and storage medium |
CN112364189A (en) * | 2020-11-16 | 2021-02-12 | 浪潮云信息技术股份公司 | Electronic certificate application method based on ES service |
CN113360706A (en) * | 2021-06-20 | 2021-09-07 | 杭州登虹科技有限公司 | Video Timeline storage method based on object storage and elastic search |
CN114138830A (en) * | 2021-11-15 | 2022-03-04 | 紫金诚征信有限公司 | Second-level query method and device for mass data of big data and computer medium |
Also Published As
Publication number | Publication date |
---|---|
CN108509437B (en) | 2021-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||