CN106682148A - Method and device based on Solr data search - Google Patents

Publication number
CN106682148A
Authority
CN
China
Prior art keywords
solr
data
caching
index
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611199422.1A
Other languages
Chinese (zh)
Inventor
于洪勇
刘晓帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruian Technology Co Ltd
Original Assignee
Beijing Ruian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruian Technology Co Ltd
Priority to CN201611199422.1A
Publication of CN106682148A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

An embodiment of the invention discloses a method and a device based on Solr data search. The method includes: selecting HBase or MongoDB as the underlying storage database and generating a rowkey from the fields of each record; optimizing the JVM (Java Virtual Machine) with respect to memory, and optimizing Solr's memory configuration, disk usage, and transaction log; and, when querying data, obtaining the corresponding data from the underlying storage according to the index created by Solr. Through the design of the underlying storage and the data structure, the data depends only weakly on Solr and is kept independent of it; search performance is improved through Solr optimization and schema design; JVM tuning improves the query efficiency of the Solr cluster and reduces the probability of socket timeouts caused by Full GC in Solr; and transaction control over index creation and over storing data in the database keeps the searched data and the queried data synchronized.

Description

Method and device based on Solr data search
Technical field
The embodiments of the present invention relate to the technical field of massive data search, and in particular to a method and a device based on Solr data search.
Background
Since 2012, the term big data (Big Data) has been mentioned more and more often. People use it to describe and define the massive amount of information produced in the age of the information explosion, and to name the related technological developments and innovations. A column in The New York Times in February 2012 declared that the era of "big data" had arrived: in business, economics, and other fields, decisions will increasingly be based on data and analysis rather than on experience and intuition.
Big data is commonly used to describe the large amounts of unstructured and semi-structured data that a company creates, data that would take too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because analyzing large data sets in real time requires a framework such as Spark to distribute the work across tens, hundreds, or even thousands of computers.
How big is big data? A set of figures titled "a day on the internet" tells us that in one day, the content generated on the internet could fill 168 million DVDs; 294 billion emails are sent (equivalent to two years of paper mail in the United States); 2 million community posts are published (equivalent to 770 years of text in Time magazine); and 378,000 mobile phones are sold, more than the 371,000 babies born worldwide each day.
By 2012, data volumes had risen from the TB level (1024 GB = 1 TB) to the PB (1024 TB = 1 PB), EB (1024 PB = 1 EB), and even ZB (1024 EB = 1 ZB) levels. Research by International Data Corporation (IDC) shows that the volume of data produced worldwide was 0.49 ZB in 2008, 0.8 ZB in 2009, 1.2 ZB in 2010, and as much as 1.82 ZB in 2011, equivalent to more than 200 GB for every person in the world. By 2012, all the printed material ever produced by humankind amounted to 200 PB, and everything humanity has ever said amounts to about 5 EB. IBM research claims that 90% of all the data obtained throughout human civilization was produced in the past two years, and that by 2020 the volume of data produced worldwide will reach 44 times today's. [5] Every day, more than 500 million pictures are uploaded worldwide, and 20 hours of video are shared every minute. Yet even all the information people create each day, including voice calls, email, and other communications as well as all uploaded pictures, video, and music, cannot match the amount of digital information created each day about people themselves.
Big data is so important that acquiring, storing, searching, sharing, analyzing, and even visualizing it have all become important research topics.
Summary of the invention
The purpose of the embodiments of the present invention is to propose a method and a device based on Solr data search, intended to solve the problems of storing large volumes of data and optimizing queries over massive data.
To this end, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, a method based on Solr data search, the method including:
selecting HBase or MongoDB as the underlying storage database, and generating a rowkey from the fields of each record;
optimizing the JVM with respect to memory, and optimizing Solr's memory configuration, disk usage, and transaction log;
when querying data, obtaining the corresponding data from the underlying storage according to the index created by Solr.
Preferably, optimizing the JVM with respect to memory includes:
allocating the memory that Solr needs, plus a buffer of a preset size.
Preferably, optimizing Solr's memory configuration includes:
configuring the cache sizes and the eviction policy of Solr;
the caches include an autowarming cache, a filter cache, a document cache, a query result cache, and/or a field value cache;
the eviction policy is chosen as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound file format for the index; when creating the index, select NIOFSDirectory to use NIO and use direct memory directly; and select a suitable segment merge policy to avoid generating too many segments.
Preferably, optimizing Solr's disk usage includes:
where relevance is not needed, restricting the use of term vectors (Term Vector);
when designing the schema, selecting a suitable document granularity and setting stored fields selectively;
if Solr's unique key is used to locate a record in the database, setting the stored attribute of all fields to false;
for fields that do not take part in scoring, setting omitNorms to true;
for date and numeric types, reducing the precision step precisionStep.
Preferably, optimizing the transaction log includes:
using the transaction log to support near-real-time gets and atomic updates; decoupling write durability from the commit process; supporting replica synchronization with the shard leader in SolrCloud; and balancing the length of the transaction log against the frequency of hard commits.
In a second aspect, a device based on Solr data search, the device including:
a first acquisition module, configured to select HBase or MongoDB as the underlying storage database and to generate a rowkey from the fields of each record;
an optimization module, configured to optimize the JVM with respect to memory, and to optimize Solr's memory configuration, disk usage, and transaction log;
a second acquisition module, configured to obtain, when querying data, the corresponding data from the underlying storage according to the index created by Solr.
Preferably, the optimization module is specifically configured to:
allocate the memory that Solr needs, plus a buffer of a preset size.
Preferably, the optimization module is further specifically configured to:
configure the cache sizes and the eviction policy of Solr;
the caches include an autowarming cache, a filter cache, a document cache, a query result cache, and/or a field value cache;
the eviction policy is chosen as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound file format for the index; when creating the index, select NIOFSDirectory to use NIO and use direct memory directly; and select a suitable segment merge policy to avoid generating too many segments.
Preferably, the optimization module is further specifically configured to:
where relevance is not needed, restrict the use of term vectors (Term Vector);
when designing the schema, select a suitable document granularity and set stored fields selectively;
if Solr's unique key is used to locate a record in the database, set the stored attribute of all fields to false;
for fields that do not take part in scoring, set omitNorms to true;
for date and numeric types, reduce the precision step precisionStep.
Preferably, the optimization module is further specifically configured to:
use the transaction log to support near-real-time gets and atomic updates; decouple write durability from the commit process; support replica synchronization with the shard leader in SolrCloud; and balance the length of the transaction log against the frequency of hard commits.
The method and device based on Solr data search provided by the embodiments of the present invention select HBase or MongoDB as the underlying storage database and generate a rowkey from the fields of each record; optimize the JVM with respect to memory, and optimize Solr's memory configuration, disk usage, and transaction log; and, when querying data, obtain the corresponding data from the underlying storage according to the index created by Solr. As a result, the design of the underlying storage and the data structure makes the data depend only weakly on Solr and keeps the data independent; Solr optimization and schema design improve search performance, while JVM tuning improves the query efficiency of the Solr cluster and reduces the likelihood of socket timeouts caused by Full GC in Solr; and transaction control over index creation and over storing data in the database keeps the searched data and the queried data synchronized.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method based on Solr data search provided by an embodiment of the present invention;
Fig. 2 is a functional block diagram of a device based on Solr data search provided by an embodiment of the present invention.
Detailed description
The embodiments of the present invention are described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described here serve only to explain the embodiments of the present invention and do not limit them. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method based on Solr data search provided by an embodiment of the present invention.
As shown in Fig. 1, the method based on Solr data search includes:
Step 101: select HBase or MongoDB as the underlying storage database, and generate a rowkey from the fields of each record;
Specifically, a search engine returns results ranked by relevance, whereas a relational database can only return rows from a table. That is, if relevance is not required, an in-memory database synchronized with the relational database can be used to improve query performance, and no search engine is needed. A search engine is not a place to store data, unless the data is useful for querying and for displaying results. A search engine should therefore not be used as a database.
In-memory databases such as Memcache and Redis depend on memory. Given the cost at large data volumes and the time needed to load data when an in-memory database starts, in-memory databases are not considered as a candidate for now.
Both HBase and MongoDB are suitable choices. At the same time, to decouple the underlying storage from the search engine, there must be a way to obtain data without going through Solr. This requires that the database design avoid making the unique key depend entirely on Solr, especially for HBase. To avoid full table scans, records must be fetched precisely by a unique rowkey, so the rowkey design is the key to the Solr+HBase architecture. The rowkey is derived in a defined way from the fields of the stored record, rather than by generating a unique random value to serve as both the rowkey and the Solr unique key.
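The rowkey design described above can be illustrated with a minimal sketch. The patent does not specify how the rowkey is derived from the record fields, so the hashing scheme, the salt prefix, the function name `make_rowkey`, and the sample field names below are all assumptions made for illustration only.

```python
import hashlib

def make_rowkey(record, key_fields, salt_buckets=16):
    """Build a deterministic rowkey from selected record fields (hypothetical scheme)."""
    # Join the chosen field values in a fixed order so that the same
    # record always yields the same key, with or without Solr.
    natural_key = "|".join(str(record[name]) for name in key_fields)
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    # Assumption: a short salt prefix derived from the hash spreads
    # writes across storage regions instead of clustering them.
    bucket = int(digest[:8], 16) % salt_buckets
    return f"{bucket:02d}-{digest}"

# Hypothetical record; the field names are illustrative only.
record = {"source": "mail", "sender": "a@example.com", "sent_at": "2016-12-22T10:00:00Z"}
rowkey = make_rowkey(record, ["source", "sender", "sent_at"])
```

Because the key is a pure function of the record's fields, the same value can serve as the storage rowkey and as the Solr unique key, and a record can still be located when Solr is unavailable.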
Step 102: optimize the JVM with respect to memory, and optimize Solr's memory configuration, disk usage, and transaction log;
Here, JVM is the Java Virtual Machine, and Solr is a full-text search application.
Specifically, Solr wraps Lucene at its bottom layer, and Lucene uses an inverted index to improve search efficiency. The invention of MapReduce also benefited from the inverted index.
NoSQL designs are denormalized, which avoids the series of join statements that a relational database must use at query time in order to keep storage compact. Although such a design produces a large amount of redundancy in the stored data, it also improves query efficiency. This offers guidance for design as well: both the database design and the design of Solr index records should take the record as the unit, rather than the table (table) of a relational database.
Preferably, optimizing the JVM with respect to memory includes:
allocating the memory that Solr needs, plus a buffer of a preset size.
Specifically, the main principle of JVM configuration is to allocate only the memory Solr needs plus a small buffer, so as to avoid long garbage collection (GC) pauses. Because a Solr cluster relies on ZooKeeper for coordination, a Stop-The-World pause can cause a ZooKeeper timeout and affect the ZooKeeper cluster. Especially when the cluster uses HBase as the underlying storage, once a Full GC in the HBase cluster runs too long, the HMaster node of HBase may lose contact; worse still, the standby node may lose contact as well, in which case the entire underlying storage is unavailable. JVM tuning is therefore essential.
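The sizing principle above amounts to simple arithmetic, sketched below. The helper name `solr_heap_flags`, the default buffer size, and the choice to set the initial and maximum heap to the same value are assumptions for illustration; the patent only states that the heap should be the required memory plus a buffer of preset size.

```python
def solr_heap_flags(required_mb, buffer_mb=512):
    """Return JVM heap flags sized as required memory plus a preset buffer."""
    heap_mb = required_mb + buffer_mb
    # Assumption: initial and maximum heap are set to the same value so
    # that the heap is not resized at runtime.
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]

# Example: Solr is estimated to need 4096 MB, with a 512 MB buffer.
flags = solr_heap_flags(4096, 512)
```

Keeping the heap close to what Solr actually uses limits the amount of memory a Full GC has to scan, which is the stated reason for not over-allocating.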
Preferably, optimizing Solr's memory configuration includes:
configuring the cache sizes and the eviction policy of Solr;
the caches include an autowarming cache, a filter cache, a document cache, a query result cache, and/or a field value cache;
the eviction policy is chosen as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound file format for the index; when creating the index, select NIOFSDirectory to use NIO and use direct memory directly; and select a suitable segment merge policy to avoid generating too many segments.
Specifically, beyond JVM tuning, Solr itself has a large number of memory-related settings that can be adjusted to suit different production environments, such as the cache sizes and the choice of eviction policy. The configurable caches include the autowarming cache (autowarming), the filter cache, the document cache, the query result cache, the field value (Field Value) cache, and so on. It is recommended to use the field cache (FieldCache), to reduce mergeFactor so that fewer segments are kept in the index, and to disable the compound file format for the index. In addition, when creating the index, NIOFSDirectory (an implementation of the Directory interface in Solr) can be selected to use non-blocking I/O (NIO) and to use direct memory directly, which avoids competing for JVM heap memory. A suitable segment merge policy should be selected to avoid generating too many segments.
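To make the interplay of cache size, eviction, and autowarming concrete, the following is a small conceptual model. It is not Solr's implementation: the class `LruCache`, its methods, and the `regenerate` callback are hypothetical names, and least-recently-used eviction is assumed here as one common policy.

```python
from collections import OrderedDict

class LruCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.size:
            self.entries.popitem(last=False)  # drop the oldest entry

    def warm_from(self, old, autowarm_count, regenerate):
        """Pre-populate from the most recently used keys of an old cache."""
        if autowarm_count <= 0:
            return
        recent = list(old.entries.keys())[-autowarm_count:]
        for key in recent:
            # Values are recomputed against the new index, not copied.
            self.put(key, regenerate(key))

old = LruCache(size=2)
old.put("q=a", 1)
old.put("q=b", 2)
old.get("q=a")      # "q=a" becomes the most recently used key
old.put("q=c", 3)   # cache is full, so "q=b" is evicted

new = LruCache(size=2)
new.warm_from(old, autowarm_count=1, regenerate=lambda key: key.upper())
```

A larger cache size keeps more entries but uses more heap, and a larger autowarm count makes the first queries after a reload faster at the cost of a slower reload, which is the trade-off that the configuration above has to balance.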
To avoid the performance problems that arise once the data volume of a single schema reaches a certain magnitude, the schema can be split and generated dynamically by day or by month. At query time, a Collection alias (Alias) combines all schemas that share the same data structure, so that no single large schema has to be queried. Increasing the number of replicas (Replication) can likewise improve response speed. However, a larger cluster is not always better: in theory Solr can scale without limit, but in practice it still has its limitations.
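As a rough sketch of the splitting idea, the snippet below names one collection per month and builds a request URL for the Solr Collections API `CREATEALIAS` action, which groups several collections under one alias. The base URL, the collection prefix `events`, and the helper names are assumptions; the patent does not state how the alias is created.

```python
from urllib.parse import urlencode

def monthly_collections(prefix, months):
    """Name one collection per month, e.g. events_201612."""
    return [f"{prefix}_{month}" for month in months]

def create_alias_url(base_url, alias, collections):
    """Build a Collections API request that points an alias at the collections."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{base_url}/admin/collections?{params}"

# Hypothetical host and names, for illustration only.
names = monthly_collections("events", ["201611", "201612"])
url = create_alias_url("http://localhost:8983/solr", "events", names)
```

Queries are then sent to the alias, so adding a new month only requires creating one collection and updating the alias.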
Preferably, optimizing Solr's disk usage includes:
where relevance is not needed, restricting the use of term vectors (Term Vector);
when designing the schema (here meaning Solr's schema.xml configuration file), selecting a suitable document granularity and setting stored fields selectively;
if Solr's unique key is used to locate a record in the database, setting the stored attribute of all fields to false;
for fields that do not take part in scoring, setting omitNorms to true;
for date and numeric types, reducing the precision step precisionStep.
Specifically, Solr's configuration also leaves room for optimization, in both memory usage and disk usage. For example, where relevance is not needed, the use of term vectors can be restricted to reduce disk usage. When designing the schema, a suitable document granularity should be selected and stored fields can be set selectively; if Solr's unique key is mainly used to locate a record in the database, the stored attribute of all fields can be set to false, which reduces disk pressure. For fields that are not intended to take part in scoring, omitNorms can be set to true. For date and numeric types, the precision step precisionStep can be set somewhat smaller as appropriate.
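The field-level settings above can be sketched by generating a `<field>` definition of the kind found in schema.xml. The helper `field_xml`, the field name `title`, and the type `text_general` are assumptions for illustration; `stored`, `omitNorms`, and `termVectors` are the attributes discussed in the text.

```python
import xml.etree.ElementTree as ET

def field_xml(name, field_type, stored=False, omit_norms=True, term_vectors=False):
    """Render a schema field definition with disk-saving defaults."""
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": "true",
        # Not stored: the record is fetched from the underlying
        # database by its unique key instead.
        "stored": str(stored).lower(),
        # Norms are only needed when the field takes part in scoring.
        "omitNorms": str(omit_norms).lower(),
        # Term vectors are only needed for relevance-related features.
        "termVectors": str(term_vectors).lower(),
    }
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")

# Hypothetical field name and type.
title_field = field_xml("title", "text_general")
```

With these defaults the index holds only what is needed to find a record, while the record's content stays in HBase or MongoDB.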
Preferably, optimizing the transaction log includes:
using the transaction log to support near-real-time gets and atomic updates; decoupling write durability from the commit process; supporting replica synchronization with the shard leader in SolrCloud (a Solr cluster); and balancing the length of the transaction log against the frequency of hard commits.
Specifically, the transaction log ensures that uncommitted updates are not lost. It serves three main purposes: 1. supporting near-real-time (NRT) gets and atomic updates; 2. decoupling write durability from the commit process; 3. supporting replica synchronization with the shard leader in SolrCloud. Tuning the transaction log means balancing its length (the number of uncommitted updates) against the frequency of hard commits. If the transaction log is too large, a restart takes longer because more updates have to be replayed.
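The balance between log length and hard commit frequency can be estimated with back-of-the-envelope arithmetic. The model below is a deliberate simplification and an assumption, not a measurement of Solr: it treats the update rate as constant and assumes the log is replayed at a fixed rate on restart. The function names and the sample numbers are hypothetical.

```python
def uncommitted_updates(updates_per_second, hard_commit_interval_s):
    """Worst-case number of updates held in the log between hard commits."""
    return updates_per_second * hard_commit_interval_s

def replay_seconds(updates_per_second, hard_commit_interval_s, replay_rate_per_second):
    """Rough time to replay the log after a restart, under a fixed replay rate."""
    pending = uncommitted_updates(updates_per_second, hard_commit_interval_s)
    return pending / replay_rate_per_second

# Example: 500 updates/s, a hard commit every 15 s, replay at 2500 updates/s.
pending = uncommitted_updates(500, 15)
restart_cost = replay_seconds(500, 15, 2500)
```

Shortening the hard commit interval reduces the replay time after a restart but makes commits more frequent, so the interval is chosen against the acceptable restart time.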
Step 103: when querying data, obtain the corresponding data from the underlying storage according to the index created by Solr.
Specifically, Solr does not provide document-level security, and depending on the database selected, transaction control has to be added as the actual situation requires.
For HBase, transaction frameworks such as Haeinsa and Tephra can add transactions on top of HBase, while for MongoDB the most popular way to add transaction control at present is to simulate transactions with a message queue. If transaction control is also added to Solr index creation, then creating, updating, and deleting Solr index entries and the corresponding database data can be treated as a whole when adding transactions. When throughput is not especially high, RabbitMQ can be used to simulate transactions; when throughput is very high, Kafka is recommended, because RabbitMQ is not comparable to Kafka in throughput or in transactions/requests per second (TPS). However, Kafka was originally designed for log processing and can be regarded as a log system with a narrow focus, so it lacks some characteristics that a mature message queue (MQ) should have. RabbitMQ is more mature than Kafka, and is stronger than Kafka in availability, stability, and reliability.
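The idea of treating the storage write and the index write as one unit, and then querying through the index, can be sketched as follows. This is only a conceptual model that uses in-memory dictionaries in place of HBase/MongoDB and Solr and a simple compensating rollback; it does not use Haeinsa, Tephra, or a message queue. The names `save`, `search`, and the `indexer` hook are hypothetical.

```python
def save(rowkey, record, store, index, indexer=None):
    """Write to storage and to the index as one unit; undo both on failure."""
    store[rowkey] = record
    try:
        terms = {str(value).lower() for value in record.values()}
        if indexer is not None:
            indexer(rowkey, terms)  # hypothetical hook that may raise
        index[rowkey] = terms
    except Exception:
        # Compensating action: remove the record so that the storage
        # and the index never disagree.
        store.pop(rowkey, None)
        index.pop(rowkey, None)
        raise

def search(term, store, index):
    """Find matching rowkeys in the index, then read records from storage."""
    hits = sorted(key for key, terms in index.items() if term.lower() in terms)
    return [store[key] for key in hits]

store, index = {}, {}
save("r1", {"name": "Alice"}, store, index)
results = search("alice", store, index)
```

The index only answers the question of which rowkeys match; the records themselves always come from the underlying storage, which is what keeps the data independent of Solr.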
The method based on Solr data search provided by this embodiment of the present invention selects HBase or MongoDB as the underlying storage database and generates a rowkey from the fields of each record; optimizes the JVM with respect to memory, and optimizes Solr's memory configuration, disk usage, and transaction log; and, when querying data, obtains the corresponding data from the underlying storage according to the index created by Solr. As a result, the design of the underlying storage and the data structure makes the data depend only weakly on Solr and keeps the data independent; Solr optimization and schema design improve search performance, while JVM tuning improves the query efficiency of the Solr cluster and reduces the likelihood of socket timeouts caused by Full GC in Solr; and transaction control over index creation and over storing data in the database keeps the searched data and the queried data synchronized.
Referring to Fig. 2, Fig. 2 is a functional block diagram of a device based on Solr data search provided by an embodiment of the present invention.
As shown in Fig. 2, the device includes:
a first acquisition module 201, configured to select HBase or MongoDB as the underlying storage database and to generate a rowkey from the fields of each record;
an optimization module 202, configured to optimize the JVM with respect to memory, and to optimize Solr's memory configuration, disk usage, and transaction log;
a second acquisition module 203, configured to obtain, when querying data, the corresponding data from the underlying storage according to the index created by Solr.
Preferably, the optimization module 202 is specifically configured to:
allocate the memory that Solr needs, plus a buffer of a preset size.
Preferably, the optimization module 202 is further specifically configured to:
configure the cache sizes and the eviction policy of Solr;
the caches include an autowarming cache, a filter cache, a document cache, a query result cache, and/or a field value cache;
the eviction policy is chosen as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound file format for the index; when creating the index, select NIOFSDirectory to use NIO and use direct memory directly; and select a suitable segment merge policy to avoid generating too many segments.
Preferably, the optimization module 202 is further specifically configured to:
where relevance is not needed, restrict the use of term vectors (Term Vector);
when designing the schema, select a suitable document granularity and set stored fields selectively;
if Solr's unique key is used to locate a record in the database, set the stored attribute of all fields to false;
for fields that do not take part in scoring, set omitNorms to true;
for date and numeric types, reduce the precision step precisionStep.
Preferably, the optimization module 202 is further specifically configured to:
use the transaction log to support near-real-time gets and atomic updates; decouple write durability from the commit process; support replica synchronization with the shard leader in SolrCloud; and balance the length of the transaction log against the frequency of hard commits.
The device based on Solr data search provided by this embodiment of the present invention selects HBase or MongoDB as the underlying storage database and generates a rowkey from the fields of each record; optimizes the JVM with respect to memory, and optimizes Solr's memory configuration, disk usage, and transaction log; and, when querying data, obtains the corresponding data from the underlying storage according to the index created by Solr. As a result, the design of the underlying storage and the data structure makes the data depend only weakly on Solr and keeps the data independent; Solr optimization and schema design improve search performance, while JVM tuning improves the query efficiency of the Solr cluster and reduces the likelihood of socket timeouts caused by Full GC in Solr; and transaction control over index creation and over storing data in the database keeps the searched data and the queried data synchronized.
The technical principles of the embodiments of the present invention have been described above with reference to specific embodiments. These descriptions serve only to explain the principles of the embodiments and must not be construed in any way as limiting their scope of protection. Based on the explanation herein, those skilled in the art can arrive at other specific implementations of the embodiments without inventive effort, and these implementations fall within the scope of protection of the embodiments of the present invention.

Claims (10)

1. A method based on Solr data search, characterized in that the method comprises:
selecting HBase or MongoDB as the underlying storage database, and generating a rowkey from the fields of each record;
optimizing the JVM with respect to memory, and optimizing Solr's memory configuration, disk usage, and transaction log;
when querying data, obtaining the corresponding data from the underlying storage according to the index created by Solr.
2. The method according to claim 1, characterized in that optimizing the JVM with respect to memory comprises:
allocating the memory that Solr needs, plus a buffer of a preset size.
3. The method according to claim 1, characterized in that optimizing Solr's memory configuration comprises:
configuring the cache sizes and the eviction policy of Solr;
the caches include an autowarming cache, a filter cache, a document cache, a query result cache, and/or a field value cache;
the eviction policy is chosen as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound file format for the index; when creating the index, select NIOFSDirectory to use NIO and use direct memory directly; and select a suitable segment merge policy to avoid generating too many segments.
4. The method according to claim 1, characterized in that optimizing Solr's disk usage comprises:
where relevance is not needed, restricting the use of term vectors (Term Vector);
when designing the schema, selecting a suitable document granularity and setting stored fields selectively;
if Solr's unique key is used to locate a record in the database, setting the stored attribute of all fields to false;
for fields that do not take part in scoring, setting omitNorms to true;
for date and numeric types, reducing the precision step precisionStep.
5. The method according to claim 1, characterized in that optimizing the transaction log comprises:
using the transaction log to support near-real-time gets and atomic updates; decoupling write durability from the commit process; supporting replica synchronization with the shard leader in SolrCloud; and balancing the length of the transaction log against the frequency of hard commits.
6. a kind of device based on Solr data search, it is characterised in that described device includes:
First acquisition module, for the bottom data storage storehouse from HBase or MongoDB, and according to the field life of record Into rowkey;
Optimization module, for being optimized to JVM in the optimization of internal memory, and memory configurations to Solr, disk take and thing Business daily record is optimized;
Second acquisition module, for when data are inquired about, the index after being created according to Solr to obtain right from bottom storage The data answered.
7. The device according to claim 6, characterized in that the optimization module is specifically configured to:
Allocate to Solr the memory it needs and then add a cache of a preset size.
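Read literally, this claim sizes the memory given to Solr as the amount it needs plus a fixed cache allowance. A minimal sketch of that arithmetic follows. It assumes the allowance is added to the JVM heap and that the initial and maximum heap are set to the same value, neither of which is stated in the source; the function name and the figures are hypothetical.

```python
def solr_heap_flags(required_mb, cache_mb):
    """Return JVM heap flags for the required memory plus a preset cache allowance.

    Setting -Xms equal to -Xmx is an assumption made here to avoid heap
    resizing at runtime; it is not taken from the patent.
    """
    if required_mb <= 0 or cache_mb < 0:
        raise ValueError("required_mb must be positive and cache_mb non-negative")
    heap_mb = required_mb + cache_mb
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    # Hypothetical sizing: 4 GiB needed by Solr plus a 1 GiB cache allowance.
    print(" ".join(solr_heap_flags(4096, 1024)))
```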
8. The device according to claim 6, characterized in that the optimization module is further specifically configured to:
Configure the cache sizes of Solr and the selection of the take-back (eviction) strategy;
The caches include an autowarming cache, a filter cache, a document cache, a query result cache and/or a field value cache;
The take-back (eviction) strategy is selected as follows: use FieldCache; reduce mergeFactor so that fewer segments are kept in the index; disable the compound index file format; when creating the index, choose NIOFSDirectory, which uses NIO; use direct memory directly; and choose a suitable segment merge policy to avoid generating segments.
9. The device according to claim 6, characterized in that the optimization module is further specifically configured to:
Limit the use of Term Vector in cases where it is not relevant;
When designing the schema, select a suitable document granularity and selectively set stored fields;
If records in the database are located through the unique key of Solr, set the stored attribute to false for all fields;
For fields that do not need scoring, set omitNorms to true;
For date and numeric types, reduce the precision step precisionStep.
10. The device according to claim 6, characterized in that the optimization module is further specifically configured such that:
The transaction log is used to support near-real-time retrieval of data and atomic updates; to decouple write durability from the commit process; and to support replica synchronization with the SolrCloud shard leader; the length of the transaction log is balanced against the frequency of hard commits.
CN201611199422.1A 2016-12-22 2016-12-22 Method and device based on Solr data search Pending CN106682148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611199422.1A CN106682148A (en) 2016-12-22 2016-12-22 Method and device based on Solr data search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611199422.1A CN106682148A (en) 2016-12-22 2016-12-22 Method and device based on Solr data search

Publications (1)

Publication Number Publication Date
CN106682148A true CN106682148A (en) 2017-05-17

Family

ID=58870241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611199422.1A Pending CN106682148A (en) 2016-12-22 2016-12-22 Method and device based on Solr data search

Country Status (1)

Country Link
CN (1) CN106682148A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104503985A (en) * 2014-12-03 2015-04-08 浪潮电子信息产业股份有限公司 Method for automatically creating Solr index file by Hbase data
CN105138592A (en) * 2015-07-31 2015-12-09 武汉虹信技术服务有限责任公司 Distributed framework-based log data storing and retrieving method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MYQ52618004: ""solr hbase 大数据"", 《CSDN—HTTPS://BLOG.CSDN.NET/MYQ526180048/ARTICLE/DETAILS/84413801》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020063A (en) * 2017-07-18 2019-07-16 北京京东尚科信息技术有限公司 Method for vertical search and system
CN107818126A (en) * 2017-09-01 2018-03-20 广州慧睿思通信息科技有限公司 A kind of full text information retrieval method towards Mongo databases
CN107844579A (en) * 2017-11-10 2018-03-27 顺丰科技有限公司 Optimize method, system and the equipment for the access of distributed data base middleware
CN107844579B (en) * 2017-11-10 2021-10-26 顺丰科技有限公司 Method, system and equipment for optimizing distributed database middleware access
CN108717374A (en) * 2018-04-24 2018-10-30 阿里巴巴集团控股有限公司 The method, apparatus and computer equipment that Java Virtual Machine preheats when starting
CN108717374B (en) * 2018-04-24 2021-08-17 创新先进技术有限公司 Method and device for preheating during starting of Java virtual machine and computer equipment
CN110008272B (en) * 2019-04-10 2020-01-31 张绿儿 NoSQL database evaluation system for sensor data and construction method thereof
CN110008272A (en) * 2019-04-10 2019-07-12 张绿儿 The NoSQL database evaluating system and its construction method of facing sensing device data
CN110232106A (en) * 2019-04-26 2019-09-13 安徽四创电子股份有限公司 A kind of mass data storage and method for quickly retrieving based on MongoDB and Solr
CN112235332A (en) * 2019-07-15 2021-01-15 北京京东尚科信息技术有限公司 Read-write switching method and device for cluster
CN111400779A (en) * 2020-01-07 2020-07-10 李蕴光 High-dimensional data encryption method and system
CN112069211A (en) * 2020-08-21 2020-12-11 苏州浪潮智能科技有限公司 Cache preheating optimization method and device based on Solr
CN112069211B (en) * 2020-08-21 2022-11-22 苏州浪潮智能科技有限公司 Cache preheating optimization method and device based on Solr
CN112231531A (en) * 2020-09-15 2021-01-15 山东浪潮通软信息科技有限公司 Data display method, equipment and medium based on openstb

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170517