CN106227788A - Database query method based on Lucene - Google Patents
Database query method based on Lucene
- Publication number
- CN106227788A
- Authority
- CN
- China
- Prior art keywords
- index
- lucene
- database
- search
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2425—Iterative querying; Query formulation based on the results of a preceding query
Abstract
The invention discloses a Lucene-based database query method. Following the hierarchical structure of the Lucene index, the method first creates indexes for the resources in a database and adds them to an index repository; it then queries the index repository with the search conditions, obtains the search results, and returns them. The method consists mainly of an index creation part and an index search part. The invention can significantly improve the speed and recall of database queries and gives the user a friendlier search experience.
Description
Technical field
The present invention relates to database query and analysis technology, and in particular to a Lucene-based database query method.
Background
Internet technology is advancing rapidly, the world's information resources keep growing, and every industry is moving toward informatization. The amount of important data that must be stored on computers therefore grows day by day, and the databases behind websites grow ever larger; e-commerce sites in particular hold massive amounts of data. The conventional approach of querying only the site's database has run into many difficulties: response times are slow, the returned content matches the user's actual intent poorly, the results are incomplete, and they cannot be ranked according to the user's search intent, so user satisfaction with search cannot be improved.
Information retrieval means searching through an ordered collection of information resources for the content that matches the user's intent. Among information retrieval approaches, full-text search is very useful and is the most general-purpose. Full-text search compares the search conditions supplied by the user against all the words in the documents. Unlike the field comparison of a database query, its advantage is that the search scope is wide and thorough, so it can return more complete results to the user. Moreover, full-text search compares the user's search terms against the index terms in the index repository; compared with the sequential scan of a database query, this can be several orders of magnitude more efficient.
Lucene is an open-source full-text search toolkit. It is not a complete full-text search engine but a framework for building one: it provides a complete query engine and index engine, and part of a text analysis engine. As an open-source library for full-text indexing and search, Lucene gives software developers an easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. The Lucene index format is self-contained and does not depend on the platform on which it is used. The basic unit of representation in Lucene is the 8-bit byte, so compatible systems can share the same index resources.
Summary of the invention
In view of the needs and shortcomings of the current state of the art, the present invention provides a Lucene-based database query method.
The technical solution adopted by the Lucene-based database query method of the present invention to solve the above technical problem is as follows: following the hierarchical structure of the Lucene index, the method first creates indexes for the resources in a database and adds them to an index repository; it then queries the index repository with the search conditions, obtains the search results, and returns them. Its main steps comprise an index creation part and an index search part.
Preferably, the index creation part periodically collects resources from the database, performs appropriate analysis and processing on them, then creates indexes for those resources and adds them to the index repository.
Preferably, the index creation part specifically comprises the following steps:
1) first, obtain the information resources, i.e. fetch the records stored in the database tables at fixed intervals, as the data source for creating the index files;
2) next, filter the information resources: for each record, select the fields to be stored so that useless content is avoided while information integrity is preserved;
3) then, analyze the filtered content and perform word segmentation;
4) finally, create the index: load the record content into the repository and create an index on the segmented words; the index can be stored on disk or in memory. The index files are then placed in the index repository.
Preferably, the index search part mainly comprises: obtaining a query statement from the search conditions supplied by the user, analyzing and processing the query statement, querying the index repository, and then returning the final search results to the user.
Preferably, the search conditions supplied by the user are first obtained; next, the grammatical structure of the resulting conditional statement is analyzed, the corresponding keywords are extracted, and a syntax tree is built according to certain rules; the index is then searched for the database records that satisfy the syntax.
Compared with the prior art, the Lucene-based database query method of the present invention has the following beneficial effects. When the condition supplied by the user contains several words, the invention splits the condition into those words and compares them against the index repository via Terms; it can also match database content in which the words appear in a different order, so the related records are found and presented to the user, which significantly improves the recall of database queries;
When a search term is queried repeatedly, the invention loads the result of the first query into the computer's cache, so that the next query for the same term finds the result directly in the cache without searching the index repository again, which significantly improves query efficiency;
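The caching behaviour described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: `CachedSearcher`, `search_index`, and `TOY_INDEX` are hypothetical names, and the cache is a plain dictionary keyed by the search term.

```python
from typing import Callable, Dict, List


class CachedSearcher:
    """Wraps an index lookup and remembers the result of each term's first query."""

    def __init__(self, search_index: Callable[[str], List[int]]) -> None:
        self._search_index = search_index        # hypothetical index lookup
        self._cache: Dict[str, List[int]] = {}   # term -> cached record ids
        self.index_lookups = 0                   # counts real index accesses

    def search(self, term: str) -> List[int]:
        if term in self._cache:
            # Repeated query: answer from the cache, skip the index repository.
            return self._cache[term]
        self.index_lookups += 1
        result = self._search_index(term)
        self._cache[term] = result
        return result


# Toy index used only for demonstration.
TOY_INDEX = {"lucene": [1, 3], "database": [2, 3]}

searcher = CachedSearcher(lambda term: TOY_INDEX.get(term, []))
first = searcher.search("lucene")
second = searcher.search("lucene")  # served from the cache
```

The source does not say how cached results are refreshed when the index is rebuilt, so invalidation is deliberately left out of this sketch.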
The ranking algorithm can weight results by the term-frequency and position attributes stored in the index posting lists, so the search results can be sorted by how well they match the query words. This yields records closer to the user's intent, improves search quality, gives the user a friendlier search experience, and increases users' reliance on the system.
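As a rough illustration of ranking by term frequency, the sketch below scores each record by the summed frequency of the query words it contains. The scoring rule and the names `rank_by_term_frequency` and `postings` are assumptions made for illustration; this is not Lucene's scoring formula, and position weighting is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_by_term_frequency(
    postings: Dict[str, Dict[int, int]], query_terms: List[str]
) -> List[Tuple[int, int]]:
    """Return (record_id, score) pairs, best match first.

    postings maps term -> {record_id: frequency of the term in that record}.
    The score is the summed frequency of the query terms (illustrative only).
    """
    scores: Counter = Counter()
    for term in query_terms:
        for record_id, frequency in postings.get(term, {}).items():
            scores[record_id] += frequency
    # Sort by descending score, then ascending id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


postings = {
    "lucene": {1: 3, 2: 1},
    "index": {1: 1, 3: 2},
}
ranked = rank_by_term_frequency(postings, ["lucene", "index"])
```

Here record 1 contains both query words and has the highest summed frequency, so it is listed first.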
Description of the drawings
Figure 1 is a flowchart of the Lucene-based database query method;
Figure 2 is a schematic diagram of the hierarchical structure of the Lucene index.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the Lucene-based database query method of the invention is described further below with reference to a specific embodiment.
Following the hierarchical structure of the full-text search tool Lucene, the invention proposes a Lucene-based database query method and constructs a Lucene-based database query extension method. Experimental analysis shows that this query extension method can significantly improve the speed and recall of database queries and gives the user a friendlier search experience.
Embodiment:
The Lucene-based database query method of this embodiment follows the hierarchical structure of the full-text search tool Lucene: it first creates indexes for the resources in the database and adds them to the index repository; it then queries the index repository with the search conditions, obtains the search results, and returns them. Its main steps comprise an index creation part and an index search part.
Lucene's interface for analyzing and processing documents is independent of the text and of the document format; to create an index, it is enough to have the indexing tool process a data stream. The index creation part mainly comprises periodically collecting resources from the database, performing appropriate analysis and processing on them, then creating indexes for those resources and adding them to the index repository.
Lucene ships with a ready-made set of search tools, which users can adapt to their own search needs. Users can define specific search rules as required, such as fuzzy search and range search. The index search part mainly comprises: obtaining a query statement from the search conditions supplied by the user, analyzing and processing the query statement, querying the index repository, and then returning the final search results to the user.
Figure 1 is a flowchart of the Lucene-based database query method. As shown in Figure 1, the method proceeds as follows: first, resource information is collected periodically from the database and is analyzed and processed appropriately; the index creation module then creates indexes for those resources and adds them to the index repository (index database). Next, the query conditions supplied by the user are obtained and are analyzed and processed by the query-condition analysis module; the index search module then queries the index repository (index database), presents the query results, and returns the final search results to the user.
Figure 2 is a schematic diagram of the hierarchical structure of the Lucene index. As shown in Figure 2, the Lucene index is divided into the following levels: first the Index layer, then the Segment layer, next the Document layer, and at the bottom the Field layer. An Index consists of several Segments, a Segment contains many Documents, each Document contains many Fields, and finally a Field is made up of individual Terms.
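The containment relationship between these levels can be sketched with a few Python dataclasses. This models only the logical nesting (Index, Segment, Document, Field, Term) described above; the class layout and the `all_terms` helper are illustrative assumptions and say nothing about Lucene's actual on-disk format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str
    terms: List[str] = field(default_factory=list)  # a Field is made up of Terms


@dataclass
class Document:
    fields: List[Field] = field(default_factory=list)


@dataclass
class Segment:
    documents: List[Document] = field(default_factory=list)


@dataclass
class Index:
    segments: List[Segment] = field(default_factory=list)

    def all_terms(self) -> List[str]:
        """Walk Index -> Segment -> Document -> Field -> Term."""
        return [
            term
            for segment in self.segments
            for document in segment.documents
            for fld in document.fields
            for term in fld.terms
        ]


index = Index(segments=[
    Segment(documents=[
        Document(fields=[
            Field("title", ["lucene", "query"]),
            Field("body", ["database"]),
        ]),
    ]),
    Segment(documents=[Document(fields=[Field("title", ["index"])])]),
])
terms = index.all_terms()
```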
Because the Lucene index is organized according to a definite structure, a search can locate matches directly in the index rather than scanning the underlying resources sequentially; this greatly narrows the search scope and greatly improves search efficiency. Lucene's data source is not restricted to one fixed format or to one level of files: the data from which the user creates an index can take various forms, such as XML documents, strings, TXT documents, or data held in a database.
The specific implementation of the Lucene-based database query method of this embodiment is described below in order to explain the technical content and technical effects of the method in more detail.
First, the index of the data is created, which specifically comprises the following steps:
1) first, obtain the information resources, i.e. fetch the records stored in the database tables at fixed intervals, as the data source for creating the index files;
2) next, filter the information resources, i.e. decide which fields of each record to store so that useless content is avoided while information integrity is preserved. For example, for a university student, fields such as "student number", "major", "courses taken", and "examination results" are generally worth storing, whereas information such as occasional in-class quizzes and mock examinations generally need not be stored;
3) then, analyze the filtered content and perform word segmentation; currently the most commonly used segmentation tool is the mmseg4j tokenizer based on the Sogou lexicon;
4) finally, create the index: load the record content into the repository and create an index on the segmented words; the index can be stored on disk or in memory. The index files are then placed in the index repository.
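The four steps above can be sketched end to end in Python using only the standard library. Everything here is an illustrative assumption: the `students` table and its columns, the `STORED_FIELDS` selection, and the naive regular-expression tokenizer, which merely stands in for a real segmenter such as mmseg4j. Periodic scheduling of step 1 is omitted, and the index is kept in memory.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Dict, List, Set

# Step 2 configuration: fields judged worth storing (hypothetical choice).
STORED_FIELDS = ["student_no", "major", "course"]


def fetch_records(conn: sqlite3.Connection) -> List[sqlite3.Row]:
    """Step 1: read the table rows that serve as the index data source."""
    conn.row_factory = sqlite3.Row
    return conn.execute("SELECT * FROM students ORDER BY rowid").fetchall()


def filter_fields(row: sqlite3.Row) -> Dict[str, str]:
    """Step 2: keep only the selected fields, dropping e.g. quiz scores."""
    return {name: str(row[name]) for name in STORED_FIELDS}


def tokenize(text: str) -> List[str]:
    """Step 3: naive segmentation on word characters (not a real segmenter)."""
    return re.findall(r"\w+", text.lower())


def build_index(records: List[Dict[str, str]]) -> Dict[str, Set[int]]:
    """Step 4: map each token to the positions of the records containing it."""
    inverted: Dict[str, Set[int]] = defaultdict(set)
    for record_id, record in enumerate(records):
        for value in record.values():
            for token in tokenize(value):
                inverted[token].add(record_id)
    return dict(inverted)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students "
    "(student_no TEXT, major TEXT, course TEXT, quiz_score INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("S001", "Computer Science", "Databases", 7),
        ("S002", "Mathematics", "Databases", 9),
    ],
)
records = [filter_fields(row) for row in fetch_records(conn)]
index = build_index(records)
```

Because `quiz_score` is not among the stored fields, its values never reach the index, which mirrors the filtering example in step 2.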
After the index is created, it is searched; this is the basis of the query. The process is as follows: first, obtain the search conditions supplied by the user; next, analyze the grammatical structure of the resulting conditional statement, extract the corresponding keywords, and build a syntax tree according to certain rules; then search the index for the database records that satisfy the syntax.
The syntax tree for "key1 and key2 not key3" is used as an example to illustrate the search steps:
(1) first, look up in the database's inverted posting lists the records containing key1, key2, and key3 respectively; then intersect the posting lists of the records containing key1 and key2 to obtain the list of records containing both key1 and key2;
(2) then perform a difference operation between the resulting list and the records containing key3, removing the entries that contain key3; this yields the records that contain key1 and key2 but not key3, which is the final qualifying list.
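The two steps of this example map directly onto set intersection and set difference. The Python sketch below is an illustration under simplifying assumptions: `parse_query` handles only flat queries of the form "a and b not c" rather than building a full syntax tree, and the posting lists are hypothetical sets of record ids.

```python
from typing import Dict, List, Set, Tuple


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a flat 'a and b not c' query into required and excluded keywords."""
    required: List[str] = []
    excluded: List[str] = []
    target = required
    for token in query.lower().split():
        if token == "and":
            target = required
        elif token == "not":
            target = excluded
        else:
            target.append(token)
            target = required  # an operator applies to the next keyword only
    return required, excluded


def evaluate(postings: Dict[str, Set[int]], query: str) -> Set[int]:
    """Evaluate the query against posting lists of record ids."""
    required, excluded = parse_query(query)
    if not required:
        return set()
    # (1) Intersect the posting lists of all required keywords.
    result = set(postings.get(required[0], set()))
    for keyword in required[1:]:
        result &= postings.get(keyword, set())
    # (2) Remove the records that contain any excluded keyword.
    for keyword in excluded:
        result -= postings.get(keyword, set())
    return result


postings = {"key1": {1, 2, 3, 4}, "key2": {2, 3, 4, 5}, "key3": {3, 6}}
matches = evaluate(postings, "key1 and key2 not key3")
```

With these toy posting lists, records 2, 3, and 4 contain both key1 and key2, and record 3 is then dropped because it also contains key3.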
The Lucene-based database query method of this embodiment builds on the inverted-index structure of conventional query tools. Lucene adds the ability to create index files in separate blocks: a small index can be created for new resources in order to increase search efficiency, and by merging it with the existing indexes the index resources can be improved and optimized. When a search term is queried repeatedly, the invention loads the result of the first query into the computer's cache, so that the next query for the same term finds the result directly in the cache without searching the index repository again; the search speed of the invention can therefore improve several-fold. When the condition supplied by the user contains several words, the invention splits the condition into those words and compares them against the index repository via Terms; it can also match database content in which the words appear in a different order, so the related records are found and presented to the user, giving the database query extension method a very high recall.
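The idea of building a small index for new resources and merging it with the existing one can be sketched as a union of posting sets. This is a simplified Python illustration, not Lucene's merge procedure: `merge_segments` is a hypothetical helper, and record ids are assumed to be unique across the indexes being merged.

```python
from typing import Dict, List, Set

Postings = Dict[str, Set[int]]


def merge_segments(segments: List[Postings]) -> Postings:
    """Merge per-segment inverted indexes into one by unioning posting sets."""
    merged: Postings = {}
    for segment in segments:
        for term, record_ids in segment.items():
            merged.setdefault(term, set()).update(record_ids)
    return merged


existing = {"lucene": {1, 2}, "index": {2}}
new_small = {"lucene": {3}, "cache": {3}}  # small index built for new records only
merged = merge_segments([existing, new_small])
```

Only the new records need to be indexed before the merge; the existing index is reused as it is and is not modified by the merge.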
The above embodiment is only a specific example of the present invention. The scope of patent protection of the invention includes but is not limited to the above embodiment; any appropriate change or substitution that conforms to the claims of the invention and is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the invention.
Claims (5)
1. A Lucene-based database query method, characterized in that, following the hierarchical structure of the Lucene index, indexes are first created for the resources in a database and added to an index repository; the index repository is then queried with the search conditions, and the search results are obtained and returned; the main steps comprise an index creation part and an index search part.
2. A Lucene-based database query method, characterized in that the index creation part periodically collects resources from the database, performs appropriate analysis and processing on them, then creates indexes for those resources and adds them to the index repository.
3. A Lucene-based database query method, characterized in that the index creation part specifically comprises the following steps:
1) first, obtain the information resources, i.e. fetch the records stored in the database tables at fixed intervals, as the data source for creating the index files;
2) next, filter the information resources: for each record, select the fields to be stored so that useless content is avoided while information integrity is preserved;
3) then, analyze the filtered content and perform word segmentation;
4) finally, create the index: load the record content into the repository and create an index on the segmented words; the index can be stored on disk or in memory. The index files are then placed in the index repository.
4. A Lucene-based database query method, characterized in that the index search part mainly comprises: obtaining a query statement from the search conditions supplied by the user, analyzing and processing the query statement, querying the index repository, and then returning the final search results to the user.
5. A Lucene-based database query method, characterized in that the search conditions supplied by the user are first obtained; next, the grammatical structure of the resulting conditional statement is analyzed, the corresponding keywords are extracted, and a syntax tree is built according to certain rules; the index is then searched for the database records that satisfy the syntax.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610571519.4A CN106227788A (en) | 2016-07-20 | 2016-07-20 | Database query method based on Lucene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106227788A true CN106227788A (en) | 2016-12-14 |
Family
ID=57531849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610571519.4A Pending CN106227788A (en) | 2016-07-20 | 2016-07-20 | Database query method based on Lucene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106227788A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101131704A (en) * | 2006-08-23 | 2008-02-27 | 国际商业机器公司 | Device and method for positional representation of content |
CN101030217A (en) * | 2007-03-22 | 2007-09-05 | 华中科技大学 | Method for indexing and acquiring semantic net information |
US20090112830A1 (en) * | 2007-10-25 | 2009-04-30 | Fuji Xerox Co., Ltd. | System and methods for searching images in presentations |
CN101373532A (en) * | 2008-07-10 | 2009-02-25 | 昆明理工大学 | FAQ Chinese request-answering system implementing method in tourism field |
CN105550236A (en) * | 2015-11-27 | 2016-05-04 | 广州华多网络科技有限公司 | Distributed data deduplication processing method and apparatus |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874429B (en) * | 2017-01-23 | 2020-08-11 | 南威软件股份有限公司 | Method for converting standard SQL into full text retrieval standard query |
CN106874429A (en) * | 2017-01-23 | 2017-06-20 | 南威软件股份有限公司 | A kind of method that stsndard SQL is converted into full-text search standard queries |
CN109002444A (en) * | 2017-06-07 | 2018-12-14 | 北大方正集团有限公司 | Text searching method and full-text search device |
CN107818126A (en) * | 2017-09-01 | 2018-03-20 | 广州慧睿思通信息科技有限公司 | A kind of full text information retrieval method towards Mongo databases |
CN107656985A (en) * | 2017-09-11 | 2018-02-02 | 北京京东尚科信息技术有限公司 | Web page interrogation method and its system |
CN107729518A (en) * | 2017-10-26 | 2018-02-23 | 山东浪潮云服务信息科技有限公司 | The text searching method and device of a kind of relevant database |
CN109241098A (en) * | 2018-08-08 | 2019-01-18 | 南京中新赛克科技有限责任公司 | A kind of enquiring and optimizing method of distributed data base |
CN109241098B (en) * | 2018-08-08 | 2022-02-18 | 南京中新赛克科技有限责任公司 | Query optimization method for distributed database |
CN111143349A (en) * | 2019-11-26 | 2020-05-12 | 广东三扬网络科技有限公司 | Method for quickly searching information from set, electronic equipment and storage medium |
CN111488379A (en) * | 2020-04-17 | 2020-08-04 | 焦点科技股份有限公司 | Method for optimizing Hbase large data query |
CN111488379B (en) * | 2020-04-17 | 2022-07-19 | 焦点科技股份有限公司 | Method for optimizing Hbase large data query |
CN112115361A (en) * | 2020-09-17 | 2020-12-22 | 浪潮卓数大数据产业发展有限公司 | Data retrieval optimization method and system based on elastic search |
CN112115361B (en) * | 2020-09-17 | 2022-07-05 | 浪潮卓数大数据产业发展有限公司 | Data retrieval optimization method and system based on elastic search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161214 |