CN106227788A - Database query method based on Lucene - Google Patents

Publication number
CN106227788A
Authority
CN
China
Prior art keywords
index
lucene
data base
search
inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610571519.4A
Other languages
Chinese (zh)
Inventor
胡光阳
杨培强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd
Priority to CN201610571519.4A
Publication of CN106227788A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query

Abstract

The invention discloses a database query method based on Lucene. With reference to the hierarchical structure of the Lucene index, the method first creates an index for the resources in a database and adds the index to an index resource library; it then queries the index resource library with the search condition, obtains the search results, and returns them. The method mainly comprises an index-creation part and an index-search part. The invention can significantly improve the speed and recall of database queries and gives the querier a friendlier retrieval experience.

Description

A database query method based on Lucene
Technical field
The present invention relates to database query and analysis technology, and specifically to a database query method based on Lucene.
Background
Internet technology is changing rapidly, the world's information resources keep growing, and every industry is moving toward informatization. The volume of important data that must be stored on computers therefore grows by the day, and the databases behind websites grow ever larger; e-commerce sites in particular hold massive amounts of data. Conventional site-oriented database query methods have run into many difficulties: response times are slow, users wait a long time for results, the returned content matches the querier's actual intent poorly, the results are not sufficiently complete, and they cannot be ranked according to the querier's search intent, so user satisfaction cannot be improved.
Information search means looking through an ordered collection of information resources to find the content that matches the querier's intent. Among search methods, full-text search is very useful and is the most general-purpose. Full-text search compares the search condition supplied by the querier against every word in the documents. Unlike field-based database queries, its advantage is that the search scope is wide and thorough, so it can return more complete results to the querier. Moreover, full-text search compares the querier's search terms against the index terms in the index library; compared with the sequential-scan comparison of a database query, it can be several orders of magnitude more efficient.
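The contrast drawn above between a sequential scan and an index lookup can be illustrated with a minimal sketch. It is a toy in plain Python, not Lucene itself; the sample documents and the names DOCS, sequential_scan, build_inverted_index, and index_lookup are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents (not taken from the patent).
DOCS = {
    1: "lucene full text search",
    2: "database query by field",
    3: "full text index for database",
}


def sequential_scan(term):
    """Compare the term with every word of every document on each query."""
    return sorted(doc_id for doc_id, text in DOCS.items() if term in text.split())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it (built once)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


INDEX = build_inverted_index(DOCS)


def index_lookup(term):
    """Answer the same question with a single dictionary lookup."""
    return sorted(INDEX.get(term, set()))
```

Both functions return the same document ids; the difference is that the scan touches every document on every query, while the index pays that cost once at build time.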
Lucene is an open-source full-text search engine toolkit. It is not a complete full-text search engine but rather a framework for building one, providing a complete query engine and index engine and part of a text analysis engine. Lucene is an open-source library for full-text indexing and search that gives software developers an easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. The Lucene index format is self-contained and independent of the platform on which it is used. The basic unit of representation in a Lucene index is the 8-bit byte, so compatible systems can share the same index resources.
Summary of the invention
In view of the needs and shortcomings of current technology, the present invention provides a database query method based on Lucene.
The technical solution adopted by the Lucene-based database query method of the present invention is as follows. With reference to the hierarchical structure of the Lucene index, the method first creates an index for the resources in the database and adds the index to an index resource library; it then queries the index resource library with the search condition, obtains the search results, and returns them. Its main steps comprise an index-creation part and an index-search part.
Preferably, the index-creation part periodically collects resources from the database, analyzes and processes them appropriately, then creates an index for those resources and adds it to the index resource library.
Preferably, the index-creation part specifically comprises the following steps:
1) First, acquire the information resources: at fixed intervals, fetch the records stored in the database tables as the data source for creating the index file;
2) Then filter the information resources: within each record, select the fields to be stored so that useless content is avoided while information integrity is preserved;
3) Next, analyze the filtered content and perform word segmentation;
4) Then create the index: load the record content into the resource library and build the index over the segmented words; the index can be stored on disk or in memory. Finally, put the index file into the index resource library.
Preferably, the index-search part obtains query statements from the search condition supplied by the querier, analyzes and processes those statements, queries the index resource library, and then returns the final search results to the querier.
Preferably, the search condition supplied by the querier is obtained first; the syntactic structure of the resulting conditional statement is then analyzed, the corresponding keywords are extracted, and a syntax tree is built according to certain rules; the index is then searched to find the database records that satisfy the syntax.
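The step of turning a search condition into a syntax tree can be sketched as follows. The patent does not specify the grammar or the "certain rules", so the operator set (and, or, not), the left-to-right grouping, and the names OPERATORS, parse_query, and keywords are all assumptions of this sketch rather than the inventors' parser or Lucene's query parser.

```python
OPERATORS = {"and", "or", "not"}  # assumed operator set


def parse_query(query):
    """Build a left-nested syntax tree from 'a and b not c' style conditions.

    Leaves are keyword strings; inner nodes are (OPERATOR, left, right) tuples.
    """
    tokens = query.split()
    # A well-formed condition alternates keyword, operator, keyword, ...
    if len(tokens) % 2 == 0:
        raise ValueError("expected keywords separated by single operators")
    tree = tokens[0]
    if tree.lower() in OPERATORS:
        raise ValueError("query must start with a keyword")
    for position in range(1, len(tokens), 2):
        operator = tokens[position].lower()
        operand = tokens[position + 1]
        if operator not in OPERATORS or operand.lower() in OPERATORS:
            raise ValueError(f"malformed query near {tokens[position]!r}")
        tree = (operator.upper(), tree, operand)
    return tree


def keywords(tree):
    """Extract the keywords (leaves) of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return keywords(left) + keywords(right)
```

Under these assumptions, the condition "key1 and key2 not key3" becomes a NOT node whose left child is the AND of key1 and key2 and whose right child is key3.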
Compared with the prior art, the Lucene-based database query method of the present invention has the following beneficial effects. When the condition supplied by the querier contains several words, the invention can segment it into those words and then compare them against the index library term by term; it can also match database content in which the words appear in a different order, retrieve the related records, and present them to the querier, which can significantly improve the recall of database queries;
When a search term is queried repeatedly, the invention loads the result of the first query into the computer's cache, so that the next query for the same term finds the corresponding information directly in the cache without searching the index resource library again, which can significantly improve query efficiency;
The ranking algorithm can be weighted by term frequency and position according to the term-frequency attributes in the index posting lists, so that search results can be sorted by how well they match the query words, yielding records closer to the querier's intent. This improves search quality, gives the querier a friendlier retrieval experience, and strengthens users' reliance on the system.
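The ranking idea in the last effect can be sketched in a few lines. The patent does not give the weighting formula, so this sketch scores each record only by how often the query words occur in it and ignores position; the function name rank_by_term_frequency and the sample records in the usage note are assumptions.

```python
from collections import Counter


def rank_by_term_frequency(docs, query_terms):
    """Return ids of matching documents, highest summed term frequency first.

    docs maps a document id to its text; ties are broken by document id.
    """
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents without any query word are not returned
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, given the hypothetical records {1: "lucene index lucene search", 2: "lucene database", 3: "sql database"} and the query words ["lucene", "search"], record 1 scores 3, record 2 scores 1, and record 3 is left out.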
Description of the drawings
Figure 1 is a flow chart of the Lucene-based database query method;
Figure 2 is a schematic diagram of the hierarchical structure of the Lucene index.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the Lucene-based database query method of the present invention is described in further detail below with reference to a specific embodiment.
With reference to the hierarchical structure of the full-text search tool Lucene, the present invention proposes a database query method based on Lucene and builds a Lucene-based database query extension method. According to experimental analysis, this query extension method can significantly improve the speed and recall of database queries and gives the querier a friendlier retrieval experience.
Embodiment:
In the Lucene-based database query method of this embodiment, with reference to the hierarchical structure of the full-text search tool Lucene, an index is first created for the resources in the database and added to the index resource library; the index resource library is then queried with the search condition, and the search results are obtained and returned. The main steps comprise an index-creation part and an index-search part.
Lucene's interface for analyzing and processing documents is independent of the text or document format; to create an index, it is only necessary to have the indexing tool process a data stream. The index-creation part periodically collects resources from the database, analyzes and processes them appropriately, then creates an index for those resources and adds it to the index resource library.
Lucene itself provides a complete set of search tools, which the querier can use to meet their own search needs. The querier can define specific search rules as needed, such as fuzzy search and range search. The index-search part obtains query statements from the search condition supplied by the querier, analyzes and processes those statements, queries the index resource library, and then returns the final search results to the querier.
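The two search rules named above, fuzzy search and range search, can be illustrated with a rough stand-in. This sketch uses Python's standard difflib module rather than Lucene's own query classes; the vocabulary, the cutoff value, and the function names fuzzy_terms and range_search are assumptions.

```python
import difflib

# Hypothetical index vocabulary (not taken from the patent).
VOCABULARY = ["database", "lucene", "index", "query", "search"]


def fuzzy_terms(term, vocabulary, cutoff=0.8):
    """Return indexed terms that are close to a possibly misspelled term."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)


def range_search(records, field_name, low, high):
    """Return the records whose field value lies in the closed range [low, high]."""
    return [record for record in records if low <= record[field_name] <= high]
```

A fuzzy search would first expand the querier's term into the close indexed terms and then look each of them up; a range search filters on an ordered field such as a score or a date.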
Figure 1 is a flow chart of the Lucene-based database query method. As shown in Figure 1, the detailed process of the method is as follows: first, resource information is periodically collected from the database and analyzed and processed appropriately; the index-creation module then creates an index for those resources and adds it to the index resource library (index library). Next, the query condition supplied by the user is obtained and is analyzed and processed by the query-condition analysis module; the index-search module then queries the index resource library (index library), the query results are presented, and the final search results are returned to the user.
Figure 2 is a schematic diagram of the hierarchical structure of the Lucene index. As shown in Figure 2, the Lucene index is divided into the following levels: first the Index layer, then the Segment layer, then the Document layer, and at the bottom the Field layer. An Index consists of several Segments, a Segment contains many Documents, each Document contains many Fields, and a Field in turn is made up of individual Terms.
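The containment hierarchy described for Figure 2 can be mirrored with a few toy data classes. These are illustrative Python classes that only reuse the layer names from the text; they are not Lucene's Java classes, and the all_terms helper is an assumption added to show how the layers nest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Bottom layer: a named field made up of individual terms."""
    name: str
    terms: List[str]


@dataclass
class Document:
    """A document holds many fields."""
    fields: List[Field] = field(default_factory=list)


@dataclass
class Segment:
    """A segment holds many documents."""
    documents: List[Document] = field(default_factory=list)


@dataclass
class Index:
    """Top layer: an index is made up of several segments."""
    segments: List[Segment] = field(default_factory=list)

    def all_terms(self):
        """Walk Index -> Segment -> Document -> Field -> Term in order."""
        return [
            term
            for segment in self.segments
            for document in segment.documents
            for fld in document.fields
            for term in fld.terms
        ]
```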
Because the Lucene index is organized according to a definite structure, a search can locate matches in the index directly without sequentially scanning the underlying resources; this greatly narrows the search range and greatly improves retrieval efficiency. Lucene's data source is not restricted to a fixed format or file type: the data source from which the querier creates an index can take various forms, such as XML documents, strings, TXT documents, or data resources in a database.
The specific implementation of the Lucene-based database query method of this embodiment is described below, to explain its technical content and effects in more detail.
First, create the index of the data, which specifically comprises the following steps:
1) First, acquire the information resources: at fixed intervals, fetch the records stored in the database tables as the data source for creating the index file;
2) Then filter the information resources: within each record, select the fields to be stored so that useless content is avoided while information integrity is preserved. For example, for a university student, the fields worth storing are typically "student number", "major", "courses taken", and "exam results", whereas information such as occasional in-class quizzes and mock exams generally need not be stored;
3) Next, analyze the filtered content and perform word segmentation; the most commonly used segmentation tool at present is mmseg4j, a segmenter based on the Sogou lexicon;
4) Then create the index: load the record content into the resource library and build the index over the segmented words; the index can be stored on disk or in memory. Finally, put the index file into the index resource library.
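The four steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions: an in-memory SQLite table stands in for the database, a regular-expression tokenizer stands in for a real segmenter such as mmseg4j, a Python dictionary stands in for the index resource library, and the periodic scheduling of step 1 is omitted. The table name, column names, and function names are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Step 2 assumption: only these columns are worth storing; quiz_score is dropped.
STORED_FIELDS = ("student_id", "major", "course", "exam_score")


def fetch_records(conn):
    """Step 1: read the table records that serve as the index data source."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute("SELECT * FROM students")]


def filter_fields(record):
    """Step 2: keep only the selected fields of a record."""
    return {name: record[name] for name in STORED_FIELDS}


def tokenize(value):
    """Step 3: split a field value into lowercase word tokens."""
    return re.findall(r"\w+", str(value).lower())


def build_index(records):
    """Step 4: map each token to the ids of the records that contain it."""
    index = defaultdict(set)
    for record in records:
        kept = filter_fields(record)
        for value in kept.values():
            for token in tokenize(value):
                index[token].add(kept["student_id"])
    return dict(index)


def demo():
    """Create a small hypothetical table and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (student_id TEXT, major TEXT, course TEXT, "
        "exam_score INTEGER, quiz_score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
        [
            ("s001", "Computer Science", "Databases", 88, 7),
            ("s002", "Mathematics", "Computer Algebra", 91, 9),
        ],
    )
    return build_index(fetch_records(conn))
```

In this sketch the index lives in memory; writing it to disk, as step 4 also allows, is left out.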
After the index is created, the index is searched; this is the basis of the query. The process is as follows: the search condition supplied by the querier is obtained first; the syntactic structure of the resulting conditional statement is then analyzed, the corresponding keywords are extracted, and a syntax tree is built according to certain rules; the index is then searched to find the database records that satisfy the syntax.
The syntax tree "key1 and key2 not key3" is used as an example to illustrate the search steps:
(1) First, look up in the database's inverted posting lists the records containing key1, key2, and key3 respectively; then intersect the posting lists of the records containing key1 and key2 to obtain the list of records containing both key1 and key2;
(2) Then take the difference between that list and the records containing key3, removing the entries that contain key3, which yields the records that contain key1 and key2 but not key3; this is the final list of matching records.
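The two steps for "key1 and key2 not key3" amount to an intersection followed by a difference over posting lists, which can be sketched with Python sets. The posting lists below are made-up record ids, and the function name search_and_not is an assumption.

```python
# Hypothetical inverted posting lists: keyword -> ids of records containing it.
POSTINGS = {
    "key1": {1, 2, 3, 5},
    "key2": {2, 3, 4, 5},
    "key3": {3, 6},
}


def search_and_not(postings, required, excluded):
    """Intersect the lists of all required keywords, then subtract excluded ones."""
    result = None
    for key in required:
        ids = postings.get(key, set())
        # Step (1): merge the lists so that only common records remain.
        result = set(ids) if result is None else result & ids
    result = result or set()
    for key in excluded:
        # Step (2): difference operation removes records containing the keyword.
        result -= postings.get(key, set())
    return sorted(result)
```

With these sample lists, records 2, 3, and 5 contain both key1 and key2, and removing record 3 (which contains key3) leaves records 2 and 5.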
In the Lucene-based database query method of this embodiment, on top of the inverted structure of conventional query tools, Lucene adds the ability to create index files in separate blocks: it can build a small index for newly added resources to increase search efficiency, and by merging that index with the existing ones it can optimize the index resources. When a search term is queried repeatedly, the invention loads the result of the first query into the computer's cache, so that the next query for the same term finds the corresponding information directly in the cache without searching the index resource library again; search speed can therefore improve several-fold. When the condition supplied by the querier contains several words, the invention can segment it into those words and compare them against the index library term by term; it can also match database content in which the words appear in a different order, retrieve the related records, and present them to the user, so the database query extension method has a high recall.
The above specific embodiment is merely a specific example of the present invention. The scope of patent protection of the present invention includes but is not limited to the above embodiment; any suitable change or substitution that is consistent with the claims of the present invention and made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.
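The caching of repeated queries can be sketched as a thin wrapper around an index lookup. The patent does not describe the cache structure or its invalidation, so the dictionary cache, the class name CachedSearcher, and the index_lookups counter are assumptions; a real system would also need to clear the cache when the index is updated.

```python
class CachedSearcher:
    """Serve repeated queries for the same term from an in-memory cache."""

    def __init__(self, index):
        self._index = index        # term -> set of record ids
        self._cache = {}           # term -> cached result list
        self.index_lookups = 0     # how many times the index was consulted

    def search(self, term):
        if term in self._cache:
            # Repeated query: answer from the cache, skip the index.
            return self._cache[term]
        self.index_lookups += 1
        result = sorted(self._index.get(term, ()))
        self._cache[term] = result
        return result
```

Searching twice for the same term consults the index only once; the second call returns the cached list.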

Claims (5)

1. A database query method based on Lucene, characterized in that, with reference to the hierarchical structure of the Lucene index, an index is first created for the resources in the database and added to an index resource library; the index resource library is then queried with the search condition, and the search results are obtained and returned; the main steps comprise an index-creation part and an index-search part.
2. The database query method based on Lucene, characterized in that the index-creation part periodically collects resources from the database, analyzes and processes them appropriately, then creates an index for those resources and adds it to the index resource library.
3. The database query method based on Lucene, characterized in that the index-creation part specifically comprises the following steps:
1) First, acquire the information resources: at fixed intervals, fetch the records stored in the database tables as the data source for creating the index file;
2) Then filter the information resources: within each record, select the fields to be stored so that useless content is avoided while information integrity is preserved;
3) Next, analyze the filtered content and perform word segmentation;
4) Then create the index: load the record content into the resource library and build the index over the segmented words; the index can be stored on disk or in memory. Finally, put the index file into the index resource library.
4. The database query method based on Lucene, characterized in that the index-search part obtains query statements from the search condition supplied by the querier, analyzes and processes those statements, queries the index resource library, and then returns the final search results to the querier.
5. The database query method based on Lucene, characterized in that the search condition supplied by the querier is obtained first; the syntactic structure of the resulting conditional statement is then analyzed, the corresponding keywords are extracted, and a syntax tree is built according to certain rules; the index is then searched to find the database records that satisfy the syntax.
CN201610571519.4A 2016-07-20 2016-07-20 Database query method based on Lucene Pending CN106227788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610571519.4A CN106227788A (en) 2016-07-20 2016-07-20 Database query method based on Lucene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610571519.4A CN106227788A (en) 2016-07-20 2016-07-20 Database query method based on Lucene

Publications (1)

Publication Number Publication Date
CN106227788A true CN106227788A (en) 2016-12-14

Family

ID=57531849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610571519.4A Pending CN106227788A (en) 2016-07-20 2016-07-20 Database query method based on Lucene

Country Status (1)

Country Link
CN (1) CN106227788A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874429A (en) * 2017-01-23 2017-06-20 南威软件股份有限公司 Method for converting standard SQL into full-text retrieval standard queries
CN107656985A (en) * 2017-09-11 2018-02-02 北京京东尚科信息技术有限公司 Web page interrogation method and its system
CN107729518A (en) * 2017-10-26 2018-02-23 山东浪潮云服务信息科技有限公司 The text searching method and device of a kind of relevant database
CN107818126A (en) * 2017-09-01 2018-03-20 广州慧睿思通信息科技有限公司 A kind of full text information retrieval method towards Mongo databases
CN109002444A (en) * 2017-06-07 2018-12-14 北大方正集团有限公司 Text searching method and full-text search device
CN109241098A (en) * 2018-08-08 2019-01-18 南京中新赛克科技有限责任公司 A kind of enquiring and optimizing method of distributed data base
CN111143349A (en) * 2019-11-26 2020-05-12 广东三扬网络科技有限公司 Method for quickly searching information from set, electronic equipment and storage medium
CN111488379A (en) * 2020-04-17 2020-08-04 焦点科技股份有限公司 Method for optimizing Hbase large data query
CN112115361A (en) * 2020-09-17 2020-12-22 浪潮卓数大数据产业发展有限公司 Data retrieval optimization method and system based on elastic search

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030217A (en) * 2007-03-22 2007-09-05 华中科技大学 Method for indexing and acquiring semantic net information
CN101131704A (en) * 2006-08-23 2008-02-27 国际商业机器公司 Device and method for positional representation of content
CN101373532A (en) * 2008-07-10 2009-02-25 昆明理工大学 FAQ Chinese request-answering system implementing method in tourism field
US20090112830A1 (en) * 2007-10-25 2009-04-30 Fuji Xerox Co., Ltd. System and methods for searching images in presentations
CN105550236A (en) * 2015-11-27 2016-05-04 广州华多网络科技有限公司 Distributed data deduplication processing method and apparatus

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874429B (en) * 2017-01-23 2020-08-11 南威软件股份有限公司 Method for converting standard SQL into full text retrieval standard query
CN106874429A (en) * 2017-01-23 2017-06-20 南威软件股份有限公司 Method for converting standard SQL into full-text retrieval standard queries
CN109002444A (en) * 2017-06-07 2018-12-14 北大方正集团有限公司 Text searching method and full-text search device
CN107818126A (en) * 2017-09-01 2018-03-20 广州慧睿思通信息科技有限公司 A kind of full text information retrieval method towards Mongo databases
CN107656985A (en) * 2017-09-11 2018-02-02 北京京东尚科信息技术有限公司 Web page interrogation method and its system
CN107729518A (en) * 2017-10-26 2018-02-23 山东浪潮云服务信息科技有限公司 The text searching method and device of a kind of relevant database
CN109241098A (en) * 2018-08-08 2019-01-18 南京中新赛克科技有限责任公司 A kind of enquiring and optimizing method of distributed data base
CN109241098B (en) * 2018-08-08 2022-02-18 南京中新赛克科技有限责任公司 Query optimization method for distributed database
CN111143349A (en) * 2019-11-26 2020-05-12 广东三扬网络科技有限公司 Method for quickly searching information from set, electronic equipment and storage medium
CN111488379A (en) * 2020-04-17 2020-08-04 焦点科技股份有限公司 Method for optimizing Hbase large data query
CN111488379B (en) * 2020-04-17 2022-07-19 焦点科技股份有限公司 Method for optimizing Hbase large data query
CN112115361A (en) * 2020-09-17 2020-12-22 浪潮卓数大数据产业发展有限公司 Data retrieval optimization method and system based on elastic search
CN112115361B (en) * 2020-09-17 2022-07-05 浪潮卓数大数据产业发展有限公司 Data retrieval optimization method and system based on elastic search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161214
