CN104699724A - Lucene-based data searching method and device - Google Patents
- Publication number
- CN104699724A (application CN201310671382.6A)
- Authority
- CN
- China
- Prior art keywords
- index
- structural data
- inquiry request
- textual form
- form structural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a Lucene-based data search method and device. The method comprises the following steps: performing word segmentation on text-form structured data with Lucene, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; creating an index for the segmented text-form structured data; receiving a query request; obtaining the index corresponding to the query request; and obtaining, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Description
Technical field
This application relates to the field of data search technology, and in particular to a Lucene-based data search method and device.
Background
Structured data is data that can be described by a unified data model. It can be expressed and stored as data records; the attributes of the recorded information, such as type and format, are fixed, with strict length and type constraints, and such data is generally stored in a relational database.
Data search is the process or technique of selecting, organizing, and evaluating data stored on some carrier, and of finding, within a given data collection, the precise data that answers a user's question. For structured data, Structured Query Language (SQL) is the standard query technology.
However, an SQL query over structured data must specify the database to be queried and the relevant tables within it. When the query scope is not restricted, or is too large to specify conveniently, SQL cannot carry out the query.
Summary of the invention
The technical problem to be solved by this application is to provide a Lucene-based data search method and device that can search text-form structured data when the query scope is not restricted or is too large.
To solve the above problem, this application discloses a Lucene-based data search method, comprising: using Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each American Standard Code for Information Interchange (ASCII) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; creating an index for the segmented text-form structured data; receiving a query request; obtaining the index corresponding to the query request; and, according to the correspondence between indexes and text-form structured data, obtaining the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, after the query request is received, the method further comprises: processing the request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations, and hot words.
Optionally, obtaining the index corresponding to the query request comprises: obtaining the index corresponding to the query request according to the search suggestions, result recommendations, and hot words.
Preferably, after the index is created for the segmented text-form structured data, the method further comprises: saving the index to disk, with a database or table of the structured data as the unit. Obtaining the index corresponding to the query request then comprises: searching the disk for the index corresponding to the query request, and obtaining that index.
Preferably, the text-form structured data comprises: bank client system data.
In another aspect, a Lucene-based data search device is provided, comprising: a word segmentation module, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; a creation module, configured to create an index for the segmented text-form structured data; a receiving module, configured to receive a query request; a first acquisition module, configured to obtain the index corresponding to the query request; and a second acquisition module, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, the device further comprises: a processing module, configured to process the request with a Lucene result-recommendation plug-in after the receiving module receives the query request, to obtain search suggestions, result recommendations, and hot words.
Preferably, the first acquisition module is configured to obtain the index corresponding to the query request according to the search suggestions, result recommendations, and hot words.
Optionally, the device further comprises: a saving module, configured to save the index to disk, with a database or table of the structured data as the unit, after the index is created for the segmented text-form structured data. The first acquisition module is configured to search the disk for the index corresponding to the query request and to obtain that index.
Preferably, the text-form structured data comprises: bank client system data.
Compared with the prior art, this application has the following advantages:
Lucene is used to perform word segmentation on text-form structured data, and an index is then created for the structured data. As a result, text-form structured data can be searched through the index even when the query scope is not restricted or is too large.
Brief description of the drawings
Fig. 1 is a flowchart of a Lucene-based data search method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a Lucene-based data search method according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a Lucene-based data search device according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of another Lucene-based data search device according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of yet another Lucene-based data search device according to an embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a Lucene-based data search method of this application is shown, comprising:
Step 102: use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each American Standard Code for Information Interchange (ASCII) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit. Optionally, the text-form structured data comprises: bank client system data.
Lucene is an open-source full-text search engine toolkit that provides a query engine and an indexing engine.
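The segmentation rule of step 102 can be illustrated with a minimal sketch. It is written in plain Python as a stand-in for a Lucene analyzer and does not use any Lucene API; the function name `segment`, the treatment of whitespace as a separator, and the restriction of "Chinese character" to the CJK Unified Ideographs range U+4E00 to U+9FFF are assumptions made for illustration.

```python
def segment(text: str) -> list[str]:
    """Split text into single-character tokens.

    Assumptions of this sketch (not stated in the source):
    - whitespace only separates tokens and is never emitted;
    - "Chinese character" means U+4E00..U+9FFF;
    - any other character is silently dropped.
    """
    tokens = []
    for ch in text:
        if ch.isspace():
            continue
        if ch.isascii():
            tokens.append(ch)          # one token per ASCII character
        elif "\u4e00" <= ch <= "\u9fff":
            tokens.append(ch)          # one token per Chinese character
    return tokens


print(segment("张三 A1b"))
```

Because every character is its own unit, no dictionary is needed, at the cost of a larger index and of matches that have to be narrowed down at query time.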
Step 104: create an index for the segmented text-form structured data.
In a preferred implementation of this embodiment, the index can be saved to disk with a database or table of the structured data as the unit.
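As a rough illustration of step 104, the sketch below builds an in-memory inverted index from single-character tokens to record identifiers. It is a simplified stand-in for what Lucene's indexing engine does, not Lucene code; `build_index`, the record ids, and the sample rows are hypothetical.

```python
from collections import defaultdict


def segment(text):
    # Simplification (assumption): every non-whitespace character is one token,
    # which covers both the per-ASCII-character and per-Chinese-character rules.
    return [ch for ch in text if not ch.isspace()]


def build_index(rows):
    """rows maps a record id to the text form of one structured record."""
    index = defaultdict(set)
    for record_id, text in rows.items():
        for token in segment(text):
            index[token].add(record_id)   # posting: token -> records containing it
    return index


rows = {1: "张三 deposit 500", 2: "李四 deposit 80"}
index = build_index(rows)
print(sorted(index["5"]))   # records containing the character "5"
print(sorted(index["d"]))   # records containing the character "d"
```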
Step 106: receive a query request.
Preferably, the query request can be received from the user through a user interface.
Optionally, after step 106, the request is processed with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations, and hot words. In this embodiment, an existing Lucene result-recommendation plug-in can be used.
Step 108: obtain the index corresponding to the query request.
If the request was processed with the Lucene result-recommendation plug-in after step 106, step 108 can comprise: obtaining the index corresponding to the query request according to the search suggestions, result recommendations, and hot words. For example, the result can be obtained directly from a result recommendation, that is, the index corresponding to the query request is obtained. In the prior art, SQL queries over structured data cannot provide functions such as result recommendation and hot words. This embodiment introduces the Lucene result-recommendation plug-in to provide search suggestions, result recommendations, and hot words, which improves retrieval efficiency.
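The source does not say how the result-recommendation plug-in computes its output. Purely as an illustration of what "hot words" and "search suggestions" could mean, the sketch below derives both from a log of past queries; the class `QueryLog` and its methods are hypothetical and are not part of Lucene or of the plug-in.

```python
from collections import Counter


class QueryLog:
    """Hypothetical stand-in: derives hot words and suggestions from past queries."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        self.counts[query] += 1

    def hot_words(self, n=3):
        # The n most frequent past queries.
        return [q for q, _ in self.counts.most_common(n)]

    def suggest(self, prefix, n=3):
        # Past queries starting with the prefix, most frequent first.
        matches = [(c, q) for q, c in self.counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [q for _, q in matches[:n]]


log = QueryLog()
for q in ["deposit", "deposit", "deposit", "张三", "张三", "张三丰"]:
    log.record(q)
print(log.hot_words(2))
print(log.suggest("张"))
```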
If the index was saved to disk in step 104, step 108 can comprise: searching the disk for the index corresponding to the query request, and obtaining that index.
Step 110: according to the correspondence between indexes and text-form structured data, obtain the text-form structured data corresponding to the index that corresponds to the query request.
For example, the text-form structured data corresponding to the index that corresponds to the query request is obtained from a mapping table between indexes and text-form structured data. It will be understood that the correspondence between indexes and text-form structured data can take other forms, which are not limited here.
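Steps 108 and 110 can be sketched as a lookup against an inverted index followed by a lookup in a mapping table. The code below is a plain-Python illustration, not Lucene code: the dictionary `rows` plays the role of the mapping table, the record ids are hypothetical, and the final substring check is an assumption added here because intersecting single-character postings alone would also match records that contain the same characters in a different order.

```python
def segment(text):
    # Assumption: one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def build_index(rows):
    index = {}
    for record_id, text in rows.items():
        for token in segment(text):
            index.setdefault(token, set()).add(record_id)
    return index


def search(query, index, rows):
    tokens = segment(query)
    if not tokens:
        return []
    # Step 108: index entries for every token of the query, intersected.
    candidates = set.intersection(*[index.get(t, set()) for t in tokens])
    # Step 110: map candidate ids back to records; keep exact substring matches.
    return sorted(rid for rid in candidates if query in rows[rid])


rows = {
    "acct:1": "张三 savings 500",
    "acct:2": "三张 savings 20",
    "loan:7": "张三 loan 9000",
}
index = build_index(rows)
print(search("张三", index, rows))
```

Note that the query names no table: the hits come from both the account rows and the loan row.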
With the above embodiment, Lucene is used to build an index of the structured data, so a data search either needs no specified query scope or can specify one conveniently.
An embodiment of the present invention further provides a data search method. As shown in Fig. 2, the method comprises:
Read the data content of the text-form structured data. The sources read include relational databases, structured data files, and other structured data.
Perform word segmentation on the data content that was read. The segmentation rules are: English words, digits, and other ASCII characters are segmented with each ASCII character as a segmentation unit, and Chinese text is segmented with each Chinese character as a segmentation unit.
Use Lucene to create an index for the data content and generate index files. For example, with a database or table of the structured data as the index storage unit, the structured-data index file of each database or table is saved at a corresponding disk storage location; at search time, the disk storage locations of the relevant index files serve as the search scope, so that the search runs within a defined scope. If no storage location has been set for the index files of a database or table, the search can run over the entire data collection.
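The idea of one index file per table, with the set of files acting as the search scope, can be sketched as follows. This is an illustration in plain Python using JSON files rather than Lucene index directories; the file layout, the function names, and the sample tables are assumptions. For brevity the sketch only intersects single-character postings and does not verify character order.

```python
import json
import tempfile
from pathlib import Path


def segment(text):
    # Assumption: one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def save_table_index(index_dir, table, rows):
    """Write one index file per table: token -> sorted list of row ids."""
    postings = {}
    for row_id, text in rows.items():
        for token in segment(text):
            postings.setdefault(token, set()).add(row_id)
    data = {tok: sorted(ids) for tok, ids in postings.items()}
    path = Path(index_dir) / f"{table}.json"
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return path


def search(index_dir, query, tables=None):
    """tables=None searches every index file, i.e. the whole data collection."""
    files = sorted(Path(index_dir).glob("*.json"))
    if tables is not None:
        files = [f for f in files if f.stem in tables]
    hits = {}
    for f in files:
        postings = json.loads(f.read_text(encoding="utf-8"))
        sets = [set(postings.get(t, [])) for t in segment(query)]
        ids = set.intersection(*sets) if sets else set()
        if ids:
            hits[f.stem] = sorted(ids)
    return hits


with tempfile.TemporaryDirectory() as d:
    save_table_index(d, "accounts", {1: "张三 savings", 2: "李四 savings"})
    save_table_index(d, "loans", {7: "张三 loan"})
    everywhere = search(d, "张三")
    loans_only = search(d, "张三", tables={"loans"})
print(everywhere)
print(loans_only)
```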
Structured data for which an index has been created can be searched with Lucene. For example, Lucene is used to receive a structured-data search request. Preferably, a Lucene result-recommendation plug-in can be used to provide search suggestions, result recommendations, and hot words for the structured data, and the results are recommended to the user.
For example, when all information about a user named Zhang San at a certain bank needs to be queried, such as account opening and deposits, the above method can be used to create an index of that bank's structured data, and all information about Zhang San can then be found through the index. No query scope needs to be specified, which improves retrieval efficiency.
Referring to Fig. 3, a Lucene-based data search device of this application is shown. The device is used to implement the method embodiments above, so the features of those method embodiments can all be applied in the device. The device comprises:
A word segmentation module 302, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit;
A creation module 304, configured to create an index for the segmented text-form structured data;
A receiving module 306, configured to receive a query request;
A first acquisition module 308, configured to obtain the index corresponding to the query request;
A second acquisition module 310, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, as shown in Fig. 4, the device further comprises:
A processing module 402, configured to process the request with a Lucene result-recommendation plug-in after the receiving module 306 receives the query request, to obtain search suggestions, result recommendations, and hot words.
Optionally, the first acquisition module 308 is configured to obtain the index corresponding to the query request according to the search suggestions, result recommendations, and hot words.
Optionally, as shown in Fig. 5, the device further comprises: a saving module 502, configured to save the index to disk, with a database or table of the structured data as the unit, after the index is created for the segmented text-form structured data. The first acquisition module 308 is configured to search the disk for the index corresponding to the query request and to obtain that index.
Preferably, the second acquisition module 310 is configured to obtain the text-form structured data corresponding to the index that corresponds to the query request from a mapping table between indexes and text-form structured data.
Preferably, the text-form structured data comprises: bank client system data.
With the above embodiment, Lucene is used to build an index of the structured data, so a data search either needs no specified query scope or can specify one conveniently.
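One way to picture how modules 302 to 310 fit together is a single class with one method per module. The sketch below is only an architectural illustration in plain Python; the class `SearchDevice`, its method names, and the in-memory storage are assumptions, and the optional processing module 402 and saving module 502 are left out.

```python
class SearchDevice:
    """Hypothetical composition of modules 302-310; not the patented implementation."""

    def __init__(self):
        self.index = {}     # token -> set of record ids
        self.records = {}   # mapping table: record id -> text-form record

    def segment(self, text):                 # word segmentation module 302
        # Assumption: one token per non-whitespace character.
        return [ch for ch in text if not ch.isspace()]

    def create_index(self, records):         # creation module 304
        self.records = dict(records)
        self.index = {}
        for rid, text in self.records.items():
            for token in self.segment(text):
                self.index.setdefault(token, set()).add(rid)

    def receive(self, request):              # receiving module 306
        return request.strip()

    def get_index_entries(self, query):      # first acquisition module 308
        sets = [self.index.get(t, set()) for t in self.segment(query)]
        return set.intersection(*sets) if sets else set()

    def get_records(self, ids):              # second acquisition module 310
        return {rid: self.records[rid] for rid in sorted(ids)}

    def search(self, request):
        query = self.receive(request)
        return self.get_records(self.get_index_entries(query))


device = SearchDevice()
device.create_index({1: "张三 savings 500", 2: "李四 savings 80"})
print(device.search(" 李四 "))
```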
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced between embodiments. Because the device embodiments are substantially similar to the method embodiments, they are described more briefly; for the relevant parts, see the description of the method embodiments.
The Lucene-based data search method and device provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the method of the application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the application.
Claims (10)
1. A Lucene-based data search method, characterized by comprising:
using Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each American Standard Code for Information Interchange (ASCII) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit;
creating an index for the segmented text-form structured data;
receiving a query request;
obtaining the index corresponding to the query request;
obtaining, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
2. The method of claim 1, characterized in that, after the query request is received, the method further comprises:
processing the request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations, and hot words.
3. The method of claim 1, characterized in that obtaining the index corresponding to the query request comprises:
obtaining the index corresponding to the query request according to the search suggestions, result recommendations, and hot words.
4. The method of claim 1, characterized in that,
after the index is created for the segmented text-form structured data, the method further comprises: saving the index to disk, with a database or table of the structured data as the unit;
obtaining the index corresponding to the query request comprises: searching the disk for the index corresponding to the query request; and obtaining the index corresponding to the query request.
5. The method of any one of claims 1 to 4, characterized in that the text-form structured data comprises: bank client system data.
6. A Lucene-based data search device, characterized by comprising:
a word segmentation module, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each American Standard Code for Information Interchange (ASCII) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit;
a creation module, configured to create an index for the segmented text-form structured data;
a receiving module, configured to receive a query request;
a first acquisition module, configured to obtain the index corresponding to the query request;
a second acquisition module, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
7. The device of claim 6, characterized in that the device further comprises:
a processing module, configured to process the request with a Lucene result-recommendation plug-in after the receiving module receives the query request, to obtain search suggestions, result recommendations, and hot words.
8. The device of claim 7, characterized in that the first acquisition module is configured to obtain the index corresponding to the query request according to the search suggestions, result recommendations, and hot words.
9. The device of claim 6, characterized in that,
the device further comprises: a saving module, configured to save the index to disk, with a database or table of the structured data as the unit, after the index is created for the segmented text-form structured data;
the first acquisition module is configured to search the disk for the index corresponding to the query request and to obtain the index corresponding to the query request.
10. The device of any one of claims 6 to 9, characterized in that the text-form structured data comprises: bank client system data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310671382.6A CN104699724A (en) | 2013-12-10 | 2013-12-10 | Lucene-based data searching method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310671382.6A CN104699724A (en) | 2013-12-10 | 2013-12-10 | Lucene-based data searching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104699724A (en) | 2015-06-10 |
Family
ID=53346856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310671382.6A Pending CN104699724A (en) | 2013-12-10 | 2013-12-10 | Lucene-based data searching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104699724A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101154241A (en) * | 2007-10-11 | 2008-04-02 | 北京金山软件有限公司 | Data searching method and data searching system |
CN101770499A (en) * | 2009-01-07 | 2010-07-07 | 上海聚力传媒技术有限公司 | Information retrieval method in search engine and corresponding search engine |
CN103064839A (en) * | 2011-10-19 | 2013-04-24 | 北京中文在线数字出版股份有限公司 | Portable document format (Pdf) full-text on-line retrieval method |
CN102929902A (en) * | 2012-07-05 | 2013-02-13 | 江苏新瑞峰信息科技有限公司 | Character splitting method and device based on Chinese retrieval |
CN102915365A (en) * | 2012-10-24 | 2013-02-06 | 苏州两江科技有限公司 | Hadoop-based construction method for distributed search engine |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105426539A (en) * | 2015-12-23 | 2016-03-23 | 成都电科心通捷信科技有限公司 | Dictionary-based lucene Chinese word segmentation method |
CN105426539B (en) * | 2015-12-23 | 2018-12-18 | 成都云数未来信息科学有限公司 | A kind of lucene Chinese word cutting method based on dictionary |
CN107451122A (en) * | 2017-08-09 | 2017-12-08 | 南京华飞数据技术有限公司 | A kind of dynamic n member segmenting methods based on Lucene |
CN107451122B (en) * | 2017-08-09 | 2020-11-13 | 南京华飞数据技术有限公司 | Dynamic n-element word segmentation method based on Lucene |
CN108388635A (en) * | 2018-02-24 | 2018-08-10 | 杭州朗和科技有限公司 | Data search method, device, medium and computing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150610 |