CN104699724A - Lucene-based data searching method and device - Google Patents


Info

Publication number
CN104699724A
Authority
CN
China
Prior art keywords
index
structural data
inquiry request
textual form
form structural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310671382.6A
Other languages
Chinese (zh)
Inventor
李励同
滕一勤
朱大勇
唐江华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ADVANCED DIGITAL TECHNOLOGY Co Ltd
Original Assignee
BEIJING ADVANCED DIGITAL TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-10
Filing date
2013-12-10
Publication date
2015-06-10
Application filed by BEIJING ADVANCED DIGITAL TECHNOLOGY Co Ltd filed Critical BEIJING ADVANCED DIGITAL TECHNOLOGY Co Ltd
Priority to CN201310671382.6A priority Critical patent/CN104699724A/en
Publication of CN104699724A publication Critical patent/CN104699724A/en
Pending legal-status Critical Current


Abstract

The invention provides a Lucene-based data search method and device. The method comprises: performing word segmentation on text-form structured data using Lucene, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; creating an index for the segmented text-form structured data; receiving a query request; acquiring the index corresponding to the query request; and acquiring, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.

Description

A Lucene-based data search method and device
Technical field
The present application relates to the technical field of data search, and in particular to a Lucene-based data search method and device.
Background
Structured data is data that can be described by a unified data model. It can be expressed and stored as data records whose attributes, such as type and format, are fixed and have strict lengths and types. Such data is generally stored in a relational database.
Data search is the process or technique of finding, within a data collection that has been selected, organized and evaluated and then stored on some medium, the exact data that answers a user's question. For structured data, Structured Query Language (SQL) is the standard query technique.
However, an SQL query on structured data must specify the database to be queried and the relevant tables in that database. When the query scope is not limited, or is too large to specify conveniently, the search cannot be carried out with SQL.
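The limitation described above can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table names, column names and sample rows are hypothetical and do not come from the patent. The point is only that every `SELECT` has to name a table and a column, so a search "everywhere" turns into a loop over the whole schema.

```python
import sqlite3

# Hypothetical in-memory database with two unrelated tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, account_no TEXT)")
conn.execute("CREATE TABLE deposits (name TEXT, amount TEXT)")
conn.execute("INSERT INTO accounts VALUES ('张三', '6001')")
conn.execute("INSERT INTO deposits VALUES ('张三', '500')")

# List every table, then every column, and query each one in turn.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]

hits = []
for table in tables:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    for column in columns:
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {column} = ?", ("张三",)).fetchall()
        hits.extend((table, column, row) for row in rows)

for hit in hits:
    print(hit)
```

With two tables this loop is manageable, but the number of statements grows with every table and column in the database, which is the inconvenience the application sets out to avoid.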
Summary of the invention
The technical problem to be solved by the present application is to provide a Lucene-based data search method and device that can search text-form structured data when the query scope is not limited or is too large.
To solve the above problem, the present application discloses a Lucene-based data search method, comprising: using Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; creating an index for the segmented text-form structured data; receiving a query request; obtaining the index corresponding to the query request; and obtaining, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, after the query request is received, the method further comprises: processing the request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations and hot words.
Optionally, obtaining the index corresponding to the query request comprises: obtaining the index corresponding to the query request according to the search suggestions, result recommendations and hot words.
Preferably, after the index is created for the segmented text-form structured data, the method further comprises: saving the index to disk in units of the databases or tables of the structured data; and obtaining the index corresponding to the query request comprises: searching the disk for the index corresponding to the query request, and obtaining that index.
Preferably, the text-form structured data comprises: bank customer system data.
In another aspect, a Lucene-based data search device is provided, comprising: a word segmentation module, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit; a creation module, configured to create an index for the segmented text-form structured data; a receiving module, configured to receive a query request; a first acquisition module, configured to obtain the index corresponding to the query request; and a second acquisition module, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, the device further comprises: a processing module, configured to, after the receiving module receives the query request, process the request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations and hot words.
Preferably, the first acquisition module is configured to obtain the index corresponding to the query request according to the search suggestions, result recommendations and hot words.
Optionally, the device further comprises: a saving module, configured to, after the index is created for the segmented text-form structured data, save the index to disk in units of the databases or tables of the structured data; the first acquisition module is configured to search the disk for the index corresponding to the query request and to obtain that index.
Preferably, the text-form structured data comprises: bank customer system data.
Compared with the prior art, the present application has the following advantages:
Lucene is used to perform word segmentation on text-form structured data, and an index is then created for the structured data. In this way, even when the query scope is not limited or is too large, text-form structured data can be searched through the index.
Brief description of the drawings
Fig. 1 is a flowchart of a Lucene-based data search method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a Lucene-based data search method according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a Lucene-based data search device according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of another Lucene-based data search device according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of yet another Lucene-based data search device according to an embodiment of the present invention.
Detailed description
To make the above objects, features and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a Lucene-based data search method of the present application is shown, comprising:
Step 102, use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit. Optionally, the text-form structured data comprises: bank customer system data.
Here, Lucene is an open-source full-text search engine toolkit that provides a query engine and an indexing engine.
Step 104, create an index for the segmented text-form structured data;
In a preferred implementation of this embodiment of the present invention, the index can be saved to disk in units of the databases or tables of the structured data;
Step 106, receive a query request;
Preferably, the query request from the user can be received through a user interface.
Optionally, after step 106, a Lucene result-recommendation plug-in is used to process the request to obtain search suggestions, result recommendations and hot words. In this embodiment, an existing Lucene result-recommendation plug-in can be used.
Step 108, obtain the index corresponding to the query request;
If a Lucene result-recommendation plug-in is used to process the request after step 106, step 108 can comprise: obtaining the index corresponding to the query request according to the search suggestions, result recommendations and hot words. For example, the result is obtained directly from a result recommendation, that is, the index corresponding to the query request is obtained. In the prior art, SQL queries on structured data cannot provide functions such as result recommendation and hot words. This embodiment introduces a Lucene result-recommendation plug-in to provide search suggestions, result recommendations and hot words, which improves retrieval efficiency.
If the index was saved to disk in step 104, step 108 can comprise: searching the disk for the index corresponding to the query request, and obtaining that index.
Step 110, according to the correspondence between indexes and text-form structured data, obtain the text-form structured data corresponding to the index that corresponds to the query request.
For example, the text-form structured data corresponding to the index that corresponds to the query request is obtained according to a mapping table between indexes and text-form structured data. It is understood that the correspondence between indexes and text-form structured data can take other forms, which are not limited here.
With the above embodiment, Lucene is used to build an index of the structured data, so that a data search either needs no specified query scope or can specify the scope easily.
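The flow of steps 102 to 110 can be sketched in plain Python. This is an illustration only and does not use Lucene's API: the inverted index is an ordinary dictionary, the "mapping table" is a dictionary from a record id to the original row, and all names (`segment`, `build`, `search`, the sample rows) are hypothetical.

```python
def segment(text):
    # One token per non-whitespace character; lowercasing is an added assumption.
    return [ch.lower() for ch in text if not ch.isspace()]


def build(rows):
    """Return (index, mapping) for a list of row dictionaries.

    index:   token -> set of record ids containing that token
    mapping: record id -> original row (the "mapping table")
    """
    index, mapping = {}, {}
    for doc_id, row in enumerate(rows):
        mapping[doc_id] = row
        for value in row.values():
            for token in segment(str(value)):
                index.setdefault(token, set()).add(doc_id)
    return index, mapping


def search(query, index, mapping):
    """Return the rows that contain every token of the query.

    This is a plain AND over single characters; it does not check that
    the characters are adjacent, which a real engine would normally do.
    """
    tokens = segment(query)
    if not tokens:
        return []
    ids = set(index.get(tokens[0], ()))
    for token in tokens[1:]:
        ids &= index.get(token, set())
    return [mapping[i] for i in sorted(ids)]


rows = [
    {"table": "accounts", "name": "张三", "account_no": "6001"},
    {"table": "deposits", "name": "张三", "amount": "500"},
    {"table": "accounts", "name": "李四", "account_no": "6002"},
]
index, mapping = build(rows)
results = search("张三", index, mapping)
print([r["table"] for r in results])  # ['accounts', 'deposits']
```

The query never names a table: the index lookup yields record ids, and the mapping table turns those ids back into the stored rows, whichever table they came from.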
An embodiment of the present invention further provides a data search method. As shown in Fig. 2, the method comprises:
Read the data content of the text-form structured data. The sources read include: relational databases, structured data files and other structured data.
Perform word segmentation on the data content that has been read. The segmentation rules are: English words, digits and other ASCII characters are segmented with each ASCII character as a segmentation unit, and Chinese text is segmented with each Chinese character as a segmentation unit.
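The segmentation rule above amounts to emitting one token per character. A minimal plain-Python sketch follows; it is not Lucene's analyzer API, and skipping whitespace and lowercasing ASCII letters are assumptions added here, not requirements stated in the text.

```python
def segment(text):
    """Split text into single-character tokens.

    Each ASCII character (letter, digit or other) and each Chinese
    character becomes its own token. Whitespace is skipped and letters
    are lowercased; both choices are assumptions of this sketch.
    """
    tokens = []
    for ch in text:
        if ch.isspace():
            continue
        tokens.append(ch.lower())
    return tokens


print(segment("张三 A12"))  # ['张', '三', 'a', '1', '2']
```

Because every character is a token, no dictionary is needed, and a query for any substring of a stored value can be answered from the index.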
Use Lucene to create an index for the data content and generate index files. For example, with a database or table of the structured data as the index storage unit, the structured-data index file of each database or table is saved at a corresponding storage location on disk. During a search, the disk storage location of a structured-data index file serves as the search scope, so that the search is carried out within a defined scope. If no storage location has been set for the index file of a database or table, the search can be carried out over the entire data collection.
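One way to picture the per-table index files is the following plain-Python sketch, which writes one JSON file per table and searches either a single file or all of them. The JSON layout, file naming and function names are assumptions made for illustration; Lucene's real on-disk index format is different.

```python
import json
import tempfile
from pathlib import Path


def segment(text):
    # One token per non-whitespace character (see the segmentation rule above).
    return [ch.lower() for ch in text if not ch.isspace()]


def save_table_index(index_dir, table, rows):
    """Build an inverted index for one table and save it as <table>.json."""
    index = {}
    for row_id, text in rows.items():
        for token in segment(text):
            index.setdefault(token, set()).add(row_id)
    path = Path(index_dir) / f"{table}.json"
    serializable = {tok: sorted(ids) for tok, ids in index.items()}
    path.write_text(json.dumps(serializable, ensure_ascii=False), encoding="utf-8")
    return path


def search_files(index_dir, query, table=None):
    """Search one table's index file, or every index file if table is None."""
    pattern = f"{table}.json" if table else "*.json"
    tokens = segment(query)
    hits = []
    for path in sorted(Path(index_dir).glob(pattern)):
        index = json.loads(path.read_text(encoding="utf-8"))
        ids = None
        for token in tokens:
            found = set(index.get(token, []))
            ids = found if ids is None else ids & found
        for row_id in sorted(ids or ()):
            hits.append((path.stem, row_id))
    return hits


with tempfile.TemporaryDirectory() as d:
    save_table_index(d, "accounts", {1: "张三 6001", 2: "李四 6002"})
    save_table_index(d, "deposits", {7: "张三 500"})
    all_hits = search_files(d, "张三")
    scoped_hits = search_files(d, "张三", table="deposits")

print(all_hits)     # [('accounts', 1), ('deposits', 7)]
print(scoped_hits)  # [('deposits', 7)]
```

Passing `table` narrows the search to one index file, which corresponds to using a storage location as the search scope; leaving it out searches the whole collection.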
Structured data for which an index has been created can be searched with Lucene. For example, Lucene is used to receive a structured-data search request. Preferably, a Lucene result-recommendation plug-in can be used to provide search suggestions, result recommendations and hot words for the structured data, and the results are recommended to the user.
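The text does not say how the result-recommendation plug-in computes its output. As a rough illustration only, search suggestions and hot words could be derived from a log of past queries, as in the sketch below. The `QueryLog` class and its methods are hypothetical and are not part of Lucene or of the plug-in mentioned above.

```python
from collections import Counter


class QueryLog:
    """Hypothetical stand-in that derives hot words and suggestions from past queries."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        # Count one more occurrence of this query string.
        self.counts[query] += 1

    def hot_words(self, n=3):
        # The n most frequently recorded queries.
        return [q for q, _ in self.counts.most_common(n)]

    def suggest(self, prefix, n=3):
        # Recorded queries starting with the prefix, most frequent first,
        # ties broken alphabetically.
        matches = [(q, c) for q, c in self.counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:n]]


log = QueryLog()
for q in ["张三", "张三", "张三丰", "李四"]:
    log.record(q)

print(log.hot_words(1))   # ['张三']
print(log.suggest("张"))  # ['张三', '张三丰']
```

A suggestion chosen by the user can then be run as an ordinary query against the index.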
For example, when all information about a user named Zhang San at a certain bank, such as account opening and deposits, needs to be queried, the above method can be used to create an index of that bank's structured data, and all information about Zhang San can be found by searching the index. No query scope needs to be specified, which improves retrieval efficiency.
Referring to Fig. 3, a Lucene-based data search device of the present application is shown. The device is used to implement the above method embodiments, so the features of the method embodiments all apply to the device. The device comprises:
A word segmentation module 302, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in the text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in the text-form structured data is segmented with each Chinese character as a segmentation unit;
A creation module 304, configured to create an index for the segmented text-form structured data;
A receiving module 306, configured to receive a query request;
A first acquisition module 308, configured to obtain the index corresponding to the query request;
A second acquisition module 310, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, as shown in Fig. 4, the device further comprises:
A processing module 402, configured to, after the receiving module 306 receives the query request, process the request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations and hot words.
Optionally, the first acquisition module 308 is configured to obtain the index corresponding to the query request according to the search suggestions, result recommendations and hot words.
Optionally, as shown in Fig. 5, the device further comprises: a saving module 502, configured to, after the index is created for the segmented text-form structured data, save the index to disk in units of the databases or tables of the structured data; the first acquisition module 308 is configured to search the disk for the index corresponding to the query request and to obtain that index.
Preferably, the second acquisition module 310 is configured to obtain, according to a mapping table between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to the query request.
Preferably, the text-form structured data comprises: bank customer system data.
With the above embodiment, Lucene is used to build an index of the structured data, so that a data search either needs no specified query scope or can specify the scope easily.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments can be referred to one another. Because the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The Lucene-based data search method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help in understanding the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A Lucene-based data search method, characterized by comprising:
Using Lucene to perform word segmentation on text-form structured data, wherein English words and digits in said text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in said text-form structured data is segmented with each Chinese character as a segmentation unit;
Creating an index for the segmented text-form structured data;
Receiving a query request;
Obtaining the index corresponding to said query request;
Obtaining, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to said query request.
2. The method of claim 1, characterized in that, after the query request is received, said method further comprises:
Processing said request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations and hot words.
3. The method of claim 1, characterized in that obtaining the index corresponding to said query request comprises:
Obtaining the index corresponding to said query request according to said search suggestions, result recommendations and hot words.
4. The method of claim 1, characterized in that,
After the index is created for the segmented text-form structured data, said method further comprises: saving said index to disk in units of the databases or tables of the structured data;
Obtaining the index corresponding to said query request comprises: searching said disk for the index corresponding to said query request; and obtaining the index corresponding to said query request.
5. The method according to any one of claims 1 to 4, characterized in that said text-form structured data comprises: bank customer system data.
6. A Lucene-based data search device, characterized by comprising:
A word segmentation module, configured to use Lucene to perform word segmentation on text-form structured data, wherein English words and digits in said text-form structured data are segmented with each ASCII (American Standard Code for Information Interchange) character as a segmentation unit, and Chinese text in said text-form structured data is segmented with each Chinese character as a segmentation unit;
A creation module, configured to create an index for the segmented text-form structured data;
A receiving module, configured to receive a query request;
A first acquisition module, configured to obtain the index corresponding to said query request;
A second acquisition module, configured to obtain, according to the correspondence between indexes and text-form structured data, the text-form structured data corresponding to the index that corresponds to said query request.
7. The device of claim 6, characterized in that said device further comprises:
A processing module, configured to, after the receiving module receives the query request, process said request with a Lucene result-recommendation plug-in to obtain search suggestions, result recommendations and hot words.
8. The device of claim 7, characterized in that said first acquisition module is configured to obtain the index corresponding to said query request according to said search suggestions, result recommendations and hot words.
9. The device of claim 6, characterized in that,
Said device further comprises: a saving module, configured to, after the index is created for the segmented text-form structured data, save said index to disk in units of the databases or tables of the structured data;
The first acquisition module is configured to search said disk for the index corresponding to said query request, and to obtain the index corresponding to said query request.
10. The device according to any one of claims 6 to 9, characterized in that said text-form structured data comprises: bank customer system data.
CN201310671382.6A 2013-12-10 2013-12-10 Lucene-based data searching method and device Pending CN104699724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310671382.6A CN104699724A (en) 2013-12-10 2013-12-10 Lucene-based data searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310671382.6A CN104699724A (en) 2013-12-10 2013-12-10 Lucene-based data searching method and device

Publications (1)

Publication Number Publication Date
CN104699724A true CN104699724A (en) 2015-06-10

Family

ID=53346856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310671382.6A Pending CN104699724A (en) 2013-12-10 2013-12-10 Lucene-based data searching method and device

Country Status (1)

Country Link
CN (1) CN104699724A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101154241A (en) * 2007-10-11 2008-04-02 北京金山软件有限公司 Data searching method and data searching system
CN101770499A (en) * 2009-01-07 2010-07-07 上海聚力传媒技术有限公司 Information retrieval method in search engine and corresponding search engine
CN103064839A (en) * 2011-10-19 2013-04-24 北京中文在线数字出版股份有限公司 Portable document format (Pdf) full-text on-line retrieval method
CN102929902A (en) * 2012-07-05 2013-02-13 江苏新瑞峰信息科技有限公司 Character splitting method and device based on Chinese retrieval
CN102915365A (en) * 2012-10-24 2013-02-06 苏州两江科技有限公司 Hadoop-based construction method for distributed search engine

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426539A (en) * 2015-12-23 2016-03-23 成都电科心通捷信科技有限公司 Dictionary-based lucene Chinese word segmentation method
CN105426539B (en) * 2015-12-23 2018-12-18 成都云数未来信息科学有限公司 A kind of lucene Chinese word cutting method based on dictionary
CN107451122A (en) * 2017-08-09 2017-12-08 南京华飞数据技术有限公司 A kind of dynamic n member segmenting methods based on Lucene
CN107451122B (en) * 2017-08-09 2020-11-13 南京华飞数据技术有限公司 Dynamic n-element word segmentation method based on Lucene
CN108388635A (en) * 2018-02-24 2018-08-10 杭州朗和科技有限公司 Data search method, device, medium and computing device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150610