CN104391941A - Method for rapidly establishing full-text retrieval tool for common files - Google Patents
- Publication number
- CN104391941A
- Authority
- CN
- China
- Prior art keywords
- full
- module
- text
- retrieval
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/835—Query processing
- G06F16/8358—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/838—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
Abstract
The invention discloses a method for rapidly building a full-text retrieval tool for common files, belonging to the field of retrieval tools. The method comprises the following steps: (1) a document parsing module reads and parses all the files, composes HTTP requests, and sends them to a Chinese word segmentation module; (2) the Chinese word segmentation module segments the attribute content in the received HTTP requests; (3) a full-text index building module customizes the index service type; (4) after parsing a retrieval command, a retrieval module performs the corresponding operation, which completes the construction of the retrieval tool; (5) after a user submits search terms, the retrieval module segments the search terms, generates a query request, queries the index library, and presents the results to the user. With this method, individuals and enterprises can build their own dedicated search engine, meet their retrieval needs with relatively little time and effort, and easily manage a large number of internal files.
Description
Technical field
The invention discloses a method for rapidly building a retrieval tool and belongs to the field of retrieval tools; specifically, it is a method for rapidly building a full-text retrieval tool for common files.
Background technology
Full-text retrieval finds any piece of content stored in a whole book or a whole article. It can return the relevant chapters, paragraphs, sentences, and words from the full text as required, much as if every word of the book had been given a label, and it can also support various kinds of statistics and analysis. Solr is a standalone enterprise-level search application server that exposes an API similar to a web service. Users can submit XML files in a prescribed format to the search engine server through HTTP requests to generate an index; they can also issue search requests through HTTP GET operations and receive the results in XML format.
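As a rough illustration of the HTTP interface described above, the following Python sketch builds an XML update message and a GET query URL of the kind Solr accepts. The host, port, core name `files`, and the field names are assumptions made for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed location of a Solr core; host, port and core name are hypothetical.
SOLR_CORE_URL = "http://localhost:8983/solr/files"


def build_add_message(fields):
    """Return a Solr XML update message (<add><doc>...</doc></add>) as a string."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Return the URL of an HTTP GET search request that asks for XML results."""
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{SOLR_CORE_URL}/select?{params}"


# Illustrative field names; a real schema defines its own.
update_xml = build_add_message({"id": "1", "content": "全文检索"})
select_url = build_select_url("content:检索")
```

In a real deployment the update message would be sent with an HTTP POST to the core's `/update` endpoint, and the select URL would be fetched with an HTTP GET.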
Many users' search needs are still served at the database level, but a database's performance is limited when the search workload is very large. Searching the content of a large number of files is almost impossible for a database, or at best quite difficult. Choosing a mature open-source search engine as the core and building a usable retrieval tool on top of it is a good option, but building a practical text retrieval tool is complicated, and there is no unified, simple construction method. The invention provides a method for rapidly building a full-text retrieval tool for common files. The tool is based on the open-source search engine Solr: files are stored in the search engine, a full-text index is built over them, all related content can be retrieved quickly by search keyword, and the results are presented to the user. With this method, individuals and enterprises can build their own dedicated search engine, meet their own search needs with relatively little time and effort, and easily manage a large number of internal files.
Summary of the invention
To address the deficiencies and problems of the prior art, the invention provides a method for rapidly building a full-text retrieval tool for common files. It allows individuals to set up a tool that quickly searches the files they have accumulated over time, and it is even better suited to enterprises that need to manage large numbers of internal files and find the files they need quickly.
The specific scheme proposed for the method is as follows:
A system for rapidly building a full-text retrieval tool for common files, implemented on Solr, comprises a document parsing module, a Chinese word segmentation module, a full-text index building module, a full-text index library, and a retrieval module;
The document parsing module is responsible for parsing files;
The Chinese word segmentation module uses a Chinese word segmentation algorithm to segment the full text of the file content so that a full-text index can be built;
The full-text index building module builds the full-text index over the words produced by the Chinese word segmentation module;
The full-text index library is responsible for data storage;
The retrieval module is responsible for carrying out the user's various retrieval operations.
A method for rapidly building a full-text retrieval tool for common files, implemented on Solr, comprises the following steps:
1. The document parsing module reads and parses all the files, converts them to XML format with each file parsed into two attributes, and composes an HTTP request that is sent to the Chinese word segmentation module;
2. The Chinese word segmentation module segments the attribute content in the received HTTP request; once all attributes have been segmented, the full-text index building module builds the index. The segmentation algorithm is set in a configuration file;
3. The full-text index building module customizes the index service type, plans in the configuration file the fields to be stored and the fields to be preserved, and then stores all the built indexes and data in the full-text index library;
4. After parsing a retrieval command, the retrieval module obtains the index from the full-text index library and performs the corresponding retrieve, delete, or modify operation on the index, which completes the construction of the retrieval tool;
5. After a query term is submitted, the retrieval module processes it (word segmentation and the like), generates a query request, queries the index library, and presents the results to the user.
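The segmentation in step 2 can be pictured with a small sketch. The patent does not name the segmentation algorithm, so the example below substitutes forward maximum matching over a toy dictionary purely for illustration; the dictionary contents and the `max_len` parameter are assumptions, and in a Solr deployment the segmenter would normally be chosen in the configuration file rather than written by hand.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, an illustrative assumption only.
TOY_DICTIONARY = {"全文", "检索", "工具", "全文检索"}
tokens = segment("全文检索工具", TOY_DICTIONARY)
```

The same function would be applied to query terms in step 5, so that documents and queries are split into comparable tokens.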
In step 1, the two attributes into which each file is parsed are the file name and the entire content of the file, where the file name includes the absolute path at which the file is stored.
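A minimal sketch of this two-attribute parsing, assuming plain-text files only, might look as follows. The attribute names `filename` and `content` and the helper names are hypothetical; Word and PDF files would need a text-extraction step that is not shown here.

```python
from pathlib import Path
import tempfile


def parse_text_file(path):
    """Parse one plain-text file into the two attributes: file name and content."""
    path = Path(path).resolve()  # the file name attribute carries the absolute path
    return {
        "filename": str(path),
        "content": path.read_text(encoding="utf-8"),
    }


def parse_directory(root):
    """Parse every .txt file under root, in a stable order."""
    return [parse_text_file(p) for p in sorted(Path(root).rglob("*.txt"))]


# Usage with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("全文检索工具", encoding="utf-8")
    (Path(tmp) / "b.txt").write_text("检索模块", encoding="utf-8")
    records = parse_directory(tmp)
```

Each record could then be serialized to XML and sent on in an HTTP request, as step 1 describes.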
In step 2, the full-text index building module builds an index with an inverted data structure.
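To make the inverted structure concrete, here is a toy in-memory version in Python. It is only a sketch of the general idea (each token maps to the documents that contain it) and says nothing about how Solr lays out its index internally; the document paths and tokens are made up.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each token to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in documents.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def search(index, query_tokens):
    """Return ids of documents that contain every query token (boolean AND)."""
    if not query_tokens:
        return []
    result = set(index.get(query_tokens[0], []))
    for token in query_tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


# Already-segmented toy documents, keyed by hypothetical file paths.
documents = {
    "/docs/a.txt": ["全文", "检索", "工具"],
    "/docs/b.txt": ["检索", "模块"],
}
index = build_inverted_index(documents)
hits = search(index, ["检索", "工具"])
```

Looking a token up in the index is a dictionary access, which is why retrieval stays fast even when the number of files is large.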
In step 4, after parsing a retrieval command, the retrieval module can also sort the retrieval results, highlight keywords, and weight search keywords.
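With Solr as the engine, these three features are commonly requested through query parameters. The sketch below assembles such a request URL; the endpoint, the `content` field, and the weights are assumptions for illustration, and the patent does not say which parameters its implementation uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; host, port and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/files/select"


def build_search_url(weighted_terms, field="content"):
    """Build a search URL with term weights, score ordering and highlighting."""
    # "term^weight" is the query-syntax form for boosting a keyword.
    query = " ".join(f"{term}^{weight}" for term, weight in weighted_terms.items())
    params = {
        "q": query,
        "df": field,           # default field the keywords are matched against
        "sort": "score desc",  # order results by relevance score
        "hl": "true",          # turn on keyword highlighting
        "hl.fl": field,        # field whose matching snippets are highlighted
        "wt": "xml",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


search_url = build_search_url({"全文": 2, "检索": 1})
decoded = parse_qs(urlsplit(search_url).query)
```

Decoding the URL again, as in the last line, is a convenient way to check that non-ASCII keywords survive the encoding.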
The common files are in Word, PDF, or TXT format.
The benefits of the invention are as follows: the retrieval tool for common files is based on the open-source search engine Solr; files are stored in the search engine and a full-text index is built over them, so that all related content can be retrieved quickly by search keyword and presented to the user. With this method, individuals and enterprises can build their own dedicated search engine, meet their own search needs with relatively little time and effort, and easily manage a large number of internal files.
Description of the drawings:
Fig. 1 is a schematic flow chart of the method for rapidly building a full-text retrieval tool for common files.
Embodiments
The invention is further described below with reference to the accompanying drawing:
Embodiment 1
A system for rapidly building a full-text retrieval tool for common files is built on the search engine Solr and comprises a document parsing module, a Chinese word segmentation module, a full-text index building module, a full-text index library, and a retrieval module; the Chinese word segmentation module, the full-text index building module, the full-text index library, and the retrieval module all work on top of the search engine Solr;
The document parsing module is responsible for parsing files;
The Chinese word segmentation module uses a Chinese word segmentation algorithm to segment the full text of the file content so that a full-text index can be built;
The full-text index building module builds the full-text index over the words produced by the Chinese word segmentation module;
The full-text index library is responsible for data storage;
The retrieval module is responsible for carrying out the user's various retrieval operations.
A method for rapidly building a full-text retrieval tool for common files comprises the following steps:
1. The document parsing module reads and parses Word documents, converts them to XML format, and parses each document into two attributes, namely the file name and the entire content of the file, where the file name includes the absolute path at which the file is stored; it then composes an HTTP request and sends it to the Chinese word segmentation module;
2. The Chinese word segmentation module segments the attribute content in the received HTTP request; once all attributes have been segmented, the full-text index building module builds an index with an inverted data structure. The segmentation algorithm is set in a configuration file;
3. The full-text index building module customizes the index service type, plans in the configuration file the fields to be stored and the fields to be preserved, and then stores all the built indexes and data in the full-text index library;
4. After parsing a retrieval command, the retrieval module obtains the index from the full-text index library and performs the corresponding retrieve, delete, or modify operation on the index, which completes the construction of the retrieval tool;
5. After a query term is submitted, the retrieval module processes it (word segmentation and the like), generates a query request, queries the index library, and presents the results to the user.
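One possible reading of step 3, in Solr terms, is that the configuration file declares for each field whether it is indexed (searchable) and whether it is stored (returned in results). That reading is an assumption, as are the field names, the `text_cn` type name, and the `<fields>` wrapper in the sketch below, which renders such a field plan as an XML fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical field plan: names, types and flags are illustrative assumptions.
FIELD_PLAN = [
    # The file name (with absolute path) is searchable and returned with results.
    {"name": "filename", "type": "string", "indexed": "true", "stored": "true"},
    # The content is searchable; it is also stored so that snippets can be shown.
    {"name": "content", "type": "text_cn", "indexed": "true", "stored": "true"},
]


def render_schema_fields(plan):
    """Render the field plan as an XML fragment of <field .../> declarations."""
    fields = ET.Element("fields")
    for spec in plan:
        ET.SubElement(fields, "field", spec)
    return ET.tostring(fields, encoding="unicode")


schema_fragment = render_schema_fields(FIELD_PLAN)
```

Keeping the plan as data makes it easy to review which fields are searchable and which are returned before the index is built.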
Embodiment 2
A system for rapidly building a full-text retrieval tool for common files is built on the search engine Solr and comprises a document parsing module, a Chinese word segmentation module, a full-text index building module, a full-text index library, and a retrieval module; the Chinese word segmentation module, the full-text index building module, the full-text index library, and the retrieval module all work on top of the search engine Solr;
The document parsing module is responsible for parsing files;
The Chinese word segmentation module uses a Chinese word segmentation algorithm to segment the full text of the file content so that a full-text index can be built;
The full-text index building module builds the full-text index over the words produced by the Chinese word segmentation module;
The full-text index library is responsible for data storage;
The retrieval module is responsible for carrying out the user's various retrieval operations.
A method for rapidly building a full-text retrieval tool for common files comprises the following steps:
1. The document parsing module reads and parses PDF documents, converts them to XML format, and parses each document into two attributes, namely the file name and the entire content of the file, where the file name includes the absolute path at which the file is stored; it then composes an HTTP request and sends it to the Chinese word segmentation module;
2. The Chinese word segmentation module segments the attribute content in the received HTTP request; once all attributes have been segmented, the full-text index building module builds an index with an inverted data structure. The segmentation algorithm is set in a configuration file;
3. The full-text index building module customizes the index service type, plans in the configuration file the fields to be stored and the fields to be preserved, and then stores all the built indexes and data in the full-text index library;
4. After parsing a retrieval command, the retrieval module obtains the index from the full-text index library and performs the corresponding retrieve, delete, or modify operation on the index; it can also sort the retrieval results, highlight keywords, and weight search keywords, which completes the construction of the retrieval tool;
5. After a query term is submitted, the retrieval module processes it (word segmentation and the like), generates a query request, queries the index library, and presents the results to the user.
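The delete and modify operations of step 4 can be expressed as Solr XML update messages. The sketch below only builds the message strings; the unique key field `id` and the sample values are assumptions, and the patent does not specify the message format it uses.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a message that deletes the document with the given unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def replace_document(fields):
    """Modify a document by re-adding it; the same unique key overwrites the old copy."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", {"name": name}).text = value
    return ET.tostring(add, encoding="unicode")


# A commit message makes the pending changes visible to searches.
COMMIT_MESSAGE = "<commit/>"

delete_xml = delete_by_id("/docs/a.txt")
purge_xml = delete_by_query("content:草稿")
modify_xml = replace_document({"id": "/docs/b.txt", "content": "检索模块（修订）"})
```

Each message would be POSTed to the core's `/update` endpoint, followed by the commit message.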
Claims (6)
1. A system for rapidly building a full-text retrieval tool for common files, implemented on Solr, characterized in that it comprises a document parsing module, a Chinese word segmentation module, a full-text index building module, a full-text index library, and a retrieval module;
The document parsing module is responsible for parsing files;
The Chinese word segmentation module uses a Chinese word segmentation algorithm to segment the full text of the file content so that a full-text index can be built;
The full-text index building module builds the full-text index over the words produced by the Chinese word segmentation module;
The full-text index library is responsible for data storage;
The retrieval module is responsible for carrying out the user's various retrieval operations.
2. A method for rapidly building a full-text retrieval tool for common files, using the system for rapidly building a full-text retrieval tool for common files according to claim 1, characterized in that the steps are:
(1) The document parsing module reads and parses all the files, converts them to XML format with each file parsed into two attributes, and composes an HTTP request that is sent to the Chinese word segmentation module;
(2) The Chinese word segmentation module segments the attribute content in the received HTTP request; once all attributes have been segmented, the full-text index building module builds the index. The segmentation algorithm is set in a configuration file;
(3) The full-text index building module customizes the index service type, plans in the configuration file the fields to be stored and the fields to be preserved, and then stores all the built indexes and data in the full-text index library;
(4) After parsing a retrieval command, the retrieval module obtains the index from the full-text index library and performs the corresponding retrieve, delete, or modify operation on the index, which completes the construction of the retrieval tool;
(5) After a query term is submitted, the retrieval module processes it (word segmentation and the like), generates a query request, queries the index library, and presents the results to the user.
3. The method for rapidly building a full-text retrieval tool for common files according to claim 2, characterized in that the two attributes into which each file is parsed in step (1) are the file name and the entire content of the file, where the file name includes the absolute path at which the file is stored.
4. The method for rapidly building a full-text retrieval tool for common files according to claim 2 or 3, characterized in that in step (2) the full-text index building module builds an index with an inverted data structure.
5. The method for rapidly building a full-text retrieval tool for common files according to claim 4, characterized in that in step (4), after parsing a retrieval command, the retrieval module can also sort the retrieval results, highlight keywords, and weight search keywords.
6. The method for rapidly building a full-text retrieval tool for common files according to any one of claims 2, 3, or 5, characterized in that the common files are in Word, PDF, or TXT format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410684418.9A CN104391941A (en) | 2014-11-25 | 2014-11-25 | Method for rapidly establishing full-text retrieval tool for common files |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410684418.9A CN104391941A (en) | 2014-11-25 | 2014-11-25 | Method for rapidly establishing full-text retrieval tool for common files |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104391941A true CN104391941A (en) | 2015-03-04 |
Family
ID=52609845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410684418.9A Pending CN104391941A (en) | 2014-11-25 | 2014-11-25 | Method for rapidly establishing full-text retrieval tool for common files |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391941A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021625A (en) * | 2016-07-26 | 2016-10-12 | 浪潮软件集团有限公司 | Mixed application method of two word segmenters based on SOLR search engine |
CN106649529A (en) * | 2016-10-21 | 2017-05-10 | 天津海量信息技术股份有限公司 | Full-text retrieval method applied during transmission through HTTP protocol |
CN106649800A (en) * | 2016-12-29 | 2017-05-10 | 南威软件股份有限公司 | Solr-based Chinese search method |
CN106844700A (en) * | 2017-02-03 | 2017-06-13 | 山东浪潮商用系统有限公司 | It is a kind of to ask tax system based on Sorl |
CN106951419A (en) * | 2016-01-06 | 2017-07-14 | 北京仿真中心 | A kind of isomery manufacturing service of facing cloud manufacture finds system and method |
CN108255972A (en) * | 2017-12-27 | 2018-07-06 | 浪潮通用软件有限公司 | A kind of text searching method and system |
WO2020097997A1 (en) * | 2018-11-14 | 2020-05-22 | 山东大学 | Search result display method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488702A (en) * | 2013-09-06 | 2014-01-01 | 云南电力试验研究院(集团)有限公司电力研究院 | SorlCloud based unstructured data retrieval method and system |
CN103729463A (en) * | 2014-01-14 | 2014-04-16 | 赛特斯信息科技股份有限公司 | Method for implementing full-text retrieval based on Lucene and Solr |
CN103778202A (en) * | 2014-01-10 | 2014-05-07 | 江苏哲勤科技有限公司 | Enterprise electronic document managing server side and system |
-
2014
- 2014-11-25 CN CN201410684418.9A patent/CN104391941A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104391941A (en) | Method for rapidly establishing full-text retrieval tool for common files | |
CN103020281B (en) | A kind of data storage and retrieval method based on spatial data numerical index | |
CN103049575B (en) | A kind of academic conference search system of topic adaptation | |
CN101206670B (en) | System and method for transferring non construction information to content | |
CN102930060B (en) | A kind of method of database quick indexing and device | |
CN107038207A (en) | A kind of data query method, data processing method and device | |
CN101685444B (en) | System and method for realizing metadata search | |
US20150310129A1 (en) | Method of managing database, management computer and storage medium | |
US20160048584A1 (en) | On-the-fly determination of search areas and queries for database searches | |
US20090157801A1 (en) | System and method for integrating external system data in a visual mapping system | |
CN102193917A (en) | Method and device for processing and querying data | |
CN107085583B (en) | Electronic document management method and device based on content | |
US11216516B2 (en) | Method and system for scalable search using microservice and cloud based search with records indexes | |
CN102810114A (en) | Personal computer resource management system based on body | |
CN101196900A (en) | Information searching method based on metadata | |
US20130191328A1 (en) | Standardized framework for reporting archived legacy system data | |
CN102262650A (en) | Linked databases | |
EP2889788A1 (en) | Accessing information content in a database platform using metadata | |
CN110413570A (en) | A kind of document index and search method and its device | |
CN115563313A (en) | Knowledge graph-based document book semantic retrieval system | |
WO2021043088A1 (en) | File query method and device, and computer device and storage medium | |
CN103020300B (en) | Method and device for information retrieval | |
Liu et al. | A study of entity search in semantic search workshop | |
CN105740997A (en) | Method and device for controlling task flow, and database management system | |
Lu et al. | Language engineering for the Semantic Web: A digital library for endangered languages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20150304 |