CN106933998A - Method for solving inaccurate Apache Solr phrase search


Info

Publication number
CN106933998A
Authority
CN
China
Prior art keywords
phrase
search
word segmentation
solution
Apache Solr
Prior art date
2017-03-01
Legal status
Granted
Application number
CN201710117467.8A
Other languages
Chinese (zh)
Other versions
CN106933998B (en)
Inventor
何小成
黄三伟
Current Assignee
Hunan Ant Software Ltd By Share Ltd
Original Assignee
Hunan Ant Software Ltd By Share Ltd
Priority date
2017-03-01
Filing date
2017-03-01
Publication date
2017-07-07
2017-03-01 Application filed by Hunan Ant Software Ltd By Share Ltd
2017-03-01 Priority to CN201710117467.8A
2017-07-07 Publication of CN106933998A
2021-03-02 Application granted
2021-03-02 Publication of CN106933998B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for solving inaccurate phrase search in Apache Solr. The method comprises the following steps: (1) data receiving: a QParserPlugin receives the search-statement parameter sent by the client over HTTP; (2) phrase lookup: in the QParserPlugin, a regular expression matches the phrases in the search-statement parameter, yielding a phrase set; (3) word segmentation and replacement: each phrase in the phrase set obtained in step 2 is segmented in index mode, and the segmented phrase replaces the corresponding phrase in the original search statement; (4) data conversion: the Apache Solr grammar parser converts the rewritten search statement into a Query; (5) data processing and output: the Query enters the Apache Solr search process, and the data is output once the search completes. The invention extends the Apache Solr grammar parser as a plug-in and rewrites its parsing rules, which solves the problem of inaccurate phrase search. It provides a pluggable grammar-parser extension, and phrases are segmented in index mode before the search is performed.

Description

Method for solving inaccurate Apache Solr phrase search
Technical field
The present invention relates to the field of web search technology, and in particular to a method for solving inaccurate phrase search in Apache Solr.
Background art
Apache Solr supports a search syntax called "phrase search", i.e. PhraseQuery. A phrase search is written by enclosing the keywords in quotation marks; a document matches when the terms obtained by segmenting the quoted keywords appear within the distance specified by the slop parameter. However, segmenting a document at index time yields more tokens than segmenting the Query at search time, so the index mode and the search mode do not match, which makes phrase search inaccurate.
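To make the mismatch concrete, here is a minimal Java sketch, not taken from the patent, that models an exact (slop 0) phrase match over token positions. The tokens "AB", "BC" and "CD" are hypothetical stand-ins for the output of a fine-grained index-mode segmenter and a coarser search-mode segmenter; real Solr analyzers, sloppy matching and Lucene scoring are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhraseMismatchDemo {
    // Builds a toy positional index: each term maps to the positions at which it was emitted.
    static Map<String, List<Integer>> positions(List<String> tokens) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            index.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // Exact phrase match (slop 0): query term k must occur at position start + k.
    static boolean phraseMatches(Map<String, List<Integer>> index, List<String> query) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start : index.getOrDefault(query.get(0), List.of())) {
            boolean ok = true;
            for (int k = 1; k < query.size() && ok; k++) {
                ok = index.getOrDefault(query.get(k), List.of()).contains(start + k);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical index-mode output for the text "ABCD": overlapping tokens at positions 0, 1, 2.
        List<String> indexTokens = List.of("AB", "BC", "CD");
        // Hypothetical search-mode output for the quoted phrase "ABCD": fewer tokens.
        List<String> searchTokens = List.of("AB", "CD");

        Map<String, List<Integer>> index = positions(indexTokens);
        System.out.println(phraseMatches(index, searchTokens)); // false: "CD" sits at 2, not 1
        System.out.println(phraseMatches(index, indexTokens));  // true: same segmentation on both sides
    }
}
```

In this toy model, segmenting the quoted phrase with the same index-mode segmenter restores positional agreement, which is the idea the invention builds on.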
The invention provides a method that, before the Apache Solr search operation is entered, first segments the keywords in the phrase-search syntax according to the index mode, then replaces the original phrase-search statement with the result, and finally performs the search operation.
Summary of the invention
The technical problem to be solved by the present invention is that, in Apache Solr phrase search, the word-segmentation results of the index mode and the search mode are inconsistent, which makes the search inaccurate.
To solve the above technical problem, the present invention adopts the following technical means:
A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
Step 1: Data receiving: the QParserPlugin receives the search-statement parameter sent by the client over HTTP;
Step 2: Phrase lookup: in the QParserPlugin, a regular expression matches the phrases in the search-statement parameter, yielding a phrase set;
Step 3: Word segmentation and replacement: each phrase in the phrase set obtained in step 2 is segmented in index mode, and the segmented phrase replaces the corresponding phrase in the original search statement;
Step 4: Data conversion: the Apache Solr grammar parser converts the rewritten search statement into a Query;
Step 5: Data processing and output: the Query enters the Apache Solr search process, and the data is output once the search completes.
Preferably, the present invention further adopts the following technical solutions:
In the phrase lookup, the getString method is first called within the parse method to obtain the search statement, and a regular expression matching quoted text is then used to match the "phrase-search statements" in the search statement.
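A minimal sketch of the phrase lookup in plain Java. It assumes that a phrase is any text between a pair of straight double quotes; the patent does not give its actual regular expression, `PhraseFinder` is a hypothetical name, and escaped quotes are not handled. A trailing `~slop` suffix is left untouched because it sits outside the quotes.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PhraseFinder {
    // A phrase is assumed to be any run of non-quote characters between two straight double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the distinct quoted phrases in the order they first appear (the "phrase set").
    static Set<String> findPhrases(String statement) {
        Set<String> phrases = new LinkedHashSet<>();
        Matcher m = QUOTED.matcher(statement);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String q = "title:\"apache solr\" AND body:\"phrase search\"~2 OR \"apache solr\"";
        System.out.println(findPhrases(q)); // [apache solr, phrase search]
    }
}
```

A `LinkedHashSet` is used so that a phrase repeated in the statement is segmented only once while the original order is kept.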
In the word segmentation and replacement, a segmenter is called to segment each matched phrase in index mode, and the statement obtained after segmentation finally replaces the original search statement.
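The segmentation-and-replacement step might look like the following sketch. Beyond what the patent states, it assumes a toy dictionary-based "index mode" that emits every dictionary word in the phrase (overlaps included), that the resulting tokens are joined with spaces inside the original quotes, and that replacement happens in a single regex pass. `IndexModeRewriter` and its dictionary are hypothetical and do not represent the segmenter the patent uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndexModeRewriter {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Toy index-mode segmenter: emits every dictionary word found in the text, overlaps included,
    // ordered by start offset and then by length. Real Chinese analyzers are far more elaborate.
    static List<String> segmentIndexMode(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // Replaces each quoted phrase with its index-mode tokens, still quoted and separated by spaces.
    static String rewrite(String statement, Set<String> dictionary) {
        Matcher m = QUOTED.matcher(statement);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            List<String> tokens = segmentIndexMode(m.group(1), dictionary);
            // Fall back to the original phrase when the segmenter finds nothing.
            String body = tokens.isEmpty() ? m.group(1) : String.join(" ", tokens);
            m.appendReplacement(out, Matcher.quoteReplacement("\"" + body + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("AB", "BC", "CD");
        System.out.println(rewrite("title:\"ABCD\"~1 AND solr", dict)); // title:"AB BC CD"~1 AND solr
    }
}
```

`Matcher.quoteReplacement` keeps any `$` or `\` inside a phrase from being read as a group reference during replacement.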
For the Apache Solr grammar parser used in data conversion, an AntfactQParserPlugin class is written that extends Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser.
Finally, this grammar parser is configured in the solrconfig.xml configuration file as a <queryParser> element whose class is AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly.
The present invention extends the Apache Solr grammar parser as a plug-in and rewrites its parsing rules, which solves the problem of inaccurate phrase search. It provides a pluggable grammar-parser extension, and phrases are segmented in index mode before the search is performed.
First, an AntfactQParser class is written that extends Apache Solr's LuceneQParser class and overrides the parse method. Inside parse, the getString method is first called to obtain the search statement; a regular expression matching quoted text is then used to match the "phrase-search statements" in the search statement; a segmenter is then called to segment each matched phrase in index mode; and finally the statement obtained after segmentation replaces the original search statement. In this way the functionality of Apache Solr's original LuceneQParser is left unaffected, while the parsing rules can still be customized as needed;
Second, an AntfactQParserPlugin class is written that extends Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser. Finally, a <queryParser> element with class AntfactQParserPlugin is configured in the solrconfig.xml configuration file, so that the custom queryParser can be configured dynamically and flexibly.
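The class structure described above can be modeled without Solr on the classpath. The sketch uses stand-in classes (`QParserModel`, `LuceneQParserModel`, `AntfactQParserModel`, `AntfactQParserPluginModel`, all hypothetical) that only imitate the shape of Solr's `QParser`, `LuceneQParser` and `QParserPlugin`. In Solr itself, `createParser` takes `(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)` and `parse()` returns a Lucene `Query`. The point is the design: the subclass rewrites the query string and then delegates to the inherited `parse`, so the stock parsing rules are reused unchanged.

```java
import java.util.function.UnaryOperator;

// Dependency-free stand-ins that mimic the shape of Solr's QParser / QParserPlugin;
// they are NOT the Solr classes and omit SolrParams, SolrQueryRequest, SyntaxError, etc.
abstract class QParserModel {
    protected String qstr;

    QParserModel(String qstr) {
        this.qstr = qstr;
    }

    String getString() {
        return qstr;
    }

    abstract String parse();
}

// Plays the role of LuceneQParser: the stock parser that turns a query string into a query.
class LuceneQParserModel extends QParserModel {
    LuceneQParserModel(String qstr) {
        super(qstr);
    }

    @Override
    String parse() {
        return "Query(" + qstr + ")"; // placeholder for real Lucene query construction
    }
}

// Plays the role of AntfactQParser: rewrites the phrases first, then reuses the stock parsing.
class AntfactQParserModel extends LuceneQParserModel {
    private final UnaryOperator<String> phraseRewriter;

    AntfactQParserModel(String qstr, UnaryOperator<String> phraseRewriter) {
        super(qstr);
        this.phraseRewriter = phraseRewriter;
    }

    @Override
    String parse() {
        qstr = phraseRewriter.apply(getString()); // phrase lookup plus index-mode segmentation
        return super.parse();                     // inherited parsing is left untouched
    }
}

// Plays the role of AntfactQParserPlugin: the factory that solrconfig.xml would point at.
class AntfactQParserPluginModel {
    private final UnaryOperator<String> phraseRewriter;

    AntfactQParserPluginModel(UnaryOperator<String> phraseRewriter) {
        this.phraseRewriter = phraseRewriter;
    }

    QParserModel createParser(String qstr) {
        return new AntfactQParserModel(qstr, phraseRewriter);
    }
}
```

In a real deployment the plugin class would then be registered through a `<queryParser>` element in solrconfig.xml, as the text describes.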
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention for solving inaccurate Apache Solr phrase search.
Fig. 2 is a structural block diagram of the method of the present invention for solving inaccurate Apache Solr phrase search.
Specific embodiments
The present invention is further described below with reference to embodiments.
Specific embodiment 1:
As shown in Fig. 1 and Fig. 2, the method of the present invention for solving inaccurate Apache Solr phrase search comprises the following steps:
Step 1: Data receiving: the QParserPlugin receives the search-statement parameter sent by the client over HTTP;
Step 2: Phrase lookup: in the QParserPlugin, a regular expression matches the phrases in the search-statement parameter, yielding a phrase set. The getString method is first called within the parse method to obtain the search statement, and a regular expression matching quoted text is then used to match the "phrase-search statements" in the search statement;
Step 3: Word segmentation and replacement: each phrase in the phrase set obtained in step 2 is segmented in index mode, and the segmented phrase replaces the corresponding phrase in the original search statement. A segmenter is called to segment each matched phrase in index mode, and the statement obtained after segmentation finally replaces the original search statement;
Step 4: Data conversion: the Apache Solr grammar parser converts the rewritten search statement into a Query. For this grammar parser, an AntfactQParserPlugin class is written that extends Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser. Finally, the parser is configured in the solrconfig.xml configuration file as a <queryParser> element whose class is AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly;
Step 5: Data processing and output: the Query enters the Apache Solr search process, and the data is output once the search completes.
Specific embodiment 2:
1. First step: the QParserPlugin receives the search-statement parameter sent by the client over HTTP;
2. Second step: in the QParserPlugin, a regular expression matches the phrases in the search-statement parameter, yielding a phrase set;
3. Third step: the phrases in the phrase set are traversed, and each is segmented in index mode;
4. Fourth step: the segmented phrases replace the corresponding phrases in the original search statement;
5. Fifth step: the Apache Solr grammar parser converts the rewritten search statement into a Query;
6. Sixth step: the Query enters the Apache Solr search process.
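Embodiment 2 separates segmentation (traversing the phrase set) from replacement. A minimal sketch of those two steps, assuming the phrase set and an index-mode segmenter are supplied by the caller; `PhraseSetRewriter` and the whitespace-joined output are illustrative assumptions, not details from the patent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class PhraseSetRewriter {
    static String rewrite(String statement, Set<String> phraseSet,
                          Function<String, List<String>> indexModeSegmenter) {
        // Third step: traverse the phrase set and segment each distinct phrase exactly once.
        Map<String, String> segmented = new LinkedHashMap<>();
        for (String phrase : phraseSet) {
            segmented.put(phrase, String.join(" ", indexModeSegmenter.apply(phrase)));
        }
        // Fourth step: substitute every quoted occurrence of each phrase in the original statement.
        String result = statement;
        for (Map.Entry<String, String> entry : segmented.entrySet()) {
            result = result.replace("\"" + entry.getKey() + "\"", "\"" + entry.getValue() + "\"");
        }
        return result;
    }
}
```

Because `String.replace` runs once per phrase, a segmented form that happens to equal another quoted phrase could be rewritten twice; a single regex pass over the statement avoids that.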
The foregoing is merely a specific embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent change or substitution of the technical features of this technical solution that a person skilled in the art can conceive falls within the scope of protection of the present invention.

Claims (6)

1. A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
Step 1: Data receiving: a QParserPlugin receives the search-statement parameter sent by the client over HTTP;
Step 2: Phrase lookup: in the QParserPlugin, a regular expression matches the phrases in the search-statement parameter, yielding a phrase set;
Step 3: Word segmentation and replacement: each phrase in the phrase set obtained in step 2 is segmented in index mode, and the segmented phrase replaces the corresponding phrase in the original search statement;
Step 4: Data conversion: the Apache Solr grammar parser converts the rewritten search statement into a Query;
Step 5: Data processing and output: the Query enters the Apache Solr search process, and the data is output once the search completes.
2. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that, in the phrase lookup, the getString method is first called within the parse method to obtain the search statement, and a regular expression matching quoted text is then used to match the "phrase-search statements" in the search statement.
3. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that, in the word segmentation and replacement, a segmenter is called to segment each matched phrase in index mode, and the statement obtained after segmentation finally replaces the original search statement.
4. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that, for the Apache Solr grammar parser used in data conversion, an AntfactQParserPlugin class is written that extends Apache Solr's QParserPlugin and overrides the createParser method, the return value being of type AntfactQParser.
5. The method for solving inaccurate Apache Solr phrase search according to claim 1 or 3, characterized in that the Apache Solr grammar parser used in data conversion is finally configured in the solrconfig.xml configuration file as a <queryParser> element whose class is AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly.
6. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that the plug-in for solving inaccurate Apache Solr phrase search is configured as a <queryParser> element in the solrconfig.xml configuration file of Apache Solr.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710117467.8A CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710117467.8A CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Publications (2)

Publication Number Publication Date
CN106933998A true CN106933998A (en) 2017-07-07
CN106933998B CN106933998B (en) 2021-03-02

Family

ID=59423888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710117467.8A Active CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Country Status (1)

Country Link
CN (1) CN106933998B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682036A (en) * 2011-03-18 2012-09-19 新奥特(北京)视频技术有限公司 Non-editing based method and system for searching media assets
CN103488702A (en) * 2013-09-06 2014-01-01 云南电力试验研究院(集团)有限公司电力研究院 SorlCloud based unstructured data retrieval method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Author unknown: "Solr实现Low Level查询解析" (Implementing low-level query parsing in Solr), 程序园, HTTPS://WWW.CNBLOGS.COM/LVFEILONG/P/FSDF32ERWRF.HTML *

Also Published As

Publication number Publication date
CN106933998B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
US9342301B2 (en) Converting and input script to a natural language description
US10534830B2 (en) Dynamically updating a running page
US11138005B2 (en) Methods and systems for automatically generating documentation for software
US20140350913A1 (en) Translation device and method
US20060282453A1 (en) Methods and systems for transforming an and/or command tree into a command data model
US20070006196A1 (en) Methods and systems for extracting information from computer code
US20070006179A1 (en) Methods and systems for transforming a parse graph into an and/or command tree
CN103942137A (en) Browser compatibility testing method and device
CN113051285A (en) SQL statement conversion method, system, equipment and storage medium
CN111831384A (en) Language switching method and device, equipment and storage medium
US11403078B2 (en) Interface layout interference detection
JP2021111327A (en) Method for generating api knowledge graph, system, and non-transitory computer-readable medium
CN113254023B (en) Object reading method and device and electronic equipment
US20080243904A1 (en) Methods and apparatus for storing XML data in relations
CN108509187B (en) Method and system for automatically generating MIB function code of software platform
CN106326314B (en) Webpage information extraction method and device
CN106933998A Method for solving inaccurate Apache Solr phrase search
TWI643077B (en) Method and adjustment device for adaptively adjusting database structure
CN112733517B (en) Method for checking requirement template conformity, electronic equipment and storage medium
CN116362219A (en) Information extraction template generation method and device, medium and equipment
CN113608748B (en) Data processing method, device and equipment for converting C language into Java language
KR100921563B1 (en) Method of sentence compression using the dependency grammar parse tree
CA2964481C (en) Systems and methods for normalized schema comparison
JP2015090622A (en) Shortened sentence generation device, method, and program
KR100691261B1 (en) System and method for supporting xquery update language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant