CN106933998B - Method for solving inaccurate Apache Solr phrase search - Google Patents
- Publication number
- CN106933998B (application CN201710117467.8A)
- Authority
- CN
- China
- Prior art keywords
- search
- phrases
- data
- phrase
- apache solr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3325—Reformulation based on results of preceding query
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for solving inaccurate Apache Solr phrase search, comprising the following steps: receiving data, in which a QParserPlugin receives the search statement parameters sent by a client over the HTTP protocol; phrase matching, in which a regular expression is used in the QParserPlugin to match the phrases in the statement parameters, yielding a phrase set; word segmentation and replacement, in which each phrase in the phrase set is segmented in index mode and the phrases in the original search statement are replaced by their segmented forms; data conversion, in which the syntax parser of Apache Solr converts the rewritten search statement into a Query; and processing and output, in which the Apache Solr search process runs and the data is output after the search is finished. The invention extends the syntax parser of Apache Solr as a pluggable parser extension and overrides its parsing rules, so that phrases are segmented in index mode before searching, which solves the problem of inaccurate phrase search.
Description
Technical Field
The invention relates to the technical field of network search, and in particular to a method for solving inaccurate Apache Solr phrase search.
Background
Apache Solr supports a search syntax called "phrase search", which is implemented as a PhraseQuery. A phrase search is written by enclosing the keywords in quotation marks; a document matches when, after word segmentation, the terms of the quoted keywords occur within the distance given by the slop parameter. However, word segmentation of a document at index time produces more terms than word segmentation of the Query at search time, so the index mode and the search mode do not match, and "phrase search" becomes inaccurate.
The invention provides a method in which, before the Apache Solr search operation runs, the keywords in the phrase search syntax are segmented in index mode, the segmented keywords replace the phrases in the original search statement, and the search operation is then carried out.
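The mismatch described above can be illustrated with a deliberately simplified model. The sketch below is not Lucene's PhraseQuery algorithm: it treats a slop-0 phrase match as "the query tokens appear at consecutive positions in the indexed token stream", and the token lists are hypothetical outputs of an index-mode and a search-mode segmenter for the made-up text "newyorkcity".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseMismatchDemo {
    // Simplified slop-0 phrase match (an assumption, not Lucene's implementation):
    // the query tokens must occur as a contiguous run in the indexed token list.
    static boolean exactPhraseMatch(List<String> indexedTokens, List<String> queryTokens) {
        return Collections.indexOfSubList(indexedTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical index-mode segmentation of the document text "newyorkcity".
        List<String> indexed = Arrays.asList("newyork", "new", "york", "city");
        // Hypothetical search-mode segmentation of the same text (fewer tokens).
        List<String> searchMode = Arrays.asList("newyork", "city");
        // Segmenting the query in index mode reproduces the indexed sequence.
        List<String> indexMode = Arrays.asList("newyork", "new", "york", "city");

        System.out.println(exactPhraseMatch(indexed, searchMode)); // false
        System.out.println(exactPhraseMatch(indexed, indexMode));  // true
    }
}
```

In this toy model the search-mode tokens are all present in the index, but "city" does not directly follow "newyork", so the phrase fails; segmenting the query the same way as the index restores the match.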
Disclosure of Invention
The technical problem to be solved by the invention is that Apache Solr phrase search is inaccurate because the word segmentation results of the index mode and the search mode are inconsistent.
In order to solve the technical problems, the invention adopts the following technical means:
A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
step 1: receiving data, wherein the QParserPlugin receives the search statement parameters sent by a client over the HTTP protocol;
step 2: phrase matching, namely using a regular expression in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set;
step 3: word segmentation and replacement, namely performing index-mode word segmentation on the phrases in the phrase set obtained in step 2, and replacing the phrases in the original search statement with the segmented phrases;
step 4: data conversion, namely converting the rewritten search statement into a Query through the syntax parser of Apache Solr;
step 5: processing and output, namely entering the search process of Apache Solr and outputting the data after the search is finished.
Preferably, the further technical scheme of the invention is as follows:
In the phrase matching step, the parse method first calls the getString method to obtain the search statement, and then uses a regular expression that matches quoted text to find the phrase search expressions in the search statement.
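As a minimal sketch of this matching step, the following uses only java.util.regex. The pattern, the class name QuotedPhraseFinder and the sample query are assumptions for illustration; the patent does not give the actual regular expression, and this one does not handle escaped quotation marks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPhraseFinder {
    // Assumed pattern: a double-quoted run with no embedded quote; group 1 is the phrase body.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Collects every quoted phrase in the search statement, in order of appearance.
    static List<String> findPhrases(String searchStatement) {
        List<String> phrases = new ArrayList<>();
        Matcher m = QUOTED.matcher(searchStatement);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String q = "title:\"apache solr\" AND body:\"phrase search\"";
        System.out.println(findPhrases(q)); // prints [apache solr, phrase search]
    }
}
```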
In the word segmentation and replacement step, a word segmenter is called to segment the matched phrases in index mode, and the phrases in the original search statement are then replaced by their segmented forms.
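The segmentation and replacement step could look roughly like the sketch below. The index-mode segmenter is passed in as a function, because the patent does not name a specific analyzer; demoTokenizer is a hypothetical stand-in with a hard-coded result. Joining the tokens with spaces and keeping them inside quotation marks is also an assumption about how the rewritten statement is formed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseRewriter {
    // Assumed pattern for a quoted phrase; escaped quotes are not handled.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Replaces each quoted phrase with its index-mode tokens, joined by spaces and re-quoted.
    static String rewrite(String searchStatement, Function<String, List<String>> indexModeTokenizer) {
        Matcher m = QUOTED.matcher(searchStatement);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> tokens = indexModeTokenizer.apply(m.group(1));
            String replacement = "\"" + String.join(" ", tokens) + "\"";
            // quoteReplacement keeps '$' and '\' in tokens from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical index-mode segmenter: hard-coded for one phrase, whitespace split otherwise.
    static List<String> demoTokenizer(String phrase) {
        if (phrase.equals("newyorkcity")) {
            return Arrays.asList("newyork", "new", "york", "city");
        }
        return Arrays.asList(phrase.split("\\s+"));
    }

    public static void main(String[] args) {
        String q = "title:\"newyorkcity\" AND lang:en";
        System.out.println(rewrite(q, PhraseRewriter::demoTokenizer));
        // prints title:"newyork new york city" AND lang:en
    }
}
```

Text outside the quotation marks is copied through unchanged, so the rest of the query syntax is left to the standard parser.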
For the data conversion step, an AntfactQParserPlugin class is written that inherits from Apache Solr's QParserPlugin and overrides the createParser method; the return value is of type AntfactQParser.
Finally, a <queryParser> element whose class is AntfactQParserPlugin is configured in the solrconfig.xml configuration file, so that the custom queryParser can be configured dynamically and flexibly.
The invention extends the syntax parser of Apache Solr as a plug-in, overriding its parsing rules, and thereby solves the problem of inaccurate phrase search. The parser extension is pluggable, and phrases are segmented in index mode before the search is performed.
An AntfactQParser class is written that inherits from Apache Solr's LuceneQParser class and overrides the parse method. In the parse method, the getString method is first called to obtain the search statement; a regular expression that matches quoted text is then used to find the phrase search expressions in the search statement; a word segmenter is called to segment the matched phrases in index mode; and finally the segmented phrases replace the originals in the search statement. In this way the behavior of Apache Solr's original LuceneQParser is unaffected, and the parsing rules can be customized as required.
An AntfactQParserPlugin class is written that inherits from Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser. Finally, a <queryParser> element whose class is AntfactQParserPlugin is configured in the solrconfig.xml configuration file, so that the custom queryParser can be configured dynamically and flexibly.
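The relationship between the two classes can be sketched without the Solr libraries. Everything prefixed with Stub below is a hypothetical stand-in, not the real Solr API: the real createParser takes additional parameters and the real parse returns a Lucene Query rather than a string. The sketch only shows the shape of the design, in which the subclass rewrites the statement and then delegates to the inherited parse logic.

```java
import java.util.function.UnaryOperator;

// Stand-in for Solr's LuceneQParser (NOT the real class): holds the statement and "parses" it.
class StubLuceneQParser {
    protected String qstr;

    StubLuceneQParser(String qstr) {
        this.qstr = qstr;
    }

    String getString() {
        return qstr;
    }

    // The stub returns a description string in place of a Query object.
    String parse() {
        return "Query(" + qstr + ")";
    }
}

// Mirrors the role of AntfactQParser: rewrite phrases first, then reuse the inherited parse.
class StubAntfactQParser extends StubLuceneQParser {
    private final UnaryOperator<String> phraseRewriter;

    StubAntfactQParser(String qstr, UnaryOperator<String> phraseRewriter) {
        super(qstr);
        this.phraseRewriter = phraseRewriter;
    }

    @Override
    String parse() {
        qstr = phraseRewriter.apply(getString());
        return super.parse();
    }
}

// Stand-in for QParserPlugin: a factory with a single createParser method.
abstract class StubQParserPlugin {
    abstract StubLuceneQParser createParser(String qstr);
}

// Mirrors the role of AntfactQParserPlugin: createParser returns the custom parser type.
class StubAntfactQParserPlugin extends StubQParserPlugin {
    @Override
    StubLuceneQParser createParser(String qstr) {
        // Hypothetical rewriter with one hard-coded phrase, standing in for index-mode segmentation.
        return new StubAntfactQParser(qstr,
                q -> q.replace("\"newyorkcity\"", "\"newyork new york city\""));
    }

    public static void main(String[] args) {
        StubLuceneQParser parser = new StubAntfactQParserPlugin().createParser("\"newyorkcity\"");
        System.out.println(parser.parse()); // prints Query("newyork new york city")
    }
}
```

Because the custom parser only changes the statement before calling the inherited parse, the base parser's behavior for everything else stays as it was, which is the property the description emphasizes.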
Drawings
FIG. 1 is a flow chart of a method for resolving the inaccuracy of Apache Solr phrase searching in accordance with the present invention.
FIG. 2 is a block diagram illustrating a method for resolving inaccurate Apache Solr phrase searches according to the present invention.
Detailed Description
The present invention will be further described with reference to the following examples.
Specific example 1:
Referring to FIG. 1 and FIG. 2, the method for solving inaccurate Apache Solr phrase search according to the invention comprises the following steps:
step 1: receiving data, wherein the QParserPlugin receives the search statement parameters sent by a client over the HTTP protocol;
step 2: phrase matching, namely using a regular expression in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set; in the parse method, the getString method is first called to obtain the search statement, and a regular expression that matches quoted text is then used to find the phrase search expressions in the search statement;
step 3: word segmentation and replacement, namely performing index-mode word segmentation on the phrases in the phrase set obtained in step 2, and replacing the phrases in the original search statement with the segmented phrases; a word segmenter is called to segment the matched phrases in index mode, and the segmented phrases then replace the originals in the search statement;
step 4: data conversion, namely converting the rewritten search statement into a Query through the syntax parser of Apache Solr; an AntfactQParserPlugin class is written that inherits from Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser; finally, a <queryParser> element whose class is AntfactQParserPlugin is configured in the solrconfig.xml configuration file, so that the custom queryParser can be configured dynamically and flexibly;
step 5: processing and output, namely entering the search process of Apache Solr and outputting the data after the search is finished.
Specific example 2:
1. first, the QParserPlugin receives the search statement parameters sent by a client over the HTTP protocol;
2. second, a regular expression is used in the QParserPlugin to match the phrases in the statement parameters, yielding a phrase set;
3. third, the phrases in the phrase set are traversed and segmented in index mode;
4. fourth, the phrases in the original search statement are replaced with the segmented phrases;
5. fifth, the syntax parser of Apache Solr converts the rewritten search statement into a Query;
6. sixth, the search process of Apache Solr is entered.
The above description is only a specific embodiment of the invention, and the protection of the invention is not limited thereto; any equivalent change or substitution of the technical features of the invention that can be conceived by those skilled in the art falls within the protection scope of the invention.
Claims (2)
1. A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
step 1: receiving data, wherein the QParserPlugin receives the search statement parameters sent by a client over the HTTP protocol;
step 2: phrase matching, namely using a regular expression in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set; in the parse method, the getString method is first called to obtain the search statement, and a regular expression that matches quoted text is then used to find the phrase search expressions in the search statement;
step 3: word segmentation and replacement, namely performing index-mode word segmentation on the phrases in the phrase set obtained in step 2, and replacing the phrases in the original search statement with the segmented phrases; a word segmenter is called to segment the matched phrases in index mode, and the segmented phrases replace the originals in the search statement; finally, a <queryParser> element whose class is AntfactQParserPlugin is configured in the solrconfig.xml configuration file, so that the custom queryParser can be configured dynamically and flexibly;
step 4: data conversion, namely converting the rewritten search statement into a Query through the syntax parser of Apache Solr; an AntfactQParserPlugin class is written that inherits from Apache Solr's QParserPlugin and overrides the createParser method, whose return value is of type AntfactQParser;
step 5: processing and output, namely entering the search process of Apache Solr and outputting the data after the search is finished.
2. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that the plug-in for solving inaccurate Apache Solr phrase search is enabled by configuring <queryParser> in the solrconfig.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710117467.8A CN106933998B (en) | 2017-03-01 | 2017-03-01 | Method for solving inaccurate Apache Solr phrase search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710117467.8A CN106933998B (en) | 2017-03-01 | 2017-03-01 | Method for solving inaccurate Apache Solr phrase search |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106933998A | 2017-07-07 |
CN106933998B | 2021-03-02 |
Family
ID=59423888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710117467.8A Active CN106933998B (en) | 2017-03-01 | 2017-03-01 | Method for solving inaccurate Apache Solr phrase search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106933998B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682036A (en) * | 2011-03-18 | 2012-09-19 | 新奥特(北京)视频技术有限公司 | Non-editing based method and system for searching media assets |
CN103488702A (en) * | 2013-09-06 | 2014-01-01 | 云南电力试验研究院(集团)有限公司电力研究院 | SorlCloud based unstructured data retrieval method and system |
Non-Patent Citations (1)
Title |
---|
Implementing Low Level query parsing in Solr; author unknown; 《程序园》 https://www.cnblogs.com/lvfeilong/p/fsdf32erwrf.html; 2011-11-03; pp. 1-10 * |
Also Published As
Publication number | Publication date |
---|---|
CN106933998A (en) | 2017-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI636452B (en) | Method and system of voice recognition | |
CN109542929B (en) | Voice query method and device and electronic equipment | |
CN102682763B (en) | Method, device and terminal for correcting named entity vocabularies in voice input text | |
CN111177184A (en) | Structured query language conversion method based on natural language and related equipment thereof | |
CN104657440B (en) | Structured query statement generation system and method | |
CN102567384B (en) | Webpage multi-language dynamic switching method and system based on webpage browser engine | |
CN111104423B (en) | SQL statement generation method and device, electronic equipment and storage medium | |
US9110852B1 (en) | Methods and systems for extracting information from text | |
CN110554875B (en) | Code conversion method and device, electronic equipment and storage medium | |
CN111176650B (en) | Parser generation method, search method, server, and storage medium | |
JP2017510924A5 (en) | ||
CN104809106A (en) | System and method for excavating patent schemes | |
CN109446221A (en) | A kind of interactive data method for surveying based on semantic analysis | |
CN107038163A (en) | A kind of text semantic modeling method towards magnanimity internet information | |
CN109558166A (en) | A kind of code search method of facing defects positioning | |
CN111666372B (en) | Method, device, electronic equipment and readable storage medium for analyzing query word query | |
CN113779062A (en) | SQL statement generation method and device, storage medium and electronic equipment | |
CN108766507A (en) | A kind of clinical quality index calculating method based on CQL Yu standard information model openEHR | |
JP2016099675A (en) | Translation learning device, translation device, unique expression learning device, method, and program | |
JP2016164707A (en) | Automatic translation device and translation model learning device | |
CN106933998B (en) | Method for solving inaccurate Apache Solr phrase search | |
JP2013054607A (en) | Rearrangement rule learning device, method and program, and translation device, method and program | |
KR101802051B1 (en) | Method and system for constructing schema on natural language processing and knowledge database thereof | |
WO2016131295A1 (en) | Northbound data conversion method and device | |
CN103064967A (en) | Method and device used for establishing user binary relation bases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||