CN106933998B - Method for solving inaccurate Apache Solr phrase search - Google Patents

Method for solving inaccurate Apache Solr phrase search

Info

Publication number
CN106933998B
CN106933998B CN201710117467.8A CN201710117467A CN106933998B CN 106933998 B CN106933998 B CN 106933998B CN 201710117467 A CN201710117467 A CN 201710117467A CN 106933998 B CN106933998 B CN 106933998B
Authority
CN
China
Prior art keywords
search
phrases
data
phrase
apache solr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710117467.8A
Other languages
Chinese (zh)
Other versions
CN106933998A (en)
Inventor
何小成
黄三伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Eefung Software Co ltd
Original Assignee
Hunan Eefung Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-03-01
Filing date
2017-03-01
Publication date
2021-03-02
Application filed by Hunan Eefung Software Co ltd
Priority to CN201710117467.8A
Publication of CN106933998A
Application granted
Publication of CN106933998B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for solving inaccurate Apache Solr phrase search, comprising the following steps: (1) receiving data: a QParserPlugin receives the search statement parameters sent by a client over HTTP; (2) phrase matching: within the QParserPlugin, a regular expression matches the phrases in the statement parameters to obtain a phrase set; (3) word segmentation and replacement: each phrase in the phrase set obtained in step 2 is segmented in index mode, and the phrases in the original search statement are replaced with their segmented forms; (4) data conversion: the syntax parser of Apache Solr converts the rewritten search statement into a Query; (5) processing and output: the Query enters the Apache Solr search process, and the data is output when the search completes. The invention extends the syntax parser of Apache Solr as a plug-in, overriding its syntax parsing rules to solve the problem of inaccurate phrase search. The parser extension is pluggable, and phrases are segmented in index mode before the search is run.

Description

Method for solving inaccurate Apache Solr phrase search
Technical Field
The invention relates to the technical field of network search, in particular to a method for solving inaccurate Apache Solr phrase search.
Background
Apache Solr supports a search syntax called "phrase search", which is implemented as a PhraseQuery. A phrase search is written by enclosing the keywords in quotation marks, and it matches when the tokens produced by segmenting the quoted keywords lie within the distance given by the slop parameter. However, segmenting a document at index time yields more tokens than segmenting the Query at search time, so the index mode and the search mode do not match, and "phrase search" returns inaccurate results.
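The mismatch can be illustrated with a toy model. The sketch below is not Solr or Lucene code: the dictionary, the two segmenters, and the position model (one position per emitted token, slop 0 meaning consecutive positions) are assumptions chosen only to show why a coarsely segmented query can miss a finely segmented index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SegmentationMismatchDemo {
    // Hypothetical dictionary shared by both toy segmenters.
    static final List<String> DICT = Arrays.asList("database", "data", "base", "index");

    // Fine-grained "index mode": emit every dictionary word found at every offset.
    public static List<String> indexMode(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (String w : DICT) {
                if (text.startsWith(w, i)) tokens.add(w);
            }
        }
        return tokens;
    }

    // Coarse "search mode": greedy longest match, skipping unknown characters.
    public static List<String> searchMode(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String best = null;
            for (String w : DICT) {
                if (text.startsWith(w, i) && (best == null || w.length() > best.length())) best = w;
            }
            if (best == null) { i++; } else { tokens.add(best); i += best.length(); }
        }
        return tokens;
    }

    // Simplified exact phrase match (slop 0): the query tokens must occupy
    // consecutive positions in the indexed token list.
    public static boolean phraseMatches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> indexed = indexMode("databaseindex");   // [database, data, base, index]
        List<String> coarse = searchMode("databaseindex");   // [database, index]
        System.out.println(phraseMatches(indexed, coarse));                     // false
        System.out.println(phraseMatches(indexed, indexMode("databaseindex"))); // true
    }
}
```

In this model the document contains the queried text verbatim, yet the coarse query fails because "database" and "index" are three positions apart in the index. Segmenting the query the same way as the index restores the match.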
The invention provides a method in which, before the Apache Solr search operation, the keywords in the phrase search syntax are segmented in index mode, the segmented form replaces the original phrase in the search statement, and the search operation is then carried out.
Disclosure of Invention
The technical problem to be solved by the invention is that Apache Solr phrase search is inaccurate because the index mode and the search mode produce inconsistent word segmentation results.
To solve this technical problem, the invention adopts the following technical means:
A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
Step 1: receiving data, wherein a QParserPlugin receives the search statement parameters sent by a client over HTTP;
Step 2: phrase matching, wherein a regular expression is used in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set;
Step 3: word segmentation and replacement, wherein the phrases in the phrase set obtained in step 2 are segmented in index mode, and the phrases in the original search statement are replaced with their segmented forms;
Step 4: data conversion, wherein the syntax parser of Apache Solr converts the rewritten search statement into a Query;
Step 5: processing and output, wherein the Query enters the search process of Apache Solr, and the data is output after the search is finished.
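Steps 2 and 3 can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name PhraseRewriter, the regular expression, and the choice to keep the segmented tokens inside the quotation marks are all hypothetical, and the index-mode segmenter is passed in as a function because the source does not name a specific one. Escaped quotation marks inside a phrase are not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseRewriter {
    // Matches a double-quoted phrase with no embedded quotes; group 1 is the phrase text.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    /**
     * Replaces each quoted phrase in the query with its index-mode tokens,
     * joined by single spaces and kept inside the quotation marks.
     */
    public static String rewrite(String query, Function<String, List<String>> indexModeSegmenter) {
        Matcher m = QUOTED.matcher(query);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous phrase and this one unchanged.
            out.append(query, last, m.start());
            List<String> tokens = indexModeSegmenter.apply(m.group(1));
            out.append('"').append(String.join(" ", tokens)).append('"');
            last = m.end();
        }
        out.append(query, last, query.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy stand-in for an index-mode segmenter: split on whitespace and hyphens.
        Function<String, List<String>> toy = s -> Arrays.asList(s.split("[\\s-]+"));
        System.out.println(rewrite("title:\"full-text search\" AND solr", toy));
    }
}
```

Text outside the quotation marks is copied through untouched, so field prefixes and boolean operators reach the stock parser unchanged.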
Preferably, the further technical scheme of the invention is as follows:
In the phrase matching step, the parse method first calls the getString method to obtain the search statement, and then uses a regular expression that matches quoted statements to match the phrase search statements in the search statement.
In the word segmentation and replacement step, a word segmenter is called to segment the matched phrases in index mode, and the segmented statement finally replaces the original search statement.
For the data conversion step, an AntfactQParserPlugin class is written that inherits the QParserPlugin of Apache Solr and overrides the createParser method, whose return value is of type AntfactQParser.
Finally, <queryParser> is configured in the solrconfig.xml configuration file with class set to AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly.
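A registration of this kind might look like the fragment below. The parser name "antfact" and the package "com.example" are assumptions for illustration; the source gives only the class name.

```xml
<!-- solrconfig.xml: the name attribute and the package are hypothetical -->
<queryParser name="antfact" class="com.example.AntfactQParserPlugin"/>
```

With such an entry in place, a request would typically select the parser by name, for example through the defType request parameter or a {!antfact} local-params prefix.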
The invention extends the syntax parser of Apache Solr as a plug-in, overriding its syntax parsing rules to solve the problem of inaccurate phrase search. The parser extension is pluggable, and phrases are segmented in index mode before the search is run.
An AntfactQParser class is written that inherits the LuceneQParser class of Apache Solr and overrides the parse method. In the parse method, the getString method is first called to obtain the search statement; a regular expression that matches quoted statements is then used to match the phrase search statements in the search statement; a word segmenter is called to segment the matched phrases in index mode; and finally the segmented statement replaces the original search statement. The function of the original LuceneQParser of Apache Solr is therefore not affected, and syntax parsing rules can be customized as required.
An AntfactQParserPlugin class is written that inherits the QParserPlugin of Apache Solr and overrides the createParser method, whose return value is of type AntfactQParser. Finally, <queryParser> is configured in the solrconfig.xml configuration file with class set to AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly.
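The inheritance pattern described above can be sketched without a Solr dependency. Everything in the block is a stand-in: BaseParser plays the role of LuceneQParser, RewritingParser the role of AntfactQParser, and RewritingParserPlugin the role of AntfactQParserPlugin. The real Solr classes have different constructors and return a Lucene Query rather than a String, so this shows only the shape of the extension, under those assumptions.

```java
import java.util.function.UnaryOperator;

public class ParserExtensionSketch {
    // Stand-in for the stock parser: holds the query string and "parses" it.
    public static class BaseParser {
        private String qstr;
        public BaseParser(String qstr) { this.qstr = qstr; }
        public String getString() { return qstr; }
        public void setString(String s) { this.qstr = s; }
        // A String result keeps the sketch dependency-free.
        public String parse() { return "Query(" + getString() + ")"; }
    }

    // Rewrites the query string first, then defers to the inherited parse,
    // so the stock parsing behaviour is left untouched.
    public static class RewritingParser extends BaseParser {
        private final UnaryOperator<String> phraseRewriter;
        public RewritingParser(String qstr, UnaryOperator<String> phraseRewriter) {
            super(qstr);
            this.phraseRewriter = phraseRewriter;
        }
        @Override
        public String parse() {
            setString(phraseRewriter.apply(getString()));
            return super.parse();
        }
    }

    // Factory whose createParser returns the rewriting subclass.
    public static class RewritingParserPlugin {
        private final UnaryOperator<String> phraseRewriter;
        public RewritingParserPlugin(UnaryOperator<String> phraseRewriter) {
            this.phraseRewriter = phraseRewriter;
        }
        public BaseParser createParser(String qstr) {
            return new RewritingParser(qstr, phraseRewriter);
        }
    }

    public static void main(String[] args) {
        // Hypothetical rewriter standing in for regex matching plus index-mode segmentation.
        RewritingParserPlugin plugin =
            new RewritingParserPlugin(q -> q.replace("full-text", "full text"));
        System.out.println(plugin.createParser("\"full-text search\"").parse());
    }
}
```

Because the subclass changes only the string handed to the inherited parse method, removing the plug-in from the configuration would restore the stock behaviour.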
Drawings
FIG. 1 is a flow chart of a method for resolving the inaccuracy of Apache Solr phrase searching in accordance with the present invention.
FIG. 2 is a block diagram illustrating a method for resolving inaccurate Apache Solr phrase searches according to the present invention.
Detailed Description
The present invention will be further described with reference to the following examples.
Specific example 1:
Referring to FIG. 1 and FIG. 2, the method for solving inaccurate Apache Solr phrase search comprises the following steps:
Step 1: receiving data. A QParserPlugin receives the search statement parameters sent by a client over HTTP.
Step 2: phrase matching. A regular expression is used in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set. In the parse method, the getString method is first called to obtain the search statement, and a regular expression that matches quoted statements is then used to match the phrase search statements in the search statement.
Step 3: word segmentation and replacement. The phrases in the phrase set obtained in step 2 are segmented in index mode, and the phrases in the original search statement are replaced with their segmented forms. A word segmenter is called to segment the matched phrases in index mode, and the segmented statement finally replaces the original search statement.
Step 4: data conversion. The syntax parser of Apache Solr converts the rewritten search statement into a Query. An AntfactQParserPlugin class is written that inherits the QParserPlugin of Apache Solr and overrides the createParser method, whose return value is of type AntfactQParser. Finally, <queryParser> is configured in the solrconfig.xml configuration file with class set to AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly.
Step 5: processing and output. The Query enters the search process of Apache Solr, and the data is output after the search is finished.
Specific example 2:
1. First, the QParserPlugin receives the search statement parameters sent by a client over HTTP;
2. Second, a regular expression is used in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set;
3. Third, the phrases in the phrase set are traversed and segmented in index mode;
4. Fourth, the phrases in the original search statement are replaced with their segmented forms;
5. Fifth, the syntax parser of Apache Solr converts the rewritten search statement into a Query;
6. Sixth, the Query enters the search process of Apache Solr.
The above is only a specific embodiment of the invention, and the protection of the invention is not limited to it; any equivalent change or substitution of the technical features of the invention that a person skilled in the art could conceive falls within the protection scope of the invention.

Claims (2)

1. A method for solving inaccurate Apache Solr phrase search, characterized in that the method comprises the following steps:
Step 1: receiving data, wherein a QParserPlugin receives the search statement parameters sent by a client over HTTP;
Step 2: phrase matching, wherein a regular expression is used in the QParserPlugin to match the phrases in the statement parameters to obtain a phrase set; in the parse method, the getString method is first called to obtain the search statement, and a regular expression that matches quoted statements is then used to match the phrase search statements in the search statement;
Step 3: word segmentation and replacement, wherein the phrases in the phrase set obtained in step 2 are segmented in index mode, and the phrases in the original search statement are replaced with their segmented forms; a word segmenter is called to segment the matched phrases in index mode, and the segmented statement finally replaces the original search statement; <queryParser> is finally configured in the solrconfig.xml configuration file with class set to AntfactQParserPlugin, so that the custom queryParser can be configured dynamically and flexibly;
Step 4: data conversion, wherein the syntax parser of Apache Solr converts the rewritten search statement into a Query; an AntfactQParserPlugin class is written that inherits the QParserPlugin of Apache Solr and overrides the createParser method, whose return value is of type AntfactQParser;
Step 5: processing and output, wherein the Query enters the search process of Apache Solr, and the data is output after the search is finished.
2. The method for solving inaccurate Apache Solr phrase search according to claim 1, characterized in that the plug-in for solving inaccurate Apache Solr phrase search is configured with <queryParser> in the solrconfig.
CN201710117467.8A 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search Active CN106933998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710117467.8A CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710117467.8A CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Publications (2)

Publication Number Publication Date
CN106933998A (en) 2017-07-07
CN106933998B (en) 2021-03-02

Family

ID=59423888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710117467.8A Active CN106933998B (en) 2017-03-01 2017-03-01 Method for solving inaccurate Apache Solr phrase search

Country Status (1)

Country Link
CN (1) CN106933998B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682036A (en) * 2011-03-18 2012-09-19 新奥特(北京)视频技术有限公司 Non-editing based method and system for searching media assets
CN103488702A (en) * 2013-09-06 2014-01-01 云南电力试验研究院(集团)有限公司电力研究院 SorlCloud based unstructured data retrieval method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Solr实现Low Level查询解析;作者不详;《程序园 https://www.cnblogs.com/lvfeilong/p/fsdf32erwrf.html》;20111103;第1-10页 *

Also Published As

Publication number Publication date
CN106933998A (en) 2017-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant