CN107239554A - A kind of method that English text is retrieved based on matching degree - Google Patents
A method for retrieving English text based on matching degree
- Publication number
- CN107239554A (application number CN201710427632.XA)
- Authority
- CN
- China
- Prior art keywords
- retrieval
- weight
- matching degree
- difference
- default
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for retrieving English text based on matching degree, comprising: Step 1, pre-storing retrieval information on a server, wherein each English document is associated with one retrieval unit, each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry, each retrieval entry consists of at least one noun and notional verb from the abstract of the English document associated with that retrieval unit, and a preset weight is assigned to every retrieval entry; Step 2, inputting the English query, splitting it into nouns and notional verbs, and expanding the nouns and notional verbs into a retrieval expression; Step 3, evaluating the similarity of the retrieval expression to obtain a retrieval weight, matching the retrieval weight against each preset weight, and sorting by matching degree to obtain a retrieval result list.
Description
Technical field
The present invention relates to English text retrieval, and in particular to a method for retrieving English text based on matching degree.
Background
English text is currently retrieved mainly by matching the retrieval object against keywords set in advance and deciding whether they match; that is, the English text to be retrieved is split into separate keywords that are searched individually. However, a computer cannot effectively parse human language patterns and therefore cannot understand the query intent, so the information it finds is not accurate enough.
To address this problem, a user can add advanced syntax operations to a search, but advanced syntax is complicated to enter and demands a great deal of the user, which degrades the user experience, and the match between the sentence to be retrieved and the preset keywords remains insufficient.
Summary of the invention
The present invention designs and develops a method for retrieving English text based on matching degree. The first object of the invention is to obtain a retrieval result list for the sentence to be retrieved.
The second object of the invention is to improve the matching degree between the sentence to be retrieved and the preset information.
The technical solution provided by the present invention is:
A method for retrieving English text based on matching degree, comprising the following steps:
Step 1: pre-storing retrieval information on a server, wherein each English document is associated with one retrieval unit; each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry; each retrieval entry consists of at least one noun and notional verb from the abstract of the English document associated with that retrieval unit; and a preset weight is assigned to every retrieval entry;
Step 2: inputting the English query, splitting it into nouns and notional verbs, and expanding the nouns and notional verbs into a retrieval expression;
Step 3: evaluating the similarity of the retrieval expression to obtain a retrieval weight, matching the retrieval weight against each preset weight, and sorting by matching degree to obtain a retrieval result list.
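The record described in Step 1 can be pictured as a small data structure. The following is a minimal Python sketch, not the patent's implementation: the class and field names (`RetrievalUnit`, `RetrievalEntry`, `unit_id`, `entry_time`, `preset_weight`) are hypothetical, and the weight value is made up, since the text does not say how preset weights are assigned.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RetrievalEntry:
    """One retrieval entry: terms from a document abstract plus a preset weight."""
    terms: frozenset      # nouns and notional verbs (assumed lower-cased)
    preset_weight: float  # assigned in advance; the scale is not given in the text


@dataclass
class RetrievalUnit:
    """Server-side record associated with exactly one English document."""
    unit_id: str
    entry_time: datetime
    entries: list = field(default_factory=list)

    def __post_init__(self):
        # Step 1 requires at least one retrieval entry per unit.
        if not self.entries:
            raise ValueError("a retrieval unit needs at least one retrieval entry")


# Illustrative values only.
unit = RetrievalUnit(
    unit_id="doc-001",
    entry_time=datetime(2017, 6, 8),
    entries=[RetrievalEntry(frozenset({"retrieval", "match"}), preset_weight=6.5)],
)
print(unit.entries[0].preset_weight)  # 6.5
```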
Preferably, in Step 2, the retrieval expression is a logical combination of the nouns and the notional verbs, wherein the logical combination includes the OR, AND, and NOT relations.
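A logical combination of terms can be evaluated against a set of terms with a few lines of code. The sketch below assumes a nested-tuple representation of the expression, which is an illustrative choice and is not specified in the text; the function name `matches` is hypothetical.

```python
def matches(expr, terms):
    """Evaluate a nested boolean expression against a set of terms.

    expr is either a string (a single term) or a tuple of the form
    ("AND", e1, e2, ...), ("OR", e1, e2, ...) or ("NOT", e).
    """
    if isinstance(expr, str):
        return expr in terms
    op, *args = expr
    if op == "AND":
        return all(matches(a, terms) for a in args)
    if op == "OR":
        return any(matches(a, terms) for a in args)
    if op == "NOT":
        (arg,) = args  # NOT takes exactly one operand
        return not matches(arg, terms)
    raise ValueError(f"unknown operator: {op}")


query = ("AND", "retrieval", ("OR", "match", "rank"), ("NOT", "image"))
print(matches(query, {"retrieval", "rank", "text"}))   # True
print(matches(query, {"retrieval", "rank", "image"}))  # False
```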
Preferably, in Step 3, evaluating the similarity of the retrieval expression to obtain the retrieval weight comprises the following steps:
Looking up the noun in the field to which it belongs, and determining the keyword in that field;
Calculating the word weight between the noun and the keyword from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword;
Calculating the retrieval distance to the keyword from the word weight;
Calculating the similarity score of the retrieval expression from the retrieval distance;
Taking the similarity score of the retrieval expression as the retrieval weight.
Preferably, in Step 3, matching proceeds in order of preset weight.
Preferably, in Step 3, it is checked whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of results is kept.
Preferably, the predetermined quantity is 25.
Preferably, in Step 3, the matching of the retrieval weight against each preset weight is performed using a fuzzy control method:
The difference Δη between the retrieval weight η and the preset weight η′, the ratio Δη/η′ of that difference to the preset weight η′, and the matching degree φ are each converted to quantization grades in the fuzzy universe of discourse;
The difference Δη and the ratio Δη/η′ are input to the fuzzy control model; the difference Δη is divided into 7 grades, the ratio Δη/η′ is divided into 7 grades, and the matching degree φ is divided into 5 grades;
The output of the fuzzy control model is the matching degree φ; the search results are output according to the matching degree φ.
Preferably, the universe of discourse of the difference Δη between the retrieval weight η and the preset weight η′ is [-10, 10], the universe of discourse of the ratio Δη/η′ of that difference to the preset weight η′ is [-0.1, 0.1], the quantization factors are both set to 1, and the universe of discourse of the matching degree φ is [0, 1].
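The two controller inputs and their universes of discourse can be illustrated as follows. This is a sketch under stated assumptions: the sign convention Δη = η − η′ is assumed (the text only says "difference"), values outside a universe are clipped to its bounds (the text does not say how out-of-range values are handled), and a quantization factor of 1 is taken to mean that no rescaling is applied. The function names are hypothetical.

```python
def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def controller_inputs(eta, eta_preset):
    """Return (delta, ratio) limited to [-10, 10] and [-0.1, 0.1]."""
    if eta_preset == 0:
        raise ValueError("preset weight must be non-zero to form the ratio")
    delta = eta - eta_preset    # assumed sign convention
    ratio = delta / eta_preset  # difference relative to the preset weight
    # Quantization factor 1: the values are used as-is, only clipped.
    return clip(delta, -10.0, 10.0), clip(ratio, -0.1, 0.1)


print(controller_inputs(7.0, 8.0))  # (-1.0, -0.1)
```

In the example the raw ratio is -0.125, which lies outside [-0.1, 0.1] and is therefore clipped to -0.1.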
Preferably, the difference Δη between the retrieval weight η and the preset weight η′ is divided into 7 grades, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the ratio Δη/η′ of that difference to the preset weight η′ is divided into 7 grades, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the matching degree φ is divided into 5 grades, with fuzzy set {ZO, PS, PM, PB, PVB}; and the membership functions are triangular.
Preferably, the control rules of the fuzzy control model are:
If the weight difference Δη is NM and the weight difference ratio Δη/η′ is PM or PB, then the matching degree φ is PS; if the weight difference Δη is PB and the weight difference ratio Δη/η′ is PM or PB, then the matching degree φ is PVB.
Compared with the prior art, the present invention has the following advantages:
1. The invention confines the matching-degree calculation for keywords to a restricted set of nouns, thereby eliminating the interference that conjunctions and other non-notional words cause in the retrieval results, reducing the retrieval burden, and improving retrieval efficiency;
2. The invention computes the matching degree between the retrieved text and the preset text by fuzzy control, improving matching efficiency and increasing the accuracy of the results;
3. By presetting multiple retrieval entries and calculating the matching degree for each, the invention improves the comprehensiveness of the retrieval results.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 shows the membership function of the difference Δη between the retrieval weight η and the preset weight η′.
Fig. 3 shows the membership function of the ratio Δη/η′ of that difference to the preset weight η′.
Fig. 4 shows the membership function of the matching degree φ.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the description.
As shown in Fig. 1, the present invention provides a method for retrieving English text based on matching degree, comprising the following steps:
Step 1: pre-storing retrieval information on a server, wherein each English document is associated with one retrieval unit; each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry; each retrieval entry consists of at least one noun and notional verb from the abstract of the English document associated with that retrieval unit; and a preset weight is assigned to every retrieval entry;
Step 2: inputting the English query, splitting it into nouns and notional verbs, and expanding the nouns and notional verbs into a retrieval expression;
Step 3: evaluating the similarity of the retrieval expression to obtain a retrieval weight, matching the retrieval weight against each preset weight, and sorting by matching degree to obtain a retrieval result list.
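Step 2 can be illustrated with a toy example. The text does not say how nouns and notional verbs are identified, so the sketch below substitutes a small hand-written lexicon (`TOY_POS`, hypothetical) for a real part-of-speech tagger, and joins the extracted terms with AND, which is only one of the logical combinations the method allows.

```python
import re

# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
TOY_POS = {
    "method": "NOUN", "text": "NOUN", "documents": "NOUN",
    "retrieve": "VERB", "rank": "VERB",
    "is": "AUX", "can": "AUX",  # auxiliaries are not notional verbs
}


def split_terms(query):
    """Return (nouns, notional_verbs) found in the query, in order of appearance."""
    tokens = re.findall(r"[a-z]+", query.lower())
    nouns = [t for t in tokens if TOY_POS.get(t) == "NOUN"]
    verbs = [t for t in tokens if TOY_POS.get(t) == "VERB"]
    return nouns, verbs


def expand(nouns, verbs):
    """Join all terms with AND; the choice of operator is an assumption."""
    return " AND ".join(nouns + verbs)


nouns, verbs = split_terms("A method that can retrieve and rank text documents")
print(nouns, verbs)          # ['method', 'text', 'documents'] ['retrieve', 'rank']
print(expand(nouns, verbs))  # method AND text AND documents AND retrieve AND rank
```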
In another embodiment, in Step 2, the retrieval expression is a logical combination of the nouns and the notional verbs, wherein the logical combination includes the OR, AND, and NOT relations.
In another embodiment, in Step 3, evaluating the similarity of the retrieval expression to obtain the retrieval weight comprises the following steps:
Looking up the noun in the field to which it belongs, and determining the keyword in that field; calculating the word weight between the noun and the keyword from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword; calculating the retrieval distance to the keyword from the word weight; calculating the similarity score of the retrieval expression from the retrieval distance; and taking the similarity score of the retrieval expression as the retrieval weight.
In another embodiment, in Step 3, matching proceeds in order of preset weight, starting from the largest preset weight and ending with the smallest, yielding multiple different retrieval result lists.
In another embodiment, in Step 3, it is checked whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of results is kept. In this embodiment, the predetermined quantity is 25.
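The ordering and truncation described in these two embodiments amount to a sort followed by a slice. The sketch below is illustrative: the function names and the tuple layout `(entry_id, preset_weight)` are assumptions, and only the limit of 25 comes from the text.

```python
PREDETERMINED_QUANTITY = 25  # value stated in this embodiment


def order_by_preset_weight(entries):
    """entries: list of (entry_id, preset_weight); largest preset weight first."""
    return sorted(entries, key=lambda e: e[1], reverse=True)


def truncate_results(ranked, limit=PREDETERMINED_QUANTITY):
    """Keep at most `limit` items from an already ranked result list."""
    return ranked[:limit] if len(ranked) > limit else ranked


entries = [("e1", 2.0), ("e2", 9.5), ("e3", 4.0)]
print(order_by_preset_weight(entries))  # [('e2', 9.5), ('e3', 4.0), ('e1', 2.0)]
print(len(truncate_results(list(range(40)))))  # 25
```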
Example
A keyword c2 is determined in the field to which the noun belongs, and the semantic similarity between noun c1 and keyword c2 is defined as:
[Formula not reproduced in this text.]
where Dist(c1, c2) is the retrieval distance between noun c1 and keyword c2, calculated as the sum of the edge weights (word weights) along the shortest path between the two. The word weight is directly related to the strength of the link between keywords, so the strength of the link between a child concept ci and its parent concept c′ can be expressed as:
[Formula not reproduced in this text.]
Preferably, taking other factors into account, such as local density within the field, concept depth, and concept relation, the edge weight wt(ci, c′) between concepts is expressed as:
[Formula not reproduced in this text.]
where d(c′) is the depth of c′ in the field to which the noun belongs; E(c′) is the number of relations of c′ in that field, which is compared against the average number of relations in the field; R(ci, c′) is the concept relation factor; the parameters α (α ≥ 0) and β (0 ≤ β ≤ 1) control how much field depth and density contribute to the overall word weight; and IC(c) is a transformed form of the link computation between concepts, namely:
IC(c) = -log P(c),
where P(c) is the probability that concept c occurs in the whole field.
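As a small numeric illustration of IC(c) = -log P(c), the sketch below estimates P(c) as a relative frequency. The counting scheme, the natural logarithm, and the function name are assumptions; the text does not state the log base or how P(c) is estimated.

```python
import math


def information_content(concept_count, total_count):
    """IC(c) = -log P(c), with P(c) estimated as concept_count / total_count."""
    if not 0 < concept_count <= total_count:
        raise ValueError("need 0 < concept_count <= total_count")
    # log(total / count) equals -log(count / total).
    return math.log(total_count / concept_count)


# A concept that covers the whole field carries no information.
print(information_content(1000, 1000))  # 0.0
# Rarer concepts carry more information than common ones.
print(information_content(10, 1000) > information_content(100, 1000))  # True
```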
In summary, the semantic distance between noun c1 and keyword c2 can be expressed as:
[Formula not reproduced in this text.]
where path(c1, c2) is the set of all concepts on the path from noun c1 to keyword c2, and LSuper(c1, c2) is the lowest common parent concept of c1 and c2;
R(ci, c′) is defined as 1.0, 0.6, and 0.3 for the identity relation, the inheritance relation, and the attribute relation, respectively;
In practical applications the density E(c′) and the depth d(c′) play little role, so α and β are set to 0 and 1 respectively. In extended semantic search, noun c1 is the parent concept of keyword c2, and the final semantic distance simplifies to:
[Formula not reproduced in this text.]
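The formulas of this example are not reproduced in the text. The symbols it defines (IC, d, E, α, β, path, LSuper) match the edge-weight model of Jiang and Conrath (1997), so the following LaTeX gives one plausible reading under the assumption that the example follows that model. The symbol \bar{E} for the average number of relations and the function parent(c) are notation introduced here, and the original formulas may differ in detail. The definition of the semantic similarity in terms of Dist is not recoverable from the text and is left out.

```latex
\begin{align*}
% Link strength between a child concept c_i and its parent c'
LS(c_i, c') &= -\log P(c_i \mid c') = IC(c_i) - IC(c') \\[4pt]
% Edge weight (word weight); \bar{E} is the average number of relations
wt(c_i, c') &= \left(\beta + (1-\beta)\,\frac{\bar{E}}{E(c')}\right)
               \left(\frac{d(c')+1}{d(c')}\right)^{\alpha}
               \bigl[IC(c_i) - IC(c')\bigr]\, R(c_i, c') \\[4pt]
% Distance: sum of edge weights along the path, excluding the common parent
Dist(c_1, c_2) &= \sum_{c \,\in\, path(c_1, c_2) \setminus \{LSuper(c_1, c_2)\}}
                  wt\bigl(c, parent(c)\bigr) \\[4pt]
% With \alpha = 0, \beta = 1 and R = 1 the sum telescopes
Dist(c_1, c_2) &= IC(c_1) + IC(c_2) - 2\, IC\bigl(LSuper(c_1, c_2)\bigr) \\[4pt]
% If c_1 is the parent concept of c_2, then LSuper(c_1, c_2) = c_1
Dist(c_1, c_2) &= IC(c_2) - IC(c_1)
\end{align*}
```

The last line follows from the one before it by substituting LSuper(c1, c2) = c1. Note that the telescoping step assumes R = 1 on every edge of the path; with the relation factors 0.6 or 0.3 the sum does not collapse in this way.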
The similarity score of the retrieval expression is obtained from the semantic distance between noun c1 and keyword c2, and this similarity score is used as the retrieval weight.
In another embodiment, the matching degree φ between the retrieval weight and the preset weight is calculated using a fuzzy control method. The inputs of the fuzzy control model are the weight difference Δη between the retrieval weight η and the preset weight η′, and the weight difference ratio Δη/η′ of that difference to the preset weight η′; the output is the matching degree φ. The weight difference Δη varies over [-10, 10] and the weight difference ratio Δη/η′ varies over [-0.1, 0.1]; the quantization factors are both set to 1, so their universes of discourse are [-10, 10] and [-0.1, 0.1] respectively. The fuzzy universe of discourse of the matching degree φ is [0, 1].
To ensure control precision and good control in every mode, the following division was settled on after repeated tests. The range of the weight difference Δη is divided into seven grades with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}, where NB denotes negative big, NM negative medium, NS negative small, ZO zero, PS positive small, PM positive medium, and PB positive big. The range of the weight difference ratio Δη/η′ is likewise divided into seven grades with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}, with the same meanings. The output matching degree φ is divided into five grades, {ZO, PS, PM, PB, PVB}, where ZO denotes zero, PS small, PM medium, PB big, and PVB very big. The membership functions are triangular, as shown in Figs. 2, 3, and 4.
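Triangular membership functions over the three universes of discourse can be sketched as follows. Figs. 2 to 4 are not reproduced in the text, so the evenly spaced, half-overlapping partition used here is an assumption, as are all function names.

```python
def triangular(x, left, peak, right):
    """Triangular membership with feet at left/right and apex at peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def make_partition(labels, low, high):
    """Evenly spaced triangular sets over [low, high] (assumed layout)."""
    step = (high - low) / (len(labels) - 1)
    return {
        label: (low + (i - 1) * step, low + i * step, low + (i + 1) * step)
        for i, label in enumerate(labels)
    }


def fuzzify(x, partition):
    """Membership degree of x in every fuzzy set of the partition."""
    return {label: triangular(x, *abc) for label, abc in partition.items()}


SEVEN = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
FIVE = ["ZO", "PS", "PM", "PB", "PVB"]
delta_sets = make_partition(SEVEN, -10.0, 10.0)  # weight difference
ratio_sets = make_partition(SEVEN, -0.1, 0.1)    # weight difference ratio
phi_sets = make_partition(FIVE, 0.0, 1.0)        # matching degree

degrees = fuzzify(5.0, delta_sets)
print({k: round(v, 3) for k, v in degrees.items() if v > 0})  # {'PS': 0.5, 'PM': 0.5}
```

With this layout a weight difference of 5 lies halfway between the apexes of PS and PM, so it belongs to each with degree 0.5.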
The control rules of the fuzzy control model are chosen empirically:
If the weight difference Δη is negative medium and the weight difference ratio Δη/η′ is positive medium or positive big, then the matching degree φ is small; if the weight difference Δη is positive big and the weight difference ratio Δη/η′ is positive medium or positive big, then the matching degree φ is very big. The specific fuzzy control rules are shown in Table 1.
Table 1. Fuzzy control rules
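The contents of Table 1 are not reproduced in the text, so only the two rules stated in the description can be illustrated. The sketch below evaluates them with min for AND and max for OR, which is the common Mamdani convention and an assumption here; the membership degrees in the example are made up.

```python
# The two rules stated in the description: (Δη grade, ratio grades, φ grade).
RULES = [
    ("NM", ("PM", "PB"), "PS"),
    ("PB", ("PM", "PB"), "PVB"),
]


def fire_rules(delta_mu, ratio_mu, rules=RULES):
    """Return the activation of each output grade (min for AND, max for OR)."""
    activations = {}
    for delta_label, ratio_labels, phi_label in rules:
        ratio_strength = max(ratio_mu.get(label, 0.0) for label in ratio_labels)
        strength = min(delta_mu.get(delta_label, 0.0), ratio_strength)
        # If several rules share an output grade, keep the strongest one.
        activations[phi_label] = max(activations.get(phi_label, 0.0), strength)
    return activations


# Made-up membership degrees for the two inputs.
delta_mu = {"NM": 0.0, "PM": 0.3, "PB": 0.7}
ratio_mu = {"PM": 0.4, "PB": 0.6}
print(fire_rules(delta_mu, ratio_mu))  # {'PS': 0.0, 'PVB': 0.6}
```

A complete controller would apply every rule of Table 1 and then defuzzify the activations into a single matching degree in [0, 1]; neither the full rule base nor the defuzzification method is given in the text.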
Although embodiments of the present invention have been disclosed above, the invention is not limited to the uses listed in the description and the embodiments. It can be applied in any field to which it is suited, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the illustrations shown and described herein.
Claims (10)
1. A method for retrieving English text based on matching degree, characterized by comprising the following steps:
Step 1: pre-storing retrieval information on a server, wherein each English document is associated with one retrieval unit; each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry; each retrieval entry consists of at least one noun and notional verb from the abstract of the English document associated with that retrieval unit; and a preset weight is assigned to every retrieval entry;
Step 2: inputting the English query, splitting it into nouns and notional verbs, and expanding the nouns and notional verbs into a retrieval expression;
Step 3: evaluating the similarity of the retrieval expression to obtain a retrieval weight, matching the retrieval weight against each preset weight, and sorting by matching degree to obtain a retrieval result list.
2. The method for retrieving English text based on matching degree as claimed in claim 1, characterized in that, in Step 2, the retrieval expression is a logical combination of the nouns and the notional verbs, wherein the logical combination includes the OR, AND, and NOT relations.
3. The method for retrieving English text based on matching degree as claimed in claim 1 or 2, characterized in that, in Step 3, evaluating the similarity of the retrieval expression to obtain the retrieval weight comprises the following steps:
Looking up the noun in the field to which it belongs, and determining the keyword in that field;
Calculating the word weight between the noun and the keyword from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword;
Calculating the retrieval distance to the keyword from the word weight;
Calculating the similarity score of the retrieval expression from the retrieval distance;
Taking the similarity score of the retrieval expression as the retrieval weight.
4. The method for retrieving English text based on matching degree as claimed in claim 3, characterized in that, in Step 3, matching proceeds in order of preset weight.
5. The method for retrieving English text based on matching degree as claimed in claim 4, characterized in that, in Step 3, it is checked whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of results is kept.
6. The method for retrieving English text based on matching degree as claimed in claim 5, characterized in that the predetermined quantity is 25.
7. The method for retrieving English text based on matching degree as claimed in any one of claims 1, 2, and 4 to 6, characterized in that, in Step 3, the matching of the retrieval weight against each preset weight is performed using a fuzzy control method:
The difference Δη between the retrieval weight η and the preset weight η′, the ratio Δη/η′ of that difference to the preset weight η′, and the matching degree φ are each converted to quantization grades in the fuzzy universe of discourse;
The difference Δη and the ratio Δη/η′ are input to the fuzzy control model; the difference Δη is divided into 7 grades, the ratio Δη/η′ is divided into 7 grades, and the matching degree φ is divided into 5 grades;
The output of the fuzzy control model is the matching degree φ; the search results are output according to the matching degree φ.
8. The method for retrieving English text based on matching degree as claimed in claim 7, characterized in that the universe of discourse of the difference Δη between the retrieval weight η and the preset weight η′ is [-10, 10], the universe of discourse of the ratio Δη/η′ of that difference to the preset weight η′ is [-0.1, 0.1], the quantization factors are both set to 1, and the universe of discourse of the matching degree φ is [0, 1].
9. The method for retrieving English text based on matching degree as claimed in claim 8, characterized in that the difference Δη between the retrieval weight η and the preset weight η′ is divided into 7 grades, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the ratio Δη/η′ of that difference to the preset weight η′ is divided into 7 grades, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the matching degree φ is divided into 5 grades, with fuzzy set {ZO, PS, PM, PB, PVB}; and the membership functions are triangular.
10. The method for retrieving English text based on matching degree as claimed in claim 9, characterized in that the control rules of the fuzzy control model are:
If the weight difference Δη is NM and the weight difference ratio Δη/η′ is PM or PB, then the matching degree φ is PS; if the weight difference Δη is PB and the weight difference ratio Δη/η′ is PM or PB, then the matching degree φ is PVB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710427632.XA CN107239554B (en) | 2017-06-08 | 2017-06-08 | Method for retrieving English text based on matching degree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710427632.XA CN107239554B (en) | 2017-06-08 | 2017-06-08 | Method for retrieving English text based on matching degree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107239554A (en) | 2017-10-10 |
CN107239554B (en) | 2020-02-11 |
Family
ID=59987476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710427632.XA (Active, CN107239554B) | Method for retrieving English text based on matching degree | 2017-06-08 | 2017-06-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239554B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345694A (en) * | 2018-03-19 | 2018-07-31 | 华北电力大学(保定) | A kind of document retrieval method and system based on subject data base |
CN111046140A (en) * | 2019-11-25 | 2020-04-21 | 华中科技大学同济医学院附属协和医院 | Automatic office service communication robot and control method thereof |
CN111104488A (en) * | 2019-12-30 | 2020-05-05 | 广州广电运通信息科技有限公司 | Method, device and storage medium for integrating retrieval and similarity analysis |
CN114511027A (en) * | 2022-01-29 | 2022-05-17 | 重庆工业职业技术学院 | Method for extracting English remote data through big data network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761341A (en) * | 2014-02-21 | 2014-04-30 | 北京嘉和美康信息技术有限公司 | Information matching method and device |
CN103902724A (en) * | 2014-04-10 | 2014-07-02 | 辽宁医学院 | English literature search method |
-
2017
- 2017-06-08 CN CN201710427632.XA patent/CN107239554B/en active Active
Non-Patent Citations (1)
Title |
---|
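Wang Xuyang et al.: "Research on semantic similarity algorithms in information retrieval" (信息检索中语义相似度算法研究), Computer Engineering and Applications (《计算机工程与应用》) * |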
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345694A (en) * | 2018-03-19 | 2018-07-31 | 华北电力大学(保定) | A kind of document retrieval method and system based on subject data base |
CN108345694B (en) * | 2018-03-19 | 2021-09-03 | 华北电力大学(保定) | Document retrieval method and system based on theme database |
CN111046140A (en) * | 2019-11-25 | 2020-04-21 | 华中科技大学同济医学院附属协和医院 | Automatic office service communication robot and control method thereof |
CN111104488A (en) * | 2019-12-30 | 2020-05-05 | 广州广电运通信息科技有限公司 | Method, device and storage medium for integrating retrieval and similarity analysis |
CN111104488B (en) * | 2019-12-30 | 2023-10-24 | 广州广电运通信息科技有限公司 | Method, device and storage medium for integrating retrieval and similarity analysis |
CN114511027A (en) * | 2022-01-29 | 2022-05-17 | 重庆工业职业技术学院 | Method for extracting English remote data through big data network |
CN114511027B (en) * | 2022-01-29 | 2022-11-11 | 重庆工业职业技术学院 | Method for extracting English remote data through big data network |
Also Published As
Publication number | Publication date |
---|---|
CN107239554B (en) | 2020-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442777B (en) | BERT-based pseudo-correlation feedback model information retrieval method and system | |
CN109800284B (en) | Task-oriented unstructured information intelligent question-answering system construction method | |
US8341159B2 (en) | Creating taxonomies and training data for document categorization | |
JP3781696B2 (en) | Image search method and search device | |
CN103927358A (en) | Text search method and system | |
CN107239554A (en) | A kind of method that English text is retrieved based on matching degree | |
US20030212663A1 (en) | Neural network feedback for enhancing text search | |
CN105975596A (en) | Query expansion method and system of search engine | |
CN107665217A (en) | A kind of vocabulary processing method and system for searching service | |
CN116134432A (en) | System and method for providing answers to queries | |
CN104778276A (en) | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) | |
US20170185672A1 (en) | Rank aggregation based on a markov model | |
CN107943919A (en) | A kind of enquiry expanding method of session-oriented formula entity search | |
CN108520038B (en) | Biomedical literature retrieval method based on sequencing learning algorithm | |
Durga et al. | Ontology based text categorization-telugu document | |
CN112182155B (en) | Search result diversification method based on generated type countermeasure network | |
Wang et al. | Reproducibility, Replicability, and Insights into Dense Multi-Representation Retrieval Models: from ColBERT to Col | |
US20210406291A1 (en) | Dialog driven search system and method | |
Lee et al. | A query-dependent ranking approach for search engines | |
JP5432936B2 (en) | Document search apparatus having ranking model selection function, document search method having ranking model selection function, and document search program having ranking model selection function | |
JP2004054882A (en) | Synonym retrieval device, method, program and storage medium | |
CN105930358A (en) | Case searching method and system based on correlation degree | |
Kumar et al. | Smart information retrieval using query transformation based on ontology and semantic-association | |
CN107818078B (en) | Semantic association and matching method for Chinese natural language dialogue | |
Granados et al. | Multimodal Information Approaches for the Wikipedia Collection at ImageCLEF 2011. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |