CN107239554A - Method for retrieving English text based on matching degree - Google Patents

Method for retrieving English text based on matching degree

Info

Publication number
CN107239554A
CN107239554A CN201710427632.XA CN201710427632A CN107239554A CN 107239554 A CN107239554 A CN 107239554A CN 201710427632 A CN201710427632 A CN 201710427632A CN 107239554 A CN107239554 A CN 107239554A
Authority
CN
China
Prior art keywords
retrieval
weight
matching degree
difference
default
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710427632.XA
Other languages
Chinese (zh)
Other versions
CN107239554B (en)
Inventor
刘曲
杨天地
马丽娣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinzhou Medical University
Original Assignee
Jinzhou Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinzhou Medical University
Priority to CN201710427632.XA
Publication of CN107239554A
Application granted
Publication of CN107239554B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for retrieving English text based on matching degree, comprising: Step 1: retrieval information is stored in advance on a server; each English document is associated with one retrieval unit, and every retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry; each retrieval entry is composed of at least one noun and notional verb from the abstract of the English document associated with the retrieval unit, and a default weight is assigned to every retrieval entry. Step 2: English search text is input and split into nouns and notional verbs, and the nouns and notional verbs are expanded into a retrieval statement. Step 3: a similarity evaluation is performed on the retrieval statement to obtain a retrieval weight, the retrieval weight is matched against each default weight, and the results are sorted by matching degree to obtain a retrieval result list.

Description

Method for retrieving English text based on matching degree
Technical Field
The present invention relates to English text retrieval, and in particular to a method for retrieving English text based on matching degree.
Background
English text is currently retrieved mainly by matching the retrieval object against keywords set in advance and determining whether they match. That is, the English text to be retrieved is split into separate keywords, and each keyword is searched individually. However, a computer cannot effectively parse human language patterns and therefore cannot understand the intent of a query, so the information it finds is not accurate enough.
To address this problem, users can add advanced syntax to a search. However, advanced syntax is complicated to enter and demands a high level of skill from the user, which degrades the user experience, and the degree of matching between the statement to be retrieved and the preset keywords remains insufficient.
Summary of the Invention
The present invention provides a method for retrieving English text based on matching degree. One object of the invention is to obtain a retrieval result list for a statement to be retrieved.
A second object of the invention is to improve the matching degree between the statement to be retrieved and the preset retrieval information.
The technical solution provided by the present invention is as follows:
A method for retrieving English text based on matching degree, comprising the following steps:
Step 1: Retrieval information is stored in advance on a server. Each English document is associated with one retrieval unit, and every retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry. Each retrieval entry is composed of at least one noun and notional verb from the abstract of the English document associated with the retrieval unit, and a default weight is assigned to every retrieval entry;
Step 2: English search text is input and split into nouns and notional verbs, and the nouns and notional verbs are expanded into a retrieval statement;
Step 3: A similarity evaluation is performed on the retrieval statement to obtain a retrieval weight, the retrieval weight is matched against each default weight, and the results are sorted by matching degree to obtain a retrieval result list.
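The record stored in Step 1 can be pictured with a small data model. The following is a minimal Python sketch rather than the patent's implementation; the class names `RetrievalEntry` and `RetrievalUnit`, the field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import FrozenSet, List


@dataclass
class RetrievalEntry:
    """Nouns and verbs taken from one abstract, plus the weight assigned in advance."""
    terms: FrozenSet[str]
    default_weight: float


@dataclass
class RetrievalUnit:
    """Record associated with exactly one English document (hypothetical layout)."""
    unit_id: str
    entry_time: datetime
    entries: List[RetrievalEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step 1 requires at least one weighted set of terms per unit.
        if not self.entries:
            raise ValueError("a retrieval unit needs at least one entry")


# Illustrative values only; they do not come from the patent.
unit = RetrievalUnit(
    unit_id="doc-001",
    entry_time=datetime(2017, 6, 8),
    entries=[RetrievalEntry(frozenset({"insulin", "regulate", "glucose"}), 7.5)],
)
```

Keeping the default weight on each entry, rather than on the unit, mirrors the text's statement that weights are assigned per entry, so one document can be matched at several weights.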
Preferably, in Step 2, the retrieval statement is a logical combination of the nouns and the notional verbs, where the logical combination includes the OR, AND, and NOT logical relations.
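One way to picture a logical combination of nouns and verbs is as a predicate built from AND, OR, and NOT. The sketch below is an assumption about how such a query could be represented in Python; the helper names `term`, `and_`, `or_`, and `not_` are hypothetical and do not appear in the patent.

```python
from typing import Callable, Set

# A query is modelled as a predicate over the set of terms of one candidate.
Query = Callable[[Set[str]], bool]


def term(word: str) -> Query:
    """Match when the candidate contains the given noun or verb."""
    return lambda terms: word in terms


def and_(*parts: Query) -> Query:
    """Match when every sub-query matches."""
    return lambda terms: all(part(terms) for part in parts)


def or_(*parts: Query) -> Query:
    """Match when at least one sub-query matches."""
    return lambda terms: any(part(terms) for part in parts)


def not_(part: Query) -> Query:
    """Match when the sub-query does not match."""
    return lambda terms: not part(terms)


# insulin AND (regulate OR control) AND NOT mouse
query = and_(term("insulin"), or_(term("regulate"), term("control")), not_(term("mouse")))

print(query({"insulin", "regulate", "glucose"}))  # True
print(query({"insulin", "regulate", "mouse"}))    # False
```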
Preferably, in Step 3, performing the similarity evaluation on the retrieval statement to obtain the retrieval weight includes the following steps:
The field to which the noun belongs is looked up according to the noun, and the keyword in that field is determined;
A word weight with respect to the keyword is calculated from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword;
The retrieval distance to the keyword is calculated from the word weight;
The similarity score of the retrieval statement is calculated from the retrieval distance;
The similarity score of the retrieval statement is used as the retrieval weight.
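The chain from word weights to a distance and then to a score can be sketched as a shortest-path computation. The Python below is an illustration under stated assumptions: the word weights are taken as already computed and stored as edge weights of a concept graph, the distance is the weight sum along the shortest path (found here with Dijkstra's algorithm), and the mapping `1 / (1 + distance)` from distance to score is a common stand-in that the patent text does not confirm. The concept names and weights are invented.

```python
import heapq
from typing import Dict

# Concept graph: node -> {neighbour: word weight on the connecting edge}.
Graph = Dict[str, Dict[str, float]]


def retrieval_distance(graph: Graph, source: str, target: str) -> float:
    """Sum of edge weights on the shortest path, or infinity if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return float("inf")


def similarity_score(distance: float) -> float:
    """Assumed mapping: a smaller distance gives a score closer to 1."""
    return 1.0 / (1.0 + distance)


# Invented example graph with symmetric edges.
graph: Graph = {
    "aspirin": {"drug": 1.0},
    "drug": {"aspirin": 1.0, "insulin": 0.5},
    "insulin": {"drug": 0.5, "hormone": 0.25},
    "hormone": {"insulin": 0.25},
}

distance = retrieval_distance(graph, "aspirin", "hormone")
print(distance)  # 1.75
print(similarity_score(distance))
```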
Preferably, in Step 3, matching is performed successively in order of default weight.
Preferably, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of retrieval results is kept.
Preferably, the predetermined quantity is 25.
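Sorting by matching degree and capping the list at the predetermined quantity can be sketched in a few lines of Python. The function name `rank_and_limit` and the `(unit_id, matching_degree)` tuple layout are assumptions made for illustration; only the limit of 25 comes from the text.

```python
from typing import List, Tuple

PREDETERMINED_QUANTITY = 25  # value stated in the text


def rank_and_limit(
    matches: List[Tuple[str, float]],
    limit: int = PREDETERMINED_QUANTITY,
) -> List[Tuple[str, float]]:
    """Sort (unit_id, matching_degree) pairs best first, then cap the list.

    Slicing leaves shorter lists unchanged, so no explicit length check is needed.
    """
    ranked = sorted(matches, key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]


# Invented sample data: 40 candidates with increasing matching degree.
matches = [(f"doc-{i:03d}", i / 40) for i in range(40)]
top = rank_and_limit(matches)
print(len(top), top[0])  # 25 ('doc-039', 0.975)
```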
Preferably, in Step 3, the matching of the retrieval weight against each default weight is performed using a fuzzy control method:
The difference Δη between the retrieval weight η and the default weight η', the ratio Δη/η' of that difference to the default weight η', and the matching degree φ are each converted to quantization levels in a fuzzy universe of discourse;
The difference Δη and the ratio Δη/η' are input to a fuzzy control model; the difference Δη is divided into 7 levels, the ratio Δη/η' is divided into 7 levels, and the matching degree φ is divided into 5 levels;
The output of the fuzzy control model is the matching degree φ, and the search results are output according to the matching degree φ.
Preferably, the universe of discourse of the difference Δη between the retrieval weight η and the default weight η' is [-10, 10], the universe of discourse of the ratio Δη/η' of that difference to the default weight η' is [-0.1, 0.1], both quantization factors are set to 1, and the universe of discourse of the matching degree φ is [0, 1].
Preferably, the difference Δη between the retrieval weight η and the default weight η' is divided into 7 levels, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the ratio Δη/η' is divided into 7 levels, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the matching degree φ is divided into 5 levels, with fuzzy set {ZO, PS, PM, PB, PVB}; triangular membership functions are used.
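Fuzzification with triangular membership functions can be sketched as follows. This Python example assumes that the peaks of the fuzzy sets are spaced evenly across each universe of discourse and that neighbouring triangles overlap at half height; the membership figures are not available in this text, so the actual shapes may differ. The function names are hypothetical.

```python
from typing import Dict, List

LEVELS_7 = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]  # weight difference and ratio
LEVELS_5 = ["ZO", "PS", "PM", "PB", "PVB"]             # matching degree


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership of x in a triangle with feet at left/right and apex at peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x: float, low: float, high: float, labels: List[str]) -> Dict[str, float]:
    """Degree of membership of x in each label, with evenly spaced peaks (assumption)."""
    x = min(max(x, low), high)  # clip to the universe of discourse
    step = (high - low) / (len(labels) - 1)
    memberships = {}
    for i, label in enumerate(labels):
        peak = low + (high - low) * i / (len(labels) - 1)
        memberships[label] = triangular(x, peak - step, peak, peak + step)
    return memberships


delta_eta = fuzzify(5.0, -10.0, 10.0, LEVELS_7)   # weight difference on [-10, 10]
ratio = fuzzify(0.1, -0.1, 0.1, LEVELS_7)         # ratio on [-0.1, 0.1]
print({k: round(v, 3) for k, v in delta_eta.items() if v > 0})
print({k: round(v, 3) for k, v in ratio.items() if v > 0})
```

With evenly spaced, half-overlapping triangles the memberships of any in-range value sum to 1, which keeps later rule weighting simple.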
Preferably, the control rules of the fuzzy control model are:
If the weight difference Δη is NM and the weight-difference ratio Δη/η' is PM or PB, the matching degree φ is PS; if the weight difference Δη is PB and the weight-difference ratio Δη/η' is PM or PB, the matching degree φ is PVB.
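The two rules stated here can be sketched as a lookup table with a simple inference step. The Python below encodes only those two rules, with each "PM or PB" expanded into two table cells; the remaining cells of the rule table are not given in this text and are deliberately left out. The firing strength (minimum of the two memberships), the weighted average of output peaks, and the evenly spaced peak values for φ on [0, 1] are assumptions for illustration, not the patent's stated defuzzification.

```python
from typing import Dict, Optional, Tuple

# (weight-difference label, ratio label) -> matching-degree label.
# Only the rules stated in the text are encoded.
RULES: Dict[Tuple[str, str], str] = {
    ("NM", "PM"): "PS",
    ("NM", "PB"): "PS",
    ("PB", "PM"): "PVB",
    ("PB", "PB"): "PVB",
}

# Assumed evenly spaced peaks of the five output sets on [0, 1].
PHI_PEAKS = {"ZO": 0.0, "PS": 0.25, "PM": 0.5, "PB": 0.75, "PVB": 1.0}


def infer(delta: Dict[str, float], ratio: Dict[str, float]) -> Optional[float]:
    """Weighted average of output peaks; None when no encoded rule fires."""
    numerator = 0.0
    denominator = 0.0
    for (delta_label, ratio_label), phi_label in RULES.items():
        strength = min(delta.get(delta_label, 0.0), ratio.get(ratio_label, 0.0))
        numerator += strength * PHI_PEAKS[phi_label]
        denominator += strength
    return numerator / denominator if denominator > 0 else None


print(infer({"PB": 1.0}, {"PM": 1.0}))  # 1.0
print(infer({"NM": 1.0}, {"PB": 0.5}))  # 0.25
print(infer({"ZO": 1.0}, {"ZO": 1.0}))  # None
```

Returning `None` when nothing fires makes the gaps in the rule table visible instead of silently producing a matching degree.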
Compared with the prior art, the present invention has the following advantages:
1. The invention confines the keyword matching-degree calculation to a restricted set of nouns, thereby eliminating the interference that conjunctions and other words without notional meaning cause in the retrieval results, reducing the retrieval burden, and improving retrieval efficiency;
2. The invention computes the matching degree between the retrieved text and the preset text by fuzzy control, which improves matching efficiency and increases the accuracy of the results;
3. By presetting multiple retrieval entries and calculating the matching degree for each of them, the invention improves the comprehensiveness of the retrieval results.
Brief Description of the Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is the membership function of the difference Δη between the retrieval weight η and the default weight η'.
Fig. 3 is the membership function of the ratio Δη/η' of the difference between the retrieval weight and the default weight to the default weight η'.
Fig. 4 is the membership function of the matching degree φ.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the description.
As shown in Fig. 1, the present invention provides a method for retrieving English text based on matching degree, comprising the following steps:
Step 1: Retrieval information is stored in advance on a server. Each English document is associated with one retrieval unit, and every retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry. Each retrieval entry is composed of at least one noun and notional verb from the abstract of the English document associated with the retrieval unit, and a default weight is assigned to every retrieval entry;
Step 2: English search text is input and split into nouns and notional verbs, and the nouns and notional verbs are expanded into a retrieval statement;
Step 3: A similarity evaluation is performed on the retrieval statement to obtain a retrieval weight, the retrieval weight is matched against each default weight, and the results are sorted by matching degree to obtain a retrieval result list.
In another embodiment, in Step 2, the retrieval statement is a logical combination of the nouns and the notional verbs, where the logical combination includes the OR, AND, and NOT logical relations.
In another embodiment, in Step 3, performing the similarity evaluation on the retrieval statement to obtain the retrieval weight includes the following steps:
The field to which the noun belongs is looked up according to the noun, and the keyword in that field is determined; a word weight with respect to the keyword is calculated from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword; the retrieval distance to the keyword is calculated from the word weight; the similarity score of the retrieval statement is calculated from the retrieval distance; and the similarity score of the retrieval statement is used as the retrieval weight.
In another embodiment, in Step 3, matching is performed successively according to the size of the default weight, starting from the largest default weight and proceeding to the smallest, which yields multiple different retrieval result lists.
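The ordering described here, one result list per default weight from the largest weight down, can be sketched with a sort followed by grouping. The Python below shows only the ordering and grouping; the matching step itself is left out, and the function name and the `(unit_id, default_weight)` tuple layout are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def lists_by_default_weight(
    entries: List[Tuple[str, float]],
) -> List[Tuple[float, List[str]]]:
    """Group (unit_id, default_weight) pairs into one list per weight, largest first."""
    # sorted() is stable, so units with equal weight keep their input order.
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [
        (weight, [unit_id for unit_id, _ in group])
        for weight, group in groupby(ordered, key=lambda entry: entry[1])
    ]


# Invented sample data.
entries = [("a", 3.0), ("b", 9.0), ("c", 3.0), ("d", 5.0)]
print(lists_by_default_weight(entries))
# [(9.0, ['b']), (5.0, ['d']), (3.0, ['a', 'c'])]
```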
In another embodiment, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of retrieval results is kept. In this embodiment, the predetermined quantity is 25.
Example
A keyword c2 is determined in the field to which the noun belongs, and the semantic similarity between the noun c1 and the keyword c2 is defined as:
[formula not reproduced]
where Dist(c1, c2) is the retrieval distance between the noun c1 and the keyword c2, calculated as the sum of the weights (word weights) on the edges of the shortest path between them. The word weight is directly related to the strength of the link between keywords, so the strength of the link between a child concept ci and its parent concept c' can be expressed as:
[formula not reproduced]
Preferably, taking other factors into account, such as local density within the field, concept depth, and concept relation, the overall edge weight wt(ci, c') between concepts is expressed as:
[formula not reproduced]
where d(c') is the depth of c' in the field to which the noun belongs, E(c') is the number of relations of c' in that field, Ē is the average number of relations in that field, R(ci, c') is the concept relation factor, the parameters α (α ≥ 0) and β (0 ≤ β ≤ 1) control how much field depth and density contribute to the word-weight calculation, and IC(c) is the variant form used to calculate the link between concepts, namely:
IC(c) = -log P(c),
where P(c) is the probability that concept c occurs in the whole field.
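The quantity IC(c) = -log P(c) can be computed directly once P(c) is estimated. The Python sketch below estimates P(c) as a relative frequency from occurrence counts and uses the natural logarithm; both choices are assumptions, since the text specifies neither the estimator nor the base. The counts are invented.

```python
import math
from typing import Dict, Mapping


def information_content(counts: Mapping[str, int]) -> Dict[str, float]:
    """IC(c) = -log P(c), with P(c) estimated as count / total (assumption)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one occurrence")
    return {
        concept: -math.log(count / total)
        for concept, count in counts.items()
        if count > 0  # log of zero probability is undefined
    }


# Invented occurrence counts for three concepts in one field.
ic = information_content({"drug": 50, "hormone": 30, "insulin": 20})
for concept, value in sorted(ic.items(), key=lambda item: item[1]):
    print(f"{concept}: {value:.3f}")
```

Rarer concepts receive a larger IC, so an edge leading to a rare, specific concept carries more weight than one leading to a common, general concept.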
In summary, the semantic distance between the noun c1 and the keyword c2 can be expressed as:
[formula not reproduced]
where path(c1, c2) is the set of all concepts on the path from the noun c1 to the keyword c2, and LSuper(c1, c2) is the lowest common parent concept of c1 and c2.
R(ci, c') is defined as 1.0, 0.6, and 0.3 for the identity, inheritance, and attribute relations, respectively. In practical applications the density E(c') and the depth d(c') have little effect, so α and β are set to 0 and 1, respectively. In extended semantic search, the noun c1 is the parent concept of the keyword c2, and the final semantic distance simplifies to:
[formula not reproduced]
The similarity score of the retrieval statement is obtained from the semantic distance between the noun c1 and the keyword c2, and this similarity score is used as the retrieval weight.
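The formulas for this passage are missing from the text. The symbol definitions (depth d, relation count E with average Ē, relation factor R, parameters α and β, and IC) match the edge-weighting scheme of Jiang and Conrath, so one consistent form is sketched below. This is an assumption-based reconstruction, not the patent's verbatim formulas; the notation LS for link strength and parent(c) for the parent of c on the path is hypothetical.

```latex
\begin{align}
LS(c_i, c') &= IC(c_i) - IC(c') \\
wt(c_i, c') &= \left(\beta + (1-\beta)\,\frac{\bar{E}}{E(c')}\right)
               \left(\frac{d(c')+1}{d(c')}\right)^{\alpha}
               \bigl[IC(c_i) - IC(c')\bigr]\, R(c_i, c') \\
Dist(c_1, c_2) &= \sum_{c \,\in\, path(c_1, c_2) \setminus \{LSuper(c_1, c_2)\}}
               wt\bigl(c, parent(c)\bigr)
\end{align}
```

Under this form, setting α = 0 and β = 1 removes the depth and density factors and leaves wt(c_i, c') = [IC(c_i) - IC(c')] R(c_i, c'). If in addition R = 1 on every edge and c1 is an ancestor of c2, so that LSuper(c1, c2) = c1, the sum telescopes to Dist(c1, c2) = IC(c2) - IC(c1).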
In another embodiment, the matching degree φ between the retrieval weight and the default weight is calculated using a fuzzy control method. The inputs of the fuzzy control model are the weight difference Δη between the retrieval weight η and the default weight η', and the weight-difference ratio Δη/η' of that difference to the default weight η'; the output is the matching degree φ. The weight difference Δη varies over [-10, 10] and the weight-difference ratio Δη/η' varies over [-0.1, 0.1]; both quantization factors are set to 1, so their universes of discourse are [-10, 10] and [-0.1, 0.1], respectively. The fuzzy universe of discourse of the matching degree φ is [0, 1]. To ensure control precision, so that the model performs well in every mode, the range of the weight difference Δη was, after repeated testing, finally divided into seven levels. The fuzzy set of the weight difference Δη is {NB, NM, NS, ZO, PS, PM, PB}, where NB denotes negative big, NM negative medium, NS negative small, ZO zero, PS positive small, PM positive medium, and PB positive big. The range of the weight-difference ratio Δη/η' is likewise divided into seven levels, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB} and the same meanings. The output matching degree φ is divided into 5 levels, {ZO, PS, PM, PB, PVB}, where ZO denotes zero, PS small, PM medium, PB big, and PVB very big. Triangular membership functions are used, as shown in Figs. 2, 3, and 4.
The control rules of the fuzzy control model are chosen empirically:
If the weight difference Δη is negative medium and the weight-difference ratio Δη/η' is positive medium or positive big, the matching degree φ is small; if the weight difference Δη is positive big and the weight-difference ratio Δη/η' is positive medium or positive big, the matching degree φ is very big. The specific fuzzy control rules are shown in Table 1.
Table 1. Fuzzy control rules [table not reproduced]
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications listed in the description and the embodiments. It can be applied in any field for which it is suitable, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the illustrations shown and described herein.

Claims (10)

1. A method for retrieving English text based on matching degree, characterized by comprising the following steps:
Step 1: retrieval information is stored in advance on a server; each English document is associated with one retrieval unit, and every retrieval unit includes an ID, the entry time of the English document, and at least one retrieval entry; each retrieval entry is composed of at least one noun and notional verb from the abstract of the English document associated with the retrieval unit, and a default weight is assigned to every retrieval entry;
Step 2: English search text is input and split into nouns and notional verbs, and the nouns and notional verbs are expanded into a retrieval statement;
Step 3: a similarity evaluation is performed on the retrieval statement to obtain a retrieval weight, the retrieval weight is matched against each default weight, and the results are sorted by matching degree to obtain a retrieval result list.
2. The method for retrieving English text based on matching degree according to claim 1, characterized in that, in Step 2, the retrieval statement is a logical combination of the nouns and the notional verbs, where the logical combination includes the OR, AND, and NOT logical relations.
3. The method for retrieving English text based on matching degree according to claim 1 or 2, characterized in that, in Step 3, performing the similarity evaluation on the retrieval statement to obtain the retrieval weight comprises the following steps:
the field to which the noun belongs is looked up according to the noun, and the keyword in that field is determined;
a word weight with respect to the keyword is calculated from the noun's field density and field depth within the field, its relation to the keyword, and the strength of its link to the keyword;
the retrieval distance to the keyword is calculated from the word weight;
the similarity score of the retrieval statement is calculated from the retrieval distance;
the similarity score of the retrieval statement is used as the retrieval weight.
4. The method for retrieving English text based on matching degree according to claim 3, characterized in that, in Step 3, matching is performed successively in order of default weight.
5. The method for retrieving English text based on matching degree according to claim 4, characterized in that, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of retrieval results is kept.
6. The method for retrieving English text based on matching degree according to claim 5, characterized in that the predetermined quantity is 25.
7. The method for retrieving English text based on matching degree according to any one of claims 1, 2, and 4-6, characterized in that, in Step 3, the matching of the retrieval weight against each default weight is performed using a fuzzy control method;
the difference Δη between the retrieval weight η and the default weight η', the ratio Δη/η' of that difference to the default weight η', and the matching degree φ are each converted to quantization levels in a fuzzy universe of discourse;
the difference Δη and the ratio Δη/η' are input to a fuzzy control model; the difference Δη is divided into 7 levels, the ratio Δη/η' is divided into 7 levels, and the matching degree φ is divided into 5 levels;
the output of the fuzzy control model is the matching degree φ, and the search results are output according to the matching degree φ.
8. The method for retrieving English text based on matching degree according to claim 7, characterized in that the universe of discourse of the difference Δη between the retrieval weight η and the default weight η' is [-10, 10], the universe of discourse of the ratio Δη/η' of that difference to the default weight η' is [-0.1, 0.1], both quantization factors are set to 1, and the universe of discourse of the matching degree φ is [0, 1].
9. The method for retrieving English text based on matching degree according to claim 8, characterized in that the difference Δη between the retrieval weight η and the default weight η' is divided into 7 levels, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the ratio Δη/η' is divided into 7 levels, with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}; the matching degree φ is divided into 5 levels, with fuzzy set {ZO, PS, PM, PB, PVB}; and triangular membership functions are used.
10. The method for retrieving English text based on matching degree according to claim 9, characterized in that the control rules of the fuzzy control model are:
if the weight difference Δη is NM and the weight-difference ratio Δη/η' is PM or PB, the matching degree φ is PS; if the weight difference Δη is PB and the weight-difference ratio Δη/η' is PM or PB, the matching degree φ is PVB.
CN201710427632.XA 2017-06-08 2017-06-08 Method for retrieving English text based on matching degree Active CN107239554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710427632.XA CN107239554B (en) 2017-06-08 2017-06-08 Method for retrieving English text based on matching degree

Publications (2)

Publication Number Publication Date
CN107239554A (en) 2017-10-10
CN107239554B (en) 2020-02-11

Family

ID=59987476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710427632.XA Active CN107239554B (en) 2017-06-08 2017-06-08 Method for retrieving English text based on matching degree

Country Status (1)

Country Link
CN (1) CN107239554B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761341A (en) * 2014-02-21 2014-04-30 北京嘉和美康信息技术有限公司 Information matching method and device
CN103902724A (en) * 2014-04-10 2014-07-02 辽宁医学院 English literature search method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
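王旭阳 (Wang Xuyang) et al.: "信息检索中语义相似度算法研究" (Research on semantic similarity algorithms in information retrieval), 《计算机工程与应用》 (Computer Engineering and Applications) *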

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345694A (en) * 2018-03-19 2018-07-31 华北电力大学(保定) A kind of document retrieval method and system based on subject data base
CN108345694B (en) * 2018-03-19 2021-09-03 华北电力大学(保定) Document retrieval method and system based on theme database
CN111046140A (en) * 2019-11-25 2020-04-21 华中科技大学同济医学院附属协和医院 Automatic office service communication robot and control method thereof
CN111104488A (en) * 2019-12-30 2020-05-05 广州广电运通信息科技有限公司 Method, device and storage medium for integrating retrieval and similarity analysis
CN111104488B (en) * 2019-12-30 2023-10-24 广州广电运通信息科技有限公司 Method, device and storage medium for integrating retrieval and similarity analysis
CN114511027A (en) * 2022-01-29 2022-05-17 重庆工业职业技术学院 Method for extracting English remote data through big data network
CN114511027B (en) * 2022-01-29 2022-11-11 重庆工业职业技术学院 Method for extracting English remote data through big data network

Also Published As

Publication number Publication date
CN107239554B (en) 2020-02-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant