CN107239554A

CN107239554A - A method for retrieving English text based on matching degree

Info

Publication number: CN107239554A
Application number: CN201710427632.XA
Authority: CN
Inventors: 刘曲; 杨天地; 马丽娣
Original assignee: Jinzhou Medical University
Current assignee: Jinzhou Medical University
Priority date: 2017-06-08
Filing date: 2017-06-08
Publication date: 2017-10-10
Anticipated expiration: 2037-06-08
Also published as: CN107239554B

Abstract

The invention discloses a method for retrieving English text based on matching degree, comprising: Step 1: retrieval information is pre-stored on a server, with each English document associated with one retrieval unit; each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval bar; each retrieval bar consists of at least one noun and notional verb taken from the abstract of the English document associated with the retrieval unit, and a preset weight is assigned to every retrieval bar. Step 2: an English query is input, its nouns and notional verbs are extracted, and the nouns and notional verbs are expanded into retrieval sentences. Step 3: similarity evaluation is performed on the retrieval sentences to obtain retrieval weights, each retrieval weight is matched against the preset weights, and the results are sorted by matching degree to obtain the retrieval result list.

Description

A method for retrieving English text based on matching degree

Technical field

The present invention relates to English text retrieval, and in particular to a method for retrieving English text based on matching degree.

Background technology

For English text retrieval, the main current approach is to match the retrieval object against keywords set in advance and determine whether they match; that is, the English text to be retrieved is split into separate keywords, each of which is searched individually. However, computers cannot effectively parse human language patterns and therefore cannot understand the query intent, so the information found is not accurate enough.

To address this problem, users can add advanced search syntax to their queries. However, advanced syntax is complicated to enter and places high demands on the user, which degrades the user experience, and the degree of matching between the sentence to be retrieved and the preset keywords remains insufficient.

Summary of the invention

The present invention provides a method for retrieving English text based on matching degree. The first object of the invention is to obtain a retrieval result list for the sentence to be retrieved.

The second object of the invention is to improve the degree of matching between the sentence to be retrieved and the preset retrieval information.

The technical solution provided by the present invention is:

A method for retrieving English text based on matching degree, comprising the following steps:

Step 1: retrieval information is pre-stored on a server, with each English document associated with one retrieval unit; each retrieval unit includes an ID, the entry time of the English document, and at least one retrieval bar; each retrieval bar consists of at least one noun and notional verb taken from the abstract of the English document associated with the retrieval unit, and a preset weight is assigned to every retrieval bar;

Step 2: an English query is input, its nouns and notional verbs are extracted, and the nouns and notional verbs are expanded into retrieval sentences;

Step 3: similarity evaluation is performed on the retrieval sentences to obtain retrieval weights, each retrieval weight is matched against the preset weights, and the results are sorted by matching degree to obtain the retrieval result list.
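The pre-stored retrieval information of Step 1 can be pictured as a small record type. The following is a minimal sketch only; the class names, field names, and sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class RetrievalBar:
    # Nouns and notional verbs taken from the document's abstract (hypothetical field name).
    terms: Tuple[str, ...]
    # Preset weight assigned to this retrieval bar.
    preset_weight: float


@dataclass
class RetrievalUnit:
    doc_id: str          # ID of the retrieval unit
    entry_time: date     # entry time of the associated English document
    bars: List[RetrievalBar] = field(default_factory=list)

    def __post_init__(self):
        # Step 1 requires at least one retrieval bar per retrieval unit.
        if not self.bars:
            raise ValueError("a retrieval unit needs at least one retrieval bar")


# Sample values are invented for illustration.
unit = RetrievalUnit(
    doc_id="doc-001",
    entry_time=date(2017, 6, 8),
    bars=[
        RetrievalBar(("retrieval", "match"), 6.5),
        RetrievalBar(("weight", "rank"), 3.0),
    ],
)
```

One retrieval unit per document keeps the preset weights next to the terms they describe, so Step 3 can compare a retrieval weight against each bar in turn.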

Preferably, in Step 2, the retrieval sentence is a logical combination of the nouns and the notional verbs, where the logical combination uses the OR, AND, and NOT relations.
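A logical combination of this kind can be sketched as follows. The expansion strategy shown (one AND sentence per noun and verb pair, plus one OR sentence over all terms) is an assumption made for illustration, and the function names `expand` and `matches` are hypothetical; the patent does not specify how the sentences are generated.

```python
from itertools import product


def expand(nouns, verbs):
    """Expand nouns and notional verbs into retrieval sentences (assumed strategy)."""
    # One AND sentence for every noun/verb pair.
    sentences = [{"all_of": (n, v)} for n, v in product(nouns, verbs)]
    # One OR sentence over every extracted term.
    sentences.append({"any_of": tuple(nouns) + tuple(verbs)})
    return sentences


def matches(doc_terms, all_of=(), any_of=(), none_of=()):
    """Return True when doc_terms satisfies the AND, OR, and NOT parts of a sentence."""
    terms = set(doc_terms)
    return (
        all(t in terms for t in all_of)
        and (not any_of or any(t in terms for t in any_of))
        and not any(t in terms for t in none_of)
    )


sentences = expand(["text"], ["retrieve", "match"])
hit = matches({"text", "retrieve"}, **sentences[0])
```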

Preferably, in Step 3, performing similarity evaluation on the retrieval sentence to obtain the retrieval weight comprises the following steps:

The field to which the noun belongs is looked up from the noun, and the keyword in that field is determined;

The word weight between the noun and the keyword is calculated from the noun's field density and field depth within the field, its relation to the keyword, and the strength of the link between it and the keyword;

The retrieval distance between the noun and the keyword is calculated from the word weight;

The similarity score of the retrieval sentence is calculated from the retrieval distance;

The similarity score of the retrieval sentence is taken as the retrieval weight.

Preferably, in Step 3, matching is performed successively in order of the magnitude of the preset weights.

Preferably, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of results is kept.

Preferably, the predetermined quantity is 25.
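The sorting and truncation described above amount to a sort followed by a slice. A minimal sketch, assuming each candidate is a `(doc_id, matching_degree)` pair; the function name `top_results` and the pair layout are hypothetical.

```python
def top_results(scored, limit=25):
    """Sort (doc_id, matching_degree) pairs by descending degree and keep at most `limit` ids."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:limit]]


# Thirty invented candidates; only the 25 best are kept.
scored = [(f"doc{i}", i / 100) for i in range(30)]
result = top_results(scored)
```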

Preferably, in Step 3, the matching of the retrieval weight against each preset weight is performed using a fuzzy control method;

The difference Δη between the retrieval weight η and the preset weight η', the ratio Δη/η' of that difference to the preset weight η', and the matching degree φ are each converted to quantization levels in their fuzzy universes of discourse;

The difference Δη and the ratio Δη/η' are input to the fuzzy control model; the difference Δη is divided into 7 levels, the ratio Δη/η' is divided into 7 levels, and the matching degree φ is divided into 5 levels;

The output of the fuzzy control model is the matching degree φ; retrieval and output are performed according to the matching degree φ.

Preferably, the universe of discourse of the difference Δη between the retrieval weight η and the preset weight η' is [-10, 10], the universe of discourse of the ratio Δη/η' of that difference to the preset weight η' is [-0.1, 0.1], the quantization factors are both set to 1, and the universe of discourse of the matching degree φ is [0, 1].
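The two inputs of the fuzzy control model follow directly from η and η'. The sketch below computes them and clamps each to its stated universe of discourse. Clamping out-of-range values is an assumption, as is rejecting a zero preset weight; the patent does not say how such cases are handled, and the function name `fuzzy_inputs` is hypothetical.

```python
def fuzzy_inputs(eta, eta_preset):
    """Return (delta, ratio): the weight difference and its ratio to the preset weight."""
    if eta_preset == 0:
        raise ValueError("preset weight must be non-zero to form the ratio")
    delta = eta - eta_preset
    ratio = delta / eta_preset

    def clamp(value, lo, hi):
        return max(lo, min(hi, value))

    # Universes of discourse stated in the text; the quantization factors are 1.
    return clamp(delta, -10.0, 10.0), clamp(ratio, -0.1, 0.1)


delta, ratio = fuzzy_inputs(52.0, 50.0)
```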

Preferably, the difference Δη between the retrieval weight η and the preset weight η' is divided into 7 levels with fuzzy set {NB, NM, NS, 0, PS, PM, PB}; the ratio Δη/η' of that difference to the preset weight η' is divided into 7 levels with fuzzy set {NB, NM, NS, 0, PS, PM, PB}; the matching degree φ is divided into 5 levels with fuzzy set {0, PS, PM, PB, PVB}; and the membership functions are triangular.

Preferably, the control rules of the fuzzy control model are:

If the weight difference Δη is NM and the weight difference ratio Δη/η' is PM or PB, then the matching degree φ is PS; if the weight difference Δη is PB and the weight difference ratio Δη/η' is PM or PB, then the matching degree φ is PVB.

Compared with the prior art, the present invention has the following advantages:

1. The present invention confines the matching-degree calculation for keywords to the defined nouns, thereby eliminating the interference that conjunctions and other words without notional meaning cause in the retrieval results, which reduces the retrieval burden and improves retrieval efficiency;

2. The present invention computes the matching degree between the retrieved text and the preset text by means of fuzzy control, which improves matching efficiency and increases the accuracy of the results;

3. By presetting multiple retrieval bars and calculating the matching degree for each of them, the present invention improves the comprehensiveness of the retrieval results.

Brief description of the drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is the membership function of the difference Δη between the retrieval weight η and the preset weight η'.

Fig. 3 is the membership function of the ratio Δη/η' of the difference between the retrieval weight and the preset weight to the preset weight η'.

Fig. 4 is the membership function of the matching degree φ.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the text of the specification.

As shown in Fig. 1, the present invention provides a method for retrieving English text based on matching degree, comprising Steps 1 to 3 as set out in the technical solution above.

In another embodiment, in Step 2, the retrieval sentence is a logical combination of the nouns and the notional verbs, where the logical combination uses the OR, AND, and NOT relations.

In another embodiment, in Step 3, performing similarity evaluation on the retrieval sentence to obtain the retrieval weight comprises the following steps:

The field to which the noun belongs is looked up from the noun, and the keyword in that field is determined; the word weight between the noun and the keyword is calculated from the noun's field density and field depth within the field, its relation to the keyword, and the strength of the link between it and the keyword; the retrieval distance between the noun and the keyword is calculated from the word weight; the similarity score of the retrieval sentence is calculated from the retrieval distance; and the similarity score of the retrieval sentence is taken as the retrieval weight.

In another embodiment, in Step 3, matching is performed successively in order of the magnitude of the preset weights, starting from the largest preset weight and ending with the smallest, which yields multiple different retrieval result lists.

In another embodiment, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity; if it does, only the predetermined quantity of results is kept. In this embodiment, the predetermined quantity is 25.

Example

A keyword c2 is determined in the field where the noun is located, and the semantic similarity between noun c1 and keyword c2 is defined as:

where Dist(c1, c2) is the retrieval distance between noun c1 and keyword c2, calculated as the sum of the weights (word weights) on the edges of the shortest path between the two. The word weight is directly related to the strength of the link to the keyword; the strength of the link between a child concept ci and its parent concept c' can be expressed as:

Preferably, taking other factors into account, such as the local density in the field, the concept depth, and the concept relation, the overall edge weight wt(ci, c') between concepts is expressed as:

where d(c') is the depth of c' in the field where the noun is located, E(c') is the number of relations of c' in that field, Ē is the average number of relations in that field, R(ci, c') is the concept relation factor, the parameters α (α ≥ 0) and β (0 ≤ β ≤ 1) control how much the field depth and the density contribute to the overall word-weight calculation, and IC(c) is a transformed form of the link calculation between concepts, namely:

IC(c) = -log P(c),

where P(c) is the probability that concept c occurs in the whole field.
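The link-strength and edge-weight formulas referred to above are not reproduced in this text. As a hedged reconstruction only: the symbols defined above (depth d, relation count E and its average, a relation factor R, the parameters α and β, and IC(c) = -log P(c)) coincide with the edge-weighting scheme of Jiang and Conrath. Assuming that scheme is the one intended, the two expressions would take the following form. The notation \bar{E} for the average relation count and LS for the link strength are assumptions introduced here.

```latex
\begin{align}
LS(c_i, c') &= -\log P(c_i \mid c') = IC(c_i) - IC(c'), \\
wt(c_i, c') &= \left(\beta + (1-\beta)\,\frac{\bar{E}}{E(c')}\right)
               \left(\frac{d(c')+1}{d(c')}\right)^{\alpha}
               \bigl[IC(c_i) - IC(c')\bigr]\, R(c_i, c').
\end{align}
```

Under this assumed form, setting α = 0 and β = 1 removes the depth and density terms and leaves wt(ci, c') = [IC(ci) - IC(c')] R(ci, c').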

In summary, the semantic distance between noun c1 and keyword c2 can be expressed as:

where path(c1, c2) is the set of all concepts on the path from noun c1 to keyword c2, and LSuper(c1, c2) is the lowest common parent concept of c1 and c2;

The corresponding R(ci, c') is defined as 1.0, 0.6, and 0.3 for the identity, inheritance, and attribute relations respectively. In practical applications the density E(c') and the depth d(c') play no significant role, so α and β are set to 0 and 1 respectively. In extended semantic search, noun c1 is a parent concept of keyword c2, and the final semantic distance reduces to:

The similarity score of the retrieval sentence is obtained from the semantic distance between noun c1 and keyword c2, and this similarity score is used as the retrieval weight.
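As an illustration of how a retrieval distance could be accumulated along a path, the following sketch sums edge weights from an ancestor concept c1 down to keyword c2. It assumes an edge weight in the style of Jiang and Conrath, because the patent's own formulas are not reproduced in this text; the function names and the probability values are hypothetical. Only IC(c) = -log P(c), the relation factors, and the settings α = 0 and β = 1 come from the description.

```python
import math


def information_content(p):
    """IC(c) = -log P(c), as stated in the description."""
    return -math.log(p)


def edge_weight(ic_child, ic_parent, relation=1.0, depth=1, density=1.0,
                avg_density=1.0, alpha=0.0, beta=1.0):
    """Assumed edge weight; alpha = 0 and beta = 1 switch off depth and density."""
    density_term = beta + (1.0 - beta) * avg_density / density
    depth_term = ((depth + 1) / depth) ** alpha
    return density_term * depth_term * (ic_child - ic_parent) * relation


def distance_to_ancestor(path_probs, relation=1.0):
    """Sum edge weights along a path given P(c) for each concept from c1 down to c2."""
    ics = [information_content(p) for p in path_probs]
    # Pair each concept with its child on the path: (parent, child).
    return sum(edge_weight(child, parent, relation)
               for parent, child in zip(ics, ics[1:]))


# Invented probabilities for a three-concept path c1 -> c -> c2.
dist = distance_to_ancestor([0.5, 0.2, 0.05])
```

With α = 0, β = 1, and a relation factor of 1.0, the sum telescopes to IC(c2) - IC(c1), which is one plausible reading of the simplified distance mentioned above.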

In another embodiment, the matching degree φ between the retrieval weight and the preset weight is calculated using a fuzzy control method. The inputs of the fuzzy control model are the weight difference Δη between the retrieval weight η and the preset weight η' and the weight difference ratio Δη/η' of that difference to the preset weight η'; the output is the matching degree φ. The weight difference Δη varies over [-10, 10] and the weight difference ratio Δη/η' varies over [-0.1, 0.1]; the quantization factors are both set to 1, so their universes of discourse are [-10, 10] and [-0.1, 0.1] respectively. The fuzzy universe of discourse of the matching degree φ is [0, 1].

To ensure control precision, so that control works well in every mode, the following division was settled on after repeated tests. The range of the weight difference Δη is divided into seven levels with fuzzy set {NB, NM, NS, ZO, PS, PM, PB}, where NB is negative big, NM is negative medium, NS is negative small, ZO is zero, PS is positive small, PM is positive medium, and PB is positive big. The range of the weight difference ratio Δη/η' is likewise divided into seven levels with fuzzy set {NB, NM, NS, ZO, PS, PM, PB} and the same meanings.

The output matching degree φ is divided into five levels, {ZO, PS, PM, PB, PVB}, where ZO is zero, PS is small, PM is medium, PB is big, and PVB is very big. The membership functions are triangular, as shown in Figs. 2, 3, and 4.
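Fuzzification with triangular membership functions can be sketched as below. The figures are not available in this text, so the placement of the triangles (evenly spaced peaks, each triangle reaching to its neighbours' peaks) is an assumption, and the names `LEVELS7`, `triangular`, and `fuzzify` are hypothetical.

```python
LEVELS7 = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]


def triangular(x, left, peak, right):
    """Membership degree of x in a triangle with feet at left/right and apex at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x, lo, hi, labels):
    """Map a crisp value to membership degrees over evenly spaced triangular sets."""
    x = min(max(x, lo), hi)                  # keep x inside the universe of discourse
    step = (hi - lo) / (len(labels) - 1)     # distance between neighbouring peaks
    degrees = {}
    for i, label in enumerate(labels):
        peak = lo + i * step
        degrees[label] = triangular(x, peak - step, peak, peak + step)
    return degrees


# Weight difference of 5 on the universe [-10, 10]: halfway between PS and PM.
degrees = fuzzify(5.0, -10.0, 10.0, LEVELS7)
```

The same `fuzzify` call with bounds -0.1 and 0.1 would cover the weight difference ratio.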

The control rules of the fuzzy control model are chosen empirically as follows:

If the weight difference Δη is negative medium and the weight difference ratio Δη/η' is positive medium or positive big, then the matching degree φ is small; if the weight difference Δη is positive big and the weight difference ratio Δη/η' is positive medium or positive big, then the matching degree φ is very big. The specific fuzzy control rules are shown in Table 1.

Table 1. Fuzzy control rules
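The contents of Table 1 are not reproduced in this text, so the sketch below encodes only the two rules stated in the description and leaves every other combination undefined. The min/max (Mamdani-style) inference used to combine membership degrees is an assumption, since the patent does not name an inference method, and the names `RULES` and `infer` are hypothetical.

```python
# Only the rules stated in the text; the remaining cells of Table 1 are unknown.
RULES = {
    ("NM", "PM"): "PS",
    ("NM", "PB"): "PS",
    ("PB", "PM"): "PVB",
    ("PB", "PB"): "PVB",
}


def infer(delta_degrees, ratio_degrees):
    """Combine input membership degrees into output degrees (assumed min/max inference)."""
    out = {}
    for (delta_label, ratio_label), phi_label in RULES.items():
        # Rule firing strength: the weaker of the two antecedent degrees.
        strength = min(delta_degrees.get(delta_label, 0.0),
                       ratio_degrees.get(ratio_label, 0.0))
        if strength > 0.0:
            # Several rules may share an output level; keep the strongest.
            out[phi_label] = max(out.get(phi_label, 0.0), strength)
    return out


# Invented membership degrees for illustration.
phi_degrees = infer({"PB": 0.8, "PM": 0.2}, {"PM": 0.5, "PB": 0.5})
```

A complete controller would also need the full rule table and a defuzzification step to turn the output degrees into a single φ in [0, 1]; neither is specified in this text.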

Although embodiments of the present invention have been disclosed above, the invention is not limited to the uses listed in the specification and the embodiments. It can be applied in any field suited to the invention, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the illustrations shown and described here.

Claims

1. A method for retrieving English text based on matching degree, characterized by comprising the following steps:

Step 1: pre-storing retrieval information on a server, with each English document associated with one retrieval unit, each retrieval unit including an ID, the entry time of the English document, and at least one retrieval bar, each retrieval bar consisting of at least one noun and notional verb taken from the abstract of the English document associated with the retrieval unit, and assigning a preset weight to every retrieval bar;

Step 2: inputting an English query, extracting nouns and notional verbs from the English query, and expanding the nouns and the notional verbs into retrieval sentences;

Step 3: performing similarity evaluation on the retrieval sentences to obtain retrieval weights, matching each retrieval weight against the preset weights, and sorting by matching degree to obtain a retrieval result list.

2. The method for retrieving English text based on matching degree according to claim 1, characterized in that, in Step 2, the retrieval sentence is a logical combination of the nouns and the notional verbs, where the logical combination uses the OR, AND, and NOT relations.

3. The method for retrieving English text based on matching degree according to claim 1 or 2, characterized in that, in Step 3, performing similarity evaluation on the retrieval sentence to obtain the retrieval weight comprises the following steps:

looking up, from the noun, the field to which the noun belongs, and determining the keyword in that field;

calculating the word weight between the noun and the keyword from the noun's field density and field depth within the field, its relation to the keyword, and the strength of the link between it and the keyword;

4. The method for retrieving English text based on matching degree according to claim 3, characterized in that, in Step 3, matching is performed successively in order of the magnitude of the preset weights.

5. The method for retrieving English text based on matching degree according to claim 4, characterized in that, in Step 3, it is determined whether the number of items in the retrieval result list obtained after matching exceeds a predetermined quantity, and if it does, only the predetermined quantity of results is kept.

6. The method for retrieving English text based on matching degree according to claim 5, characterized in that the predetermined quantity is 25.

7. The method for retrieving English text based on matching degree according to any one of claims 1, 2, and 4 to 6, characterized in that, in Step 3, the matching of the retrieval weight against each preset weight is performed using a fuzzy control method;

the difference Δη between the retrieval weight η and the preset weight η', the ratio Δη/η' of that difference to the preset weight η', and the matching degree φ are each converted to quantization levels in their fuzzy universes of discourse;

the difference Δη and the ratio Δη/η' are input to the fuzzy control model, the difference Δη is divided into 7 levels, the ratio Δη/η' is divided into 7 levels, and the matching degree φ is divided into 5 levels;

8. The method for retrieving English text based on matching degree according to claim 7, characterized in that the universe of discourse of the difference Δη between the retrieval weight η and the preset weight η' is [-10, 10], the universe of discourse of the ratio Δη/η' of that difference to the preset weight η' is [-0.1, 0.1], the quantization factors are both set to 1, and the universe of discourse of the matching degree φ is [0, 1].

9. The method for retrieving English text based on matching degree according to claim 8, characterized in that the difference Δη between the retrieval weight η and the preset weight η' is divided into 7 levels with fuzzy set {NB, NM, NS, 0, PS, PM, PB}; the ratio Δη/η' of that difference to the preset weight η' is divided into 7 levels with fuzzy set {NB, NM, NS, 0, PS, PM, PB}; the matching degree φ is divided into 5 levels with fuzzy set {0, PS, PM, PB, PVB}; and the membership functions are triangular.

10. The method for retrieving English text based on matching degree according to claim 9, characterized in that the control rules of the fuzzy control model are:

If the weight difference Δη is NM and the weight difference ratio Δη/η' is PM or PB, then the matching degree φ is PS; if the weight difference Δη is PB and the weight difference ratio Δη/η' is PM or PB, then the matching degree φ is PVB.