CN106095779A - Retrieval method and device based on keyword position - Google Patents

Retrieval method and device based on keyword position

Info

Publication number
CN106095779A
Authority
CN
China
Prior art keywords
searching keyword
key word
webpage
distance
location sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610361720.XA
Other languages
Chinese (zh)
Inventor
江永青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Technology (shanghai) Co Ltd
Original Assignee
Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Technology (shanghai) Co Ltd filed Critical Information Technology (shanghai) Co Ltd
Priority to CN201610361720.XA priority Critical patent/CN106095779A/en
Publication of CN106095779A publication Critical patent/CN106095779A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a retrieval method and device based on keyword position. The method comprises the following steps: collecting webpages and building a keyword position index for each webpage, the keyword position index recording all keywords contained in the webpage and their corresponding positions in the webpage; receiving a query term entered by a user and segmenting it into words to obtain the search keywords corresponding to the query term, the number of search keywords being N, where N is a natural number greater than or equal to 1; calculating, from the positions of the search keywords in the keyword position index, the distances of the search keywords in the webpage, obtaining from those distances the shortest distance of all the search keywords in the webpage, and calculating the relevance score of the search keywords from the shortest distance; and sorting the relevance scores of the different webpages and outputting the result. The invention has low time and space complexity and a fast response time.

Description

Retrieval method and device based on keyword position
Technical field
The invention belongs to the field of Internet technology, and in particular relates to a retrieval method and device based on keyword position.
Background
With the development of the Internet, the variety of search engines keeps growing. A search engine consists of four parts: a crawler, an indexer, a retriever, and a user interface. The function of the retriever is to quickly retrieve documents from the index database according to the user's query, evaluate the relevance between the documents and the query, sort the results to be output, and implement some form of user relevance feedback mechanism.
To improve service quality, a search engine usually takes into account, during retrieval, the positional relevance of the user's query terms within a document. This is because analysis of user search results often shows that the positions of keywords and the distances between them strongly affect the results: when the keywords parsed from the user's query term lie close together in a document, or co-occur with high density, the retrieved results are more accurate. Take the two keywords "Beijing" and "roast duck", which occur in two documents at the following positions:
Document 1: XXXXXXXXX Beijing XX roast duck XXXXXXXXXXX; Document 2: XX Beijing XXXXXXXXXXXXXXXXX roast duck XX. Comparing the two, judged by the distance between the keywords, Document 1 is more relevant than Document 2.
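As a minimal sketch of this comparison, the snippet below models each document as a list of tokens, with `X` standing for a filler word, and measures the smallest gap between the two keywords. The token lists and the helper `min_gap` are illustrative assumptions and do not come from the patent text.

```python
def min_gap(tokens, first, second):
    """Smallest absolute difference between a position of `first` and one of `second`."""
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    if not first_pos or not second_pos:
        return float("inf")  # at least one keyword is missing
    return min(abs(i - j) for i in first_pos for j in second_pos)


# Assumed tokenisation of the two example documents (one "X" per filler word).
doc1 = ["X"] * 9 + ["Beijing"] + ["X"] * 2 + ["roast duck"] + ["X"] * 11
doc2 = ["X"] * 2 + ["Beijing"] + ["X"] * 17 + ["roast duck"] + ["X"] * 2

gap1 = min_gap(doc1, "Beijing", "roast duck")
gap2 = min_gap(doc2, "Beijing", "roast duck")
print(gap1, gap2)
```

Under this tokenisation the gap in Document 1 is smaller than in Document 2, which is the intuition behind ranking Document 1 higher.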
The common practice of current search engines is to parse the user's query term to obtain the corresponding keywords and then compute positional relevance from the keywords' positions in the document. There are two main approaches: 1. plain-text matching (i.e., exact string matching); 2. analysing the document to obtain and record the position of each keyword within the whole document, then computing the positional relevance of the keywords from the recorded positions. However, because the number of documents to be searched is very large, retrieval is slow. Likewise, if a single document contains a great deal of text, the relevance computation is also slow. Both approaches therefore give positional-relevance techniques an excessively high time and space complexity.
Summary of the invention
To overcome the excessive time and space complexity that existing techniques incur when computing positional relevance in documents from the user's query term, the invention builds an inverted index table over all documents, looks up in that table the distances of the keywords corresponding to the query term, computes the optimal combination of keyword positions, and computes the relevance score of the keywords from the positions of that optimal combination, thereby improving retrieval efficiency.
The invention provides a retrieval method based on keyword position, comprising the following steps:
Collecting webpages and building a keyword position index for each webpage, the keyword position index recording all keywords contained in the webpage and their corresponding positions in the webpage;
Receiving a query term entered by a user and segmenting it into words to obtain the search keywords corresponding to the query term, the number of search keywords being N, where N is a natural number greater than or equal to 1;
Calculating, from the positions of the search keywords in the keyword position index, the distances of the search keywords in the webpage, obtaining from those distances the shortest distance of all the search keywords in the webpage, and calculating the relevance score of the search keywords from the shortest distance;
Sorting the relevance scores of the different webpages and outputting the result.
Further, calculating the distances of the search keywords in the webpage from their positions in the keyword position index, obtaining the shortest distance of all the search keywords in the webpage from those distances, and calculating the relevance score of the search keywords from the shortest distance comprises:
Judging whether the number N of search keywords is greater than 1;
If N is greater than 1, matching the first search keyword against all keywords contained in the webpage and calculating the distance between the first search keyword and each keyword contained in the webpage;
Matching the next search keyword against all keywords in the webpage and calculating the distance between that search keyword and each keyword in the webpage, then comparing the distance value A obtained in this calculation with the distance B calculated for the previous search keyword at the same position; if A < B, the value at that position is set to B, and otherwise no processing is performed;
Continuing in this way up to the last search keyword to obtain the distance of the last search keyword at each keyword position in the webpage, taking the smallest of all these distance values to determine the optimal position of the last search keyword in the webpage, and looking up, from that optimal position, the shortest distance of all the search keywords in the webpage;
Calculating the relevance score of the search keywords in the webpage from the shortest distance.
Further, if N is greater than 1, matching the first search keyword against the keyword at each position in the location set and calculating the distance of the first search keyword at each element position in the location set comprises:
Judging whether the search keyword matches the first keyword in the webpage;
If the search keyword matches the first keyword in the webpage, determining that the distance of the search keyword at the first keyword in the webpage is 1;
Judging whether the search keyword matches the next keyword in the webpage;
If the match succeeds, determining that the distance value of the search keyword at that position in the webpage is 1; otherwise determining that the distance value of the search keyword in the webpage is (M-N+1), where M is the value of the current position in the webpage and N is the position value of the keyword in the webpage that the search keyword matched successfully.
Further, judging whether the search keyword matches the first keyword in the webpage also comprises:
If the search keyword does not match the first keyword in the webpage, determining that the distance of the search keyword at the first keyword in the webpage is infinite.
Further, judging whether the number N of search keywords is greater than 1 also comprises:
If N = 1, counting the hit rate of the search keyword in the webpage and calculating the relevance score of the queried webpage from the hit rate.
The invention also provides a retrieval device based on keyword position, comprising:
An index module, configured to collect webpages and build a keyword position index for each webpage, the keyword position index recording all keywords contained in the webpage and their corresponding positions in the webpage;
A word segmentation module, configured to receive a query term entered by a user and segment it into words to obtain the search keywords corresponding to the query term, the number of search keywords being N, where N is a natural number greater than or equal to 1;
A computing module, configured to calculate the distances of the search keywords in the webpage from the keyword position index, obtain from those distances the shortest distance of all the search keywords in the webpage, and calculate the relevance score of the search keywords from the shortest distance;
An input module, configured to sort the relevance scores of the different webpages and output the result.
Further, the computing module comprises:
A first judging submodule, configured to judge whether the number N of search keywords is greater than 1;
A first calculating submodule, configured to, if N is greater than 1, match the first search keyword against all keywords contained in the webpage and calculate the distance between the first search keyword and each keyword contained in the webpage;
A second calculating submodule, configured to match the next search keyword against all keywords in the webpage, calculate the distance between that search keyword and each keyword in the webpage, and compare the distance value A obtained in this calculation with the distance B calculated for the previous search keyword at the same position; if A < B, the value at that position is set to B, and otherwise no processing is performed;
An analysis submodule, configured to continue up to the last search keyword to obtain the distance of the last search keyword at each keyword position in the webpage, take the smallest of all these distance values to determine the optimal position of the last search keyword in the webpage, and look up, from that optimal position, the shortest distance of all the search keywords in the webpage;
A statistics submodule, configured to calculate the relevance score of the search keywords in the webpage from the shortest distance.
Further, the first calculating submodule also comprises:
A second judging unit, configured to judge whether the search keyword matches the first keyword in the webpage;
A second computing unit, configured to, if the search keyword matches the first keyword in the webpage, determine that the distance of the search keyword at the first keyword in the webpage is 1;
A third judging unit, configured to judge whether the search keyword matches the next keyword in the webpage;
A third computing unit, configured to, if the match succeeds, determine that the distance value of the search keyword at that position in the webpage is 1, and otherwise determine that the distance value of the search keyword in the webpage is (M-N+1), where M is the value of the current position in the webpage and N is the position value of the keyword in the webpage that the search keyword matched successfully.
Further, the first calculating submodule also comprises:
A fourth computing unit, configured to, if the search keyword does not match the first keyword in the webpage, determine that the distance of the search keyword at the first keyword in the webpage is infinite.
Further, the computing module also comprises:
A hit rate calculating submodule, configured to, if N = 1, count the hit rate of the search keyword in the webpage and calculate the relevance score of the queried webpage from the hit rate.
In summary, the beneficial effects of the invention are as follows:
1. By using optimal position combination, the method finds the text regions in a document that satisfy the combination of query words, improving the accuracy and relevance of text matching.
2. By using dynamic programming, the method greatly reduces the algorithmic complexity of finding the optimal position combination, meeting the need for fast retrieval over massive document collections.
3. Once the optimal position combination has been found, the relevance of the query words can be calculated by different algorithms.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the retrieval method based on keyword position according to the invention;
Fig. 2 is a schematic diagram of one embodiment of the keyword position index in the retrieval method based on keyword position according to the invention;
Fig. 3 is a schematic flowchart of one embodiment of calculating the distances of the search keywords in the retrieval method based on keyword position according to the invention;
Fig. 4 is a schematic flowchart of one embodiment of calculating the distance of a search keyword at each element position in the location set in the retrieval method based on keyword position according to the invention;
Fig. 5 is a schematic structural diagram of one embodiment of the retrieval device based on keyword position according to the invention.
Detailed description of the embodiments
The invention is described in further detail below through specific embodiments and with reference to the accompanying drawings.
To solve the above problems, the invention provides a retrieval method based on keyword position. As shown in Fig. 1, the method comprises the following steps:
S101: collecting webpages and building a keyword position index for each webpage, the keyword position index recording all keywords contained in the webpage and their corresponding positions in the webpage.
The invention scans all collected webpages to obtain the keywords each one contains. Different keywords have different keyword identifiers. In a specific implementation, the invention therefore first analyses the collected webpages, extracts the keywords contained in each webpage and records their positions in the webpage, and builds the keyword position index from the keywords and positions in each webpage. As shown in Fig. 2, docid is the identifier of a webpage, wordid is the identifier of a keyword contained in that webpage, and hit is the position of the keyword in the webpage. Webpages differ in the number of keywords they contain, and each keyword occupies a different position in a webpage. The invention therefore first analyses the webpages to obtain the identifiers of all webpages, collects the keyword identifiers belonging to each webpage identifier, and records the position of each keyword in the webpage.
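The index structure described here (docid, wordid, hit) can be sketched as a nested mapping. The sketch below assumes that pages are already split into words, that each word serves as its own wordid, and that positions are counted from 1; the function name `build_position_index` and the sample page are illustrative assumptions.

```python
from collections import defaultdict


def build_position_index(pages):
    """Map docid -> wordid -> list of 1-based positions (hits) in that page."""
    index = {}
    for docid, words in pages.items():
        hits = defaultdict(list)
        for position, wordid in enumerate(words, start=1):
            hits[wordid].append(position)
        index[docid] = dict(hits)
    return index


# Hypothetical collection with a single, already tokenised page.
pages = {"A": "d a d b a c d d a c".split()}
index = build_position_index(pages)
print(index["A"]["a"])
```

Storing the positions per keyword means a later query only touches the entries for its own keywords instead of rescanning the page text.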
S102: receiving a query term entered by a user and segmenting it into words to obtain the search keywords corresponding to the query term, the number of search keywords being N, where N is a natural number greater than or equal to 1.
In a specific implementation, the search keywords are obtained by running a word segmentation algorithm over the query term entered by the user. For example, the query term "good-looking film" (好看的电影) is segmented into "good-looking" (好看), "的" and "film" (电影); because the particle "的" occurs so frequently, it is removed as a stop word. The search keywords finally obtained from the query term are therefore "good-looking" and "film".
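The stop-word step can be sketched as a simple filter over the output of a segmenter. The segmenter itself is not specified in the text, so the sketch starts from an assumed segmentation of the Chinese query 好看的电影 ("good-looking film"), in which the particle 的 is presumably the stop word the example refers to; `STOP_WORDS` and `to_search_keywords` are hypothetical names.

```python
STOP_WORDS = {"的"}  # assumed stop-word list, for illustration only


def to_search_keywords(segments, stop_words=STOP_WORDS):
    """Drop stop words from an already segmented query term, keeping the order."""
    return [segment for segment in segments if segment not in stop_words]


# Assumed segmenter output for the example query.
segments = ["好看", "的", "电影"]
keywords = to_search_keywords(segments)
print(keywords)
```

The number of keywords that survive this filter is the N used in the following steps.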
S103: calculating, from the positions of the search keywords in the keyword position index, the distances of the search keywords in the webpage, obtaining from those distances the shortest distance of all the search keywords in the webpage, and calculating the relevance score of the search keywords from the shortest distance.
In the invention, the number of search keywords obtained by segmenting the user's query term is not fixed. In a specific implementation, there are usually at least two search keywords. The invention obtains the relevance score of the keywords in the queried webpage by calculating the distances of the search keywords in the keyword position index. Because the number of search keywords is not fixed and each search keyword may occupy a different position in the webpage, the webpage may turn out to have little relevance to the search keywords. For example, for the search keywords "Beijing" and "roast duck", the positions of the two keywords in Document 1 and Document 2 are as follows:
Document 1: XXXXXXXXX Beijing XX roast duck XXXXXXXXXXX;
Document 2: XX Beijing XXXXXXXXXXXXXXXXX roast duck XX.
Comparing the two, judged by the positions of the search keywords in the documents, Document 1 is more relevant than Document 2. Therefore, when there are several search keywords, the shortest distance of the search keywords in the webpage must be calculated, so that the relevance between the search keywords and the webpage is captured as fully as possible.
Unlike traditional techniques that calculate a relevance score from the distances of search keywords in a webpage, the invention does not use plain-text matching (i.e., exact character matching), an algorithm with very high time and space complexity. The invention first analyses all webpages to obtain the keyword position index table, then matches the search keywords obtained by segmenting the user's query term against that table and calculates the distance of each search keyword at the position of each keyword in the table, which effectively reduces the time and space complexity of retrieval.
For example, after an optimal position has been obtained, the positional relevance score of the search keywords may optionally be calculated by the following formula:
[formula not reproduced in the source text]
where smoothA and smoothB are preset smoothing parameters, words_count is the number of keywords in the keyword position index, span is the distance used to score the keyword distribution, and promote is a parameter that controls how strongly the score varies with span.
S104: sorting the relevance scores of the different webpages and outputting the result.
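Step S104 can be sketched as a plain sort over per-page scores. The text does not state the sort direction, so descending order (most relevant first) is an assumption here, and the scores and the name `rank_pages` are made up for illustration.

```python
def rank_pages(scores):
    """Sort (docid, score) pairs from highest to lowest relevance score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up relevance scores for three pages.
scores = {"doc1": 0.82, "doc2": 0.35, "doc3": 0.57}
ranking = rank_pages(scores)
print([docid for docid, _ in ranking])
```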
As shown in Fig. 3, S103 specifically comprises the following steps:
S1031: obtaining, from the keyword position index, the location set of all the search keywords in the webpage;
S1032: judging whether the number N of search keywords is greater than 1;
S1033: if N is greater than 1, matching the first search keyword against the keyword at each position in the location set and calculating the distance of the first search keyword at each element position in the location set;
S1034: matching the next search keyword against the keyword at each position in the location set, calculating the distance of that search keyword at each element position in the location set, and comparing the distance A obtained in this calculation with the distance B calculated for the previous search keyword at the same position; if A < B, the value at that position is set to B, and otherwise no processing is performed;
S1035: continuing in this way up to the last search keyword to obtain the distance of the last search keyword at each element position in the location set, taking the smallest of all these distance values to determine the optimal position of the last search keyword in the webpage, and looking up, from that optimal position, the shortest distance of all the search keywords in the webpage;
S1036: calculating the relevance score of the search keywords in the webpage from the shortest distance.
Further, judging in S103 whether the number N of search keywords is greater than 1 also comprises:
S1037: if N = 1, counting the hit rate of the search keyword in the location set and calculating the relevance score of the queried webpage from the hit rate.
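Steps S1034 and S1035 can be sketched as a fold over per-keyword distance lists, one value per element of the location set. The distance lists below are illustrative inputs for a six-element location set, and the names `combine_rounds` and `best_position` are assumptions; how each list is produced is outside this sketch, as is the N = 1 branch.

```python
import math


def combine_rounds(distance_rounds):
    """Fold per-keyword distance lists position by position, keeping the larger value."""
    combined = list(distance_rounds[0])
    for current in distance_rounds[1:]:
        # A is this round's value, B the value carried over from earlier rounds:
        # if A < B the position keeps B, otherwise it keeps A.
        combined = [b if a < b else a for a, b in zip(current, combined)]
    return combined


def best_position(combined):
    """Index and value of the smallest combined distance (step S1035)."""
    best_index = min(range(len(combined)), key=combined.__getitem__)
    return best_index, combined[best_index]


INF = math.inf
rounds = [
    [1, 3, 1, 2, 1, 2],        # first search keyword
    [INF, 1, 2, 3, 6, 7],      # second search keyword
    [INF, INF, INF, 1, 4, 1],  # third search keyword
]
combined = combine_rounds(rounds)
index, distance = best_position(combined)
print(combined, index, distance)
```

Keeping the larger value at each position yields the length of the shortest span ending there that still reaches back to every keyword seen so far; the minimum over all positions is then the shortest distance for the page.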
As shown in Fig. 4, the step in S1033 of, if N is greater than 1, matching the first search keyword against the keyword at each position in the location set and calculating the distance of the first search keyword at each element position in the location set comprises the following steps:
S201: judging whether the search keyword matches the keyword corresponding to the first element in the location set;
S202: if the search keyword matches the keyword corresponding to the first element in the location set, determining that the distance of the search keyword at the first keyword in the webpage is 1;
S203: judging whether the search keyword matches the keyword corresponding to the next element in the location set;
S204: if the match succeeds, determining that the distance value of the search keyword at that element position in the location set is 1; otherwise determining that the distance value of the search keyword in the location set is (M-N+1), where M is the value of the current element position in the location set and N is the value of the previous position of this search keyword in the location set.
Further, S1032 also comprises:
S205: if the search keyword does not match the keyword corresponding to the first element in the location set, determining that the distance of the search keyword at the first element position in the location set is infinite.
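Steps S201 to S205 can be sketched as a single pass over the location set for one search keyword. The function name `keyword_distances`, the parallel lists `positions` and `hit_words`, and the sample values are illustrative assumptions.

```python
import math


def keyword_distances(positions, hit_words, keyword):
    """Distance of `keyword` at every element of the location set.

    The value is 1 where the element is the keyword itself, (M - N + 1) where
    it is not (M: current position, N: position of the most recent match), and
    infinity for elements that precede the first match.
    """
    distances = []
    last_match = None
    for position, word in zip(positions, hit_words):
        if word == keyword:
            last_match = position
        if last_match is None:
            distances.append(math.inf)
        else:
            distances.append(position - last_match + 1)
    return distances


# Hypothetical location set and the word found at each of its positions.
positions = [2, 4, 5, 6, 9, 10]
hit_words = ["a", "b", "a", "c", "a", "c"]
dist_a = keyword_distances(positions, hit_words, "a")
dist_b = keyword_distances(positions, hit_words, "b")
print(dist_a)
print(dist_b)
```

Because a match sets N equal to M, the matched case (M - N + 1 = 1) and the unmatched case fall out of the same expression.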
The invention also provides a retrieval device based on keyword position. As shown in Fig. 5, the device comprises:
An index module 10, configured to collect webpages and build a keyword position index for each webpage, the keyword position index recording all keywords contained in the webpage and their corresponding positions in the webpage;
A word segmentation module 20, configured to receive a query term entered by a user and segment it into words to obtain the search keywords corresponding to the query term, the number of search keywords being N, where N is a natural number greater than or equal to 1;
A computing module 30, configured to calculate, from the positions of the search keywords in the keyword position index, the distances of the search keywords in the webpage, obtain from those distances the shortest distance of all the search keywords in the webpage, and calculate the relevance score of the search keywords from the shortest distance;
An input module 40, configured to sort the relevance scores of the different webpages and output the result.
Further, the computing module comprises:
A position submodule, configured to obtain, from the keyword position index, the location set of all the search keywords in the webpage;
A first judging submodule, configured to judge whether the number N of search keywords is greater than 1;
A first calculating submodule, configured to, if N is greater than 1, match the first search keyword against the keyword at each position in the location set and calculate the distance of the first search keyword at each element position in the location set;
A second calculating submodule, configured to match the next search keyword against the keyword at each position in the location set, calculate the distance of that search keyword at each element position in the location set, and compare the distance A obtained in this calculation with the distance B calculated for the previous search keyword at the same position; if A < B, the value at that position is set to B, and otherwise no processing is performed;
An analysis submodule, configured to continue up to the last search keyword to obtain the distance of the last search keyword at each element position in the location set, take the smallest of all these distance values to determine the optimal position of the last search keyword in the webpage, and look up, from that optimal position, the shortest distance of all the search keywords in the webpage;
A statistics submodule, configured to calculate the relevance score of the search keywords in the webpage from the shortest distance.
Further, the first calculating submodule also comprises:
A fourth computing unit, configured to, if the search keyword does not match the keyword corresponding to the first element in the location set, determine that the distance of the search keyword at the first element position in the location set is infinite.
Further, the first calculating submodule also comprises:
A second judging unit, configured to judge whether the search keyword matches the keyword corresponding to the first element in the location set;
A second computing unit, configured to, if the search keyword matches the keyword corresponding to the first element in the location set, determine that the distance of the search keyword at the first keyword in the webpage is 1;
A third judging unit, configured to judge whether the search keyword matches the keyword corresponding to the next element in the location set;
A third computing unit, configured to, if the match succeeds, determine that the distance value of the search keyword at that element position in the location set is 1, and otherwise determine that the distance value of the search keyword in the location set is (M-N+1), where M is the value of the current element position in the location set and N is the value of the previous position of this search keyword in the location set.
The computing module also comprises:
A hit rate calculating submodule, configured to, if N = 1, count the hit rate of the search keyword in the location set and calculate the relevance score of the queried webpage from the hit rate.
The present invention uses dynamic programming, which can greatly reduce the algorithmic complexity of searching for the optimal distance combination. The invention is described below using webpage A as an example. Assume that the content of webpage A is d a d b a c d d a c, and that the keywords obtained by segmenting the query entered by the user are a, b and c.
Not all of the content of webpage A consists of a, b and c, so the invention first analyses the webpage to obtain the positions at which a, b and c occur in it.
As follows (positions counted from 1): a occurs at positions 2, 5 and 9; b at position 4; c at positions 6 and 10.
As can be seen, the position set of a, b and c in the webpage is (2 4 5 6 9 10). In a specific implementation, the positions of a, b and c in the webpage can be stored as data, for example in an array.
The corresponding hit words are, respectively: a b a c a c.
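The position set for webpage A can be reproduced with a short Python sketch. The function name `build_position_set` is hypothetical, and counting positions from 1 is an assumption that matches the position values quoted in the example.

```python
def build_position_set(page_words, query_keywords):
    """Return (positions, hits): where the query keywords occur and which one hit."""
    wanted = set(query_keywords)
    positions, hits = [], []
    for pos, word in enumerate(page_words, start=1):  # 1-based positions
        if word in wanted:
            positions.append(pos)
            hits.append(word)
    return positions, hits

page_a = "d a d b a c d d a c".split()
positions, hits = build_position_set(page_a, ["a", "b", "c"])
print(positions)  # [2, 4, 5, 6, 9, 10]
print(hits)       # ['a', 'b', 'a', 'c', 'a', 'c']
```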
The invention first records the positions of the query keywords in the document and then iterates over the position set several times, once per query keyword. For example, the position set is first traversed with query keyword a; the second pass computes the minimum distance of a and b in the document (for the query a b), which only requires traversing the fixed position set for b; finally c is traversed, which completes the combined retrieval of a, b and c. The index used to traverse the position set positions is denoted j.
Let min_span be a one-dimensional array whose size is the total number of positions at which the query keywords occur in the webpage; that is, the number of elements in this array equals the number of occurrences of the query keywords in the webpage (6 in this example). min_span[j] is the distance from the optimal combination before the j-th position to the j-th position (the length of the span that contains both this optimal combination and the j-th position). For example, for the document "a b c d" and the query "a b", min_span[1]=2 (j points to b) and min_span[3]=4 (j points to d). In each round of iteration, if the min_span[j] value calculated at a position is larger than the value from the previous round, it replaces the previous round's min_span[j] value.
The calculation proceeds as follows:
Initialization: (X denotes infinity)
i. First round of iteration:
The query keyword is a:
(Here the nearest optimal combination is the term a at document position 2, which is at a distance of 3 from the current positions[j]=4.)
(Here the nearest optimal combination is the term a at document position 5, which is at a distance of 1 from the current positions[j]=5.)
Iterate to the end ...
ii. Second round of iteration:
Term b is added; the current query is a b.
(Because the current word is a and the newly added term b has not yet been encountered, min_occu is X; X is larger than the previous value 1, so the value is replaced with X.)
(The word at position 4 hits b, so min_occu is changed to 4. For b the min_span at this position is 1, but for a the min_span at this position is 3; therefore, for the query a b, the larger of the two values, 3, should be taken.)
Iterate to the end ...
iii. Third round of iteration:
Term c is added; the current query is a b c.
Iterate to the end ...
Final result:
Traverse min_span and choose its minimum value, 3, which is the length of the optimal position combination. The text covering 3 positions counted back from this position is the optimal distance combination, that is, the underlined combination of query keywords.
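The three rounds above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `min_span_rounds` is hypothetical, starting min_span at 0 is an assumption (the source only says that X denotes infinity), and min_occu is taken to be the position of the most recent occurrence of the keyword added in the current round.

```python
import math

def min_span_rounds(positions, words, query_keywords):
    """One pass per query keyword; each element keeps the largest distance seen."""
    min_span = [0] * len(positions)  # assumed start value; round 1 overwrites it
    for keyword in query_keywords:
        min_occu = None  # position of the latest occurrence of `keyword`
        for j, (pos, word) in enumerate(zip(positions, words)):
            if word == keyword:
                min_occu = pos
            # X (infinity) until the keyword has been seen, else M - N + 1
            dist = math.inf if min_occu is None else pos - min_occu + 1
            min_span[j] = max(min_span[j], dist)
    return min_span

page_a = "d a d b a c d d a c".split()
positions = [2, 4, 5, 6, 9, 10]
words = ["a", "b", "a", "c", "a", "c"]

min_span = min_span_rounds(positions, words, ["a", "b", "c"])
best = min(min_span)                        # length of the optimal combination
end = positions[min_span.index(best)]       # 1-based position where it ends
print(min_span)                             # [inf, inf, inf, 3, 6, 7]
print(best, page_a[end - best:end])         # 3 ['b', 'a', 'c']
```

Taking the maximum over the per-keyword distances at an element gives the length of the shortest window that ends at that element and contains every keyword processed so far, which is why the minimum of the final array is the shortest distance for the whole query.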
For example, once the underlined combination obtained above has been identified as the optimal position combination, span_score can be calculated first. Suppose words_count=3 and span=3; by setting smoothA, smoothB and promote, a score span_score=exp((words_count+smoothA)/(span+smoothB), promote) can be calculated. After this relevance score has been calculated, note that the optimal position combination obtained by the method of the invention is in reverse order, namely b, a, c, so the relevance score for the reverse order b, a, c is then calculated, specifically according to the following formula:
reverse_score=exp(1-(reverse_count+smoothA)/(max_reverse_count+smoothB), promote), where smoothA and smoothB again smooth the score, and the promote parameter controls the degree of difference between scores.
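The two scores can be sketched in Python under explicit assumptions. The source writes a two-argument exp(x, promote) without defining it, so the sketch assumes it means exp(promote * x). It also assumes that the second formula reads 1 - (reverse_count + smoothA) / (max_reverse_count + smoothB). The function names and all parameter values below are hypothetical.

```python
import math

def span_score(words_count, span, smooth_a, smooth_b, promote):
    # Assumed reading of exp(x, promote): exp(promote * x).
    return math.exp(promote * (words_count + smooth_a) / (span + smooth_b))

def reverse_score(reverse_count, max_reverse_count, smooth_a, smooth_b, promote):
    # Assumed reconstruction: fewer reversed keywords give a higher score.
    ratio = (reverse_count + smooth_a) / (max_reverse_count + smooth_b)
    return math.exp(promote * (1 - ratio))

# Hypothetical smoothing and promote values, with words_count=3 and span=3.
print(span_score(3, 3, smooth_a=1.0, smooth_b=1.0, promote=1.0))
print(reverse_score(2, 3, smooth_a=1.0, smooth_b=1.0, promote=1.0))
```

Under this reading a tighter span and fewer reversals both raise the score, and a larger promote widens the gap between good and poor matches.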
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be subject to various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A search method based on keyword position, characterized by comprising the following steps:
Collecting webpages and analysing the keyword position index of each webpage, the keyword position index being all keywords contained in the webpage and their corresponding positions in the webpage;
Receiving a query entered by a user and segmenting it into words to obtain the query keywords corresponding to the query, the number of query keywords being N, where N is a natural number greater than or equal to 1;
Calculating the distances of the query keywords in the webpage according to the positions of the query keywords in the keyword position index, obtaining from these distances the shortest distance of all the query keywords in the webpage, and calculating the relevance score of the query keywords according to the shortest distance;
Sorting the relevance scores of different webpages and then outputting them.
2. The search method based on keyword position according to claim 1, characterized in that calculating the distances of the query keywords in the webpage according to the positions of the query keywords in the keyword position index, obtaining from these distances the shortest distance of all the query keywords in the webpage, and calculating the relevance score of the query keywords according to the shortest distance includes:
Obtaining the position set of all the query keywords in the webpage according to the keyword position index;
Judging whether the number N of query keywords is greater than 1;
If N is greater than 1, matching the first query keyword against the keyword corresponding to each position in the position set, and calculating the distance of the first query keyword at each element position in the position set;
Matching the next query keyword against the keyword corresponding to each position in the position set, calculating the distance of the next query keyword at each element position in the position set, and comparing the distance A obtained in this calculation with the distance B calculated for the previous query keyword at the same position; if A < B, setting the value at this position to B, and otherwise performing no processing;
Continuing until the last query keyword, obtaining the distance of the last query keyword at each element position in the position set, determining the optimal position of the last query keyword in the webpage from the smallest of all these distance values, and looking up the shortest distance of all the query keywords in the webpage according to the optimal position;
Calculating the relevance score of the query keywords in the webpage according to the shortest distance.
3. The search method based on keyword position according to claim 2, characterized in that, if N is greater than 1, matching the first query keyword against the keyword corresponding to each position in the position set and calculating the distance of the first query keyword at each element position in the position set includes:
Judging whether the query keyword successfully matches the keyword corresponding to the first element in the position set;
If the query keyword successfully matches the keyword corresponding to the first element in the position set, determining that the distance of the query keyword at the first keyword in the webpage is 1;
Judging whether the query keyword successfully matches the keyword corresponding to the next element in the position set;
If the match succeeds, determining that the distance value of the query keyword at this element position in the position set is 1, and otherwise determining that the distance value of the query keyword in the position set is (M-N+1), where M is the value of the position set at this element position and N is the value of the position set at the previous occurrence of the query keyword.
4. The search method based on keyword position according to claim 3, characterized in that judging whether the query keyword successfully matches the first keyword in the webpage further includes:
If the query keyword does not match the keyword corresponding to the first element in the position set, determining that the distance of the query keyword at the first element position in the position set is infinity.
5. The search method based on keyword position according to claim 2, characterized in that judging whether the number N of query keywords is greater than 1 further includes:
If N=1, counting the hit rate of the query keyword in the position set and calculating the relevance score of the queried webpage according to that hit rate.
6. A retrieval device based on keyword position, characterized by including:
An index module, configured to collect webpages and analyse the keyword position index of each webpage, the keyword position index being all keywords contained in the webpage and their corresponding positions in the webpage;
A word segmentation module, configured to receive a query entered by a user and segment it into words to obtain the query keywords corresponding to the query, the number of query keywords being N, where N is a natural number greater than or equal to 1;
A computing module, configured to calculate the distances of the query keywords in the webpage according to the positions of the query keywords in the keyword position index, to obtain from these distances the shortest distance of all the query keywords in the webpage, and to calculate the relevance score of the query keywords according to the shortest distance;
An input module, configured to sort the relevance scores of different webpages and then output them.
7. The retrieval device according to claim 1, characterized in that the computing module includes:
A position submodule, configured to obtain the position set of all the query keywords in the webpage according to the keyword position index;
A first judging submodule, configured to judge whether the number N of query keywords is greater than 1;
A first calculation submodule, configured to match, if N is greater than 1, the first query keyword against the keyword corresponding to each position in the position set, and to calculate the distance of the first query keyword at each element position in the position set;
A second calculation submodule, configured to match the next query keyword against the keyword corresponding to each position in the position set, to calculate the distance of the next query keyword at each element position in the position set, and to compare the distance A obtained in this calculation with the distance B calculated for the previous query keyword at the same position; if A < B, the value at this position is set to B, and otherwise no processing is performed;
An analysis submodule, configured to continue until the last query keyword, to obtain the distance of the last query keyword at each element position in the position set, to determine the optimal position of the last query keyword in the webpage from the smallest of all these distance values, and to look up the shortest distance of all the query keywords in the webpage according to the optimal position;
A statistics submodule, configured to calculate the relevance score of the query keywords in the webpage according to the shortest distance.
8. The retrieval device according to claim 1, characterized in that the first calculation submodule further includes:
A second judging unit, configured to judge whether the query keyword successfully matches the keyword corresponding to the first element in the position set;
A second computing unit, configured to determine, if the query keyword successfully matches the keyword corresponding to the first element in the position set, that the distance of the query keyword at the first keyword in the webpage is 1;
A third judging unit, configured to judge whether the query keyword successfully matches the keyword corresponding to the next element in the position set;
A third computing unit, configured to determine, if the match succeeds, that the distance value of the query keyword at this element position in the position set is 1, and otherwise to determine that the distance value of the query keyword in the position set is (M-N+1), where M is the value of the position set at this element position and N is the value of the position set at the previous occurrence of the query keyword.
9. The retrieval device according to claim 1, characterized in that the first calculation submodule further includes:
A fourth computing unit, configured to determine, if the query keyword does not match the keyword corresponding to the first element in the position set, that the distance of the query keyword at the first element position in the position set is infinity.
10. The retrieval device according to claim 1, characterized in that the computing module further includes:
A hit-rate calculation submodule, configured to count, if N=1, the hit rate of the query keyword in the position set and to calculate the relevance score of the queried webpage according to that hit rate.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610361720.XA CN106095779A (en) 2016-05-26 2016-05-26 A kind of search method based on key word position and device

Publications (1)

Publication Number Publication Date
CN106095779A true CN106095779A (en) 2016-11-09

Family

ID=57230785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610361720.XA Pending CN106095779A (en) 2016-05-26 2016-05-26 A kind of search method based on key word position and device

Country Status (1)

Country Link
CN (1) CN106095779A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1306258A (en) * 2001-03-09 2001-08-01 北京大学 Method for judging position correlation of a group of query keys or words on network page
CN101923556A (en) * 2010-02-09 2010-12-22 上海莱希信息科技有限公司 Method and device for searching webpages according to sentence serial numbers

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Bing: "Web Data Mining" (《Web数据挖掘》), 31 January 2013, Tsinghua University Press *
Doan et al.: "Principles of Data Integration" (《数据集成原理》), 31 July 2014 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273254A (en) * 2017-06-16 2017-10-20 郑州云海信息技术有限公司 A kind of system and method that text is screened under windows
CN107273254B (en) * 2017-06-16 2021-03-12 苏州浪潮智能科技有限公司 System and method for screening text under windows
CN109871468A (en) * 2019-02-01 2019-06-11 国网四川省电力公司广元供电公司 Non-structured document management and rules and regulations entry management integration system
CN113220965A (en) * 2021-04-14 2021-08-06 武汉祺锦信息技术有限公司 Website keyword intelligent grabbing and classifying analysis system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161109
